Summary

Here, we present a protocol to access and analyze many human and model organism databases efficiently. This protocol demonstrates the use of MARRVEL to analyze candidate disease-causing variants identified from next-generation sequencing efforts.

Abstract

Through whole-exome/genome sequencing, human geneticists identify rare variants that segregate with disease phenotypes. To assess whether a specific variant is pathogenic, one must query many databases to determine whether the gene of interest is linked to a genetic disease, whether the specific variant has been reported before, and what functional data are available in model organism databases that may provide clues about the gene’s function in humans. MARRVEL (Model organism Aggregated Resources for Rare Variant ExpLoration) is a one-stop data collection tool for human genes and variants and their orthologous genes in seven model organisms: mouse, rat, zebrafish, fruit fly, nematode worm, fission yeast, and budding yeast. In this protocol, we provide an overview of what MARRVEL can be used for and discuss how different datasets can be used to assess whether a variant of unknown significance (VUS) in a known disease-causing gene or a variant in a gene of uncertain significance (GUS) may be pathogenic. This protocol guides a user through searching multiple human databases simultaneously, starting with a human gene with or without a variant of interest. We also discuss how to utilize data from OMIM, ExAC/gnomAD, ClinVar, Geno2MP, DGV, and DECIPHER. Moreover, we illustrate how to interpret a list of ortholog candidate genes, expression patterns, and GO terms in model organisms associated with each human gene. Furthermore, we discuss the value of the protein structural domain annotations provided and explain how to use the multiple species protein alignment feature to assess whether a variant of interest affects an evolutionarily conserved domain or amino acid. Finally, we discuss three different use cases of this website. MARRVEL is an easily accessible, open-access website designed for both clinical and basic researchers and serves as a starting point to design experiments for functional studies.

Introduction

The use of next-generation sequencing technology is expanding in both research and clinical genetic laboratories1. Whole-exome (WES) and whole-genome sequencing (WGS) analyses reveal numerous rare variants of unknown significance (VUS) in known disease-causing genes as well as variants in genes that are yet to be associated with a Mendelian disease (GUS: genes of uncertain significance). Presented with a list of genes and variants in a clinical sequence report, medical geneticists must manually visit multiple online resources to assess which variant may be responsible for a certain phenotype seen in the patient of interest. This process is time-consuming, and its efficacy is highly dependent on the expertise of the individual. Although several guideline papers have been published2,3, interpretation of WES and WGS requires manual curation since there is not yet a standardized methodology for variant analysis. For the interpretation of a VUS, knowledge of the previously reported genotype-phenotype relationship, mode of inheritance, and allele frequencies in the general population is valuable. In addition, knowing whether the variant affects a critical protein domain or an evolutionarily conserved residue may increase or decrease the likelihood of pathogenicity. To gather all of this information, one typically needs to navigate through 10-20 human and model organism databases since the information is scattered across the World Wide Web.

Similarly, model organism scientists who work on specific genes and pathways are often interested in connecting their findings to human disease mechanisms and wish to take advantage of the knowledge that is being generated in the human genomics field. However, due to the rapid expansion and evolution of data sets regarding the human genome, it has been challenging to identify databases that provide useful information. In addition, since most model organism databases are designed for researchers who work with the specific organism on a daily basis, it is very difficult, for example, for a mouse researcher to search for specific information in a Drosophila database and vice versa. Similar to the variant interpretation searches performed by medical geneticists, identifying useful human and other model organism information is time-consuming and heavily dependent on the background of the model organism researcher. MARRVEL (Model organism Aggregated Resources for Rare Variant ExpLoration)4 is a tool designed for both groups of users to streamline their workflow.

MARRVEL (http://marrvel.org) was designed as a centralized search engine that collects data systematically in an efficient and consistent manner for clinicians and researchers. With information from 20 or more publicly available databases, this program allows users to quickly gather information and access a large number of human and model organism databases without reiterative searches. The search result pages also contain hyperlinks to the original sources of information, allowing individuals to access the raw data and gather additional information provided by the sources.

In contrast to many of the variant prioritization tools that require large sequencing data input in the form of VCF or BAM files and installation of often proprietary/commercial software, MARRVEL operates in any web browser. It can be used at no cost and is compatible with portable devices (e.g., smartphones, tablets) as long as one is connected to the internet. We chose this format since many clinicians and researchers typically need to search one or a few genes and variants at a time. Note that we are developing batch-download and API (application programming interface) features for MARRVEL to eventually allow users to curate hundreds of genes and variants at a time through customized query tools if necessary.

Due to the wide range of applications, in this protocol we describe a broadly encompassing approach to navigating the different datasets that MARRVEL displays. More targeted examples tailored to specific users’ needs are described in the Representative Results section. It is important to note that the output of MARRVEL still requires a certain level of background knowledge in either human genetics or model organisms to extract valuable information. We refer readers to the table that lists primary papers describing the function of each of the original databases that are curated by MARRVEL (Table 1). The following protocol is divided into three sections: (1) how to begin a search, (2) how to interpret MARRVEL human genetics outputs, and (3) how to make use of model organism data in MARRVEL. MARRVEL is being actively updated, so please refer to the current website’s FAQ page for details about data sources. We strongly recommend that users of MARRVEL sign up to receive update notifications through the e-mail submission form at the bottom of the MARRVEL home page.

Protocol

1. How to begin a search

  1. For the human gene and variant-based search, go to steps 1.1.1.-1.1.2. For human gene-based search (no variant input), go to step 1.2. For model organism gene-based search, refer to steps 1.3.1.-1.3.2.
    1. Go to the home page of MARRVEL4 at http://marrvel.org/. Start by entering a human gene symbol. Ensure that the candidate gene names are listed below the input box with each character entry. If the search comes back negative, make sure the gene symbol used is up to date using the HUGO Gene Nomenclature Committee website5 (HGNC; https://www.genenames.org/).
    2. Enter a human variant. The search bar is compatible with two types of variant nomenclature: genomic location, similar to how variants are displayed on ExAC and gnomAD6, and transcript-based nomenclature according to HGVS guidelines. Examples of such formats are shown in grey text within the search box. For genomic location nomenclature, use the coordinates according to hg19/GRCh37. Proceed to step 2.
      NOTE: If a search returns an error, the most common cause is either an out-of-date gene symbol or incorrect variant nomenclature. In those cases, the HGNC (https://www.genenames.org/), Mutalyzer7 (https://www.mutalyzer.nl/), and TransVar8 (https://bioinformatics.mdanderson.org/transvar/) websites are useful resources for correcting the error. HGNC provides official gene symbols and their aliases for all human genes.
    3. If still encountering error messages after confirming the gene name is up to date, use Mutalyzer and TransVar to check and convert variant nomenclature.
    4. In some situations, such as a very recent gene symbol change in HGNC, try using a synonym for the gene. Please also contact the MARRVEL operating team using the "Feedback" tab so that the source data can be updated, as MARRVEL may not provide the correct information due to a lag in data updates.
  2. Enter a human gene symbol and leave the human variant search bar blank. If an error is encountered, go to HGNC (https://www.genenames.org/) to check for the official gene symbol or try an older gene symbol.
    1. Click on Model Organisms Search tab on the top banner (Figure 1) or go to http://marrvel.org/model. Select the model organism of choice and enter a model organism gene symbol. Click on the gene symbol as the name is autocompleted and then click Search. If the search result is negative, check the official gene symbol that is used in model organism databases (Table 1).
    2. If the search result is still negative, access DIOPT (DRSC Integrative Ortholog Prediction Tool, https://www.flyrnai.org/cgi-bin/DRSC_orthologs.pl) and HCOP (https://www.genenames.org/tools/hcop/) to assess if there are no good predicted orthologs for the gene of interest. DIOPT is an ortholog prediction search engine run by the DRSC (Drosophila RNAi Screening Center) and HCOP is a similar suite developed by HGNC.
      NOTE: Additional searches using BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi) may allow users to find orthologs that may be missed by prediction algorithms used in DIOPT and HCOP.
    3. Click on the "MARRVEL it" button at the bottom for the predicted human ortholog of choice. Check the DIOPT score9 and the "Best score from Human gene to model organism?" column to select the human gene. Proceed to step 2.
      NOTE: The DIOPT score9 (https://www.flyrnai.org/cgi-bin/DRSC_orthologs.pl) is the number of ortholog prediction algorithms that predict a pair of genes in two organisms to be orthologous to one another. For more information about these values and the specific algorithms used to calculate this score, refer to Hu et al.9. When "Best score from Human gene to model organism?" is Yes, the human gene is more likely to be a true human ortholog of the gene of interest, but there can be exceptions, especially when multiple human genes are orthologous to multiple model organism genes due to gene duplication events during evolution. If the gene of interest is a member of a complex gene family that has undergone divergent evolution in multiple species, users should identify a publication that has performed an extensive phylogenetic analysis of the gene family of interest to identify the most likely ortholog candidate gene.
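
The two input styles described in step 1.1.2 can be checked locally before a search is submitted. The following is a minimal Python sketch, not MARRVEL's own parser: the regular expressions, the function name classify_variant, and the placeholder accession NM_000000.0 are illustrative assumptions, and only simple substitutions are recognized in genomic notation. Mutalyzer and TransVar remain the appropriate tools for full HGVS validation and conversion.

```python
import re

# Genomic-location style as displayed on ExAC/gnomAD, e.g. "17:59477596 G>A".
# Assumption: only substitutions on chromosomes 1-22, X, Y, M/MT are handled.
GENOMIC_RE = re.compile(
    r"^(?:chr)?(?P<chrom>[1-9]|1[0-9]|2[0-2]|X|Y|MT?):"
    r"(?P<pos>[1-9][0-9]*)\s*(?P<ref>[ACGT]+)\s*>\s*(?P<alt>[ACGT]+)$",
    re.IGNORECASE,
)

# Transcript-based style, e.g. "NM_000000.0:c.59G>A" (made-up accession).
# Assumption: a loose shape check only; it does not validate HGVS syntax.
TRANSCRIPT_RE = re.compile(
    r"^(?P<transcript>[A-Za-z]{2,4}_?\d+(?:\.\d+)?):(?P<change>c\..+)$"
)


def classify_variant(text):
    """Return (kind, fields), where kind is 'genomic', 'transcript', or 'unrecognized'."""
    text = text.strip()
    match = GENOMIC_RE.match(text)
    if match:
        return "genomic", {
            "chrom": match.group("chrom").upper(),
            "pos": int(match.group("pos")),  # hg19/GRCh37 coordinate expected
            "ref": match.group("ref").upper(),
            "alt": match.group("alt").upper(),
        }
    match = TRANSCRIPT_RE.match(text)
    if match:
        return "transcript", {
            "transcript": match.group("transcript"),
            "change": match.group("change"),
        }
    return "unrecognized", {}


if __name__ == "__main__":
    for example in ("17:59477596 G>A", "NM_000000.0:c.59G>A", "p.R20Q"):
        print(example, "->", classify_variant(example))
```

A protein-level description such as p.R20Q is reported as unrecognized here, which mirrors the protocol's advice to convert the variant with Mutalyzer or TransVar before searching.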

2. How to interpret MARRVEL human genetics outputs for a gene and variant search

NOTE: On the results page, there are seven human databases that are displayed (Table 1, Figure 1). For each output box, there is an External link button (small box with a diagonal arrow) on the upper right-hand corner that will link to the original database for more details.

  1. Click OMIM (Online Mendelian Inheritance in Man, https://www.omim.org/)10, the first database that is displayed.
    NOTE: OMIM is a manually curated database that aggregates and summarizes information on genetic diseases and traits in the human.
    1. Use the Human Gene Description box from OMIM for a short summary of what is known about the gene and gene product.
    2. Use the Gene-Phenotype Relationships box to determine if this gene is a known disease-causing gene or not. This box provides manually curated known disease or phenotype associations with the gene of interest.
    3. Use the Reported Alleles from OMIM box to get a list of pathogenic variants curated by OMIM.
      NOTE: Since manual curation of a publication regarding a new disease gene discovery is necessary for any gene-disease association to appear in OMIM, a time lag and/or missed publications may leave a known association unlisted. It is recommended that users perform PubMed (https://www.ncbi.nlm.nih.gov/pubmed/) searches to look into recent literature as well (See 4.1.2.). For additional information curated in OMIM, refer to Amberger et al.10,11.
  2. Click ExAC (Exome Aggregation Consortium, http://exac.broadinstitute.org/)6 and gnomAD (genome Aggregation Database, http://gnomad.broadinstitute.org/), large population genomics databases based on WES and WGS of people who are selected to exclude severe pediatric diseases.
    NOTE: ExAC contains ~60,000 WES whereas gnomAD contains ~120,000 WES and ~15,000 WGS. Both ExAC and gnomAD can be used as control population databases, especially for severe pediatric disorders, but their interpretation requires some degree of caution. In general, gnomAD can be considered an updated and expanded version of ExAC since most cohorts that are included in ExAC are also included in gnomAD. However, since there are some exceptions (see cohort information in http://exac.broadinstitute.org/about and http://gnomad.broadinstitute.org/about, respectively), MARRVEL displays data from both sources.
    1. Use the Control Population Gene Summary box to obtain gene-level statistics such as the probability of finding the loss of function (LOF) alleles in the general population. This is called the pLI (probability of LOF Intolerance) score in ExAC and can be used to infer how likely a single copy of a LOF allele for a specific gene may cause a dominant disease through haplo-insufficient mechanisms.
      NOTE: Looking at the pLI score of a gene has value, especially when dealing with dominant disorders that present as severe pediatric diseases associated with de novo variants. If a gene has a pLI score of 0.00, it is highly tolerant of LOF variants, and thus the gene is unlikely to cause disease via a dominant haploinsufficiency mechanism. This does not, however, rule out the possibility that dominant gain of function (GOF) or dominant negative mechanisms cause disease. In addition, genes that cause recessive diseases may have low pLI scores since carriers are expected to be found in the general population. On the other hand, if a gene has a pLI score of 1.00, it is possible that the loss of one copy of this gene is detrimental to human health. Additional searches in websites such as DOMINO (https://wwwfbm.unil.ch/domino/) may also be used in combination to assess the likelihood of a variant in a specific gene causing a dominant disorder.
    2. Use the next two boxes to obtain the allele frequencies of the variant of interest in ExAC and gnomAD, respectively, to help interpret whether the variant may be pathogenic, depending on whether the patient has a dominant or recessive disease. These boxes will only be displayed when the user inputs variant information when initiating the search.
      NOTE: If one hypothesizes a recessive disease scenario and the pLI score of the gene of interest is low, one should pay attention to the allele frequency listed here. Some geneticists may establish a cut-off point of 0.005 to 0.0001 as the maximum allele frequency for pathogenic variants that can cause a severe recessively inherited disease2. On the other hand, if one hypothesizes a dominant disease scenario, it is less likely to find the identical or similar variant in a control population. Again, this requires caution because individuals with late-onset disorders, diseases with mild presentation, psychiatric disorders or diseases not screened by the ExAC/gnomAD researchers may be still included and the variant may still be a dominant pathogenic variant. Also, there have been some instances of variants linked to pediatric conditions found in a few individuals in these databases12,13,14, potentially due to incomplete penetrance or somatic mosaicism13,15,16. In addition, although ExAC and gnomAD will display variants that are found in a homozygous state, it will not indicate whether any of the variants are found in a compound heterozygous state. Finally, some variants found in these databases are tagged as low confidence due to technical challenges in sequencing (e.g. low sequence coverage, repetitive sequence). To look more carefully into these data sets, users are recommended to use the external link button to visit the original ExAC and gnomAD websites to gain additional information.
  3. Click Geno2MP (Genotype to Mendelian Phenotype Browser, http://geno2mp.gs.washington.edu/Geno2MP/), a collection of WES-based data from the University of Washington Center for Mendelian Genetics. It contains about 9,600 exomes (as of 1/18/2019) of affected individuals and unaffected relatives with some phenotypic descriptions (Figure 1).
    1. Use the Disease population box to obtain the allele frequency of the variant of interest in this cohort.
    2. Use the Gene-Phenotype Relationships box to obtain HPO (human phenotype ontology)17 terms for the individuals with the variant of interest. This is one of many ways for one to look for patients that may have the same disease.
      NOTE: If a gene of interest is suspected to be associated with a patient’s disease and there are matches found in Geno2MP, additional important information may be present in the data source beyond what is displayed.
      1. Click the external link button to the gene-specific page on Geno2MP, filter for mutations that are similar to those of the patient (e.g., missense, LOF), and carefully review the lists of variants. Take note of the variants with high CADD18 scores and click into the HPO profiles. For example, CADD scores higher than 20 are within the top 1% of all variants predicted to be deleterious, and CADD scores higher than 10 are within the top 10%. HPO terms provide a standardized description of human phenotypes. Here, make sure to check whether the variant was identified in an affected individual or in a relative.
      2. If variants are found in patients that are affected in the same organ system as the patient, consider using the e-mail form to contact the physician that submitted these cases to Geno2MP using the feature provided on the Geno2MP website.
        NOTE: Not all physicians respond to such queries, so one should explore other avenues of patient matchmaking. Another way to gather a cohort of patients affected by the same disease is to use tools such as GeneMatcher19 (https://www.genematcher.org/) and other databases that are part of the Matchmaker Exchange19,20 (https://www.matchmakerexchange.org/). See the accompanying JoVE article for more information on matchmaking21.
  4. Use the ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/)22 database, supported by the National Institutes of Health (NIH), where researchers and clinicians submit variants with or without determination of pathogenicity, for checking single nucleotide variants (SNV), small indels and larger copy number variations (CNV).
    1. Use the top row to review a summary of the number of each type of variants reported in ClinVar (Figure 1).
    2. Check the list of variants below in the box Reported Alleles from ClinVar.
      NOTE: If a variant was included in the initial search, the variants highlighted in teal are all variants that include the genomic location of the variant of interest [including large CNVs, which are often labeled as genomic coordinate…x1 (deletion) and …x3 (duplication)].
  5. Use DGV23 (Database of Genomic Variants, http://dgv.tcag.ca/dgv/app/home) and DECIPHER24 (DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources, https://decipher.sanger.ac.uk/), both collections of CNVs. DGV is the largest public-access collection of structural variants from more than 54,000 individuals. This database includes samples of reportedly healthy individuals, at the time of ascertainment, from up to 72 different studies. Similarly, the data displayed from DECIPHER includes common variants from the control population.
    NOTE: Since MARRVEL does not have permission to display patient-derived data from DECIPHER, users are encouraged to directly visit the DECIPHER website to access potentially pathogenic CNV information.
    1. Click the Copy Number Variation in Control Population (DGV Database) box to obtain variants that contain the gene of interest. Information such as the size, subtype, and reference of the copy number variation can be found in the same box.
    2. Click the Common Copy Number Variants (DECIPHER Database) box to obtain variants that contain the genomic location of the variant of interest. This information may help determine if the gene is duplicated or deleted in the control individuals.
      NOTE: If the gene of interest is deleted in many individuals in the control population, it means that this gene is likely to be highly tolerant of LOF variants. Like low pLI scores, this suggests that a single copy loss of this gene is less likely to cause a severe disease via a haploinsufficiency mechanism. This does not, however, necessarily rule out other dominant gain of function or dominant negative mechanisms (e.g. antimorphic, hypermorphic and neomorphic alleles) caused by specific missense and truncation alleles.  Possible limitations to these data include variation in source and method of the data acquired, lack of information regarding incomplete penetrance of pathogenic CNVs, and whether individuals developed certain diseases subsequent to data collection.
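
The numeric rules of thumb mentioned in section 2 can be written down as small helpers. The sketch below is illustrative only and is not part of MARRVEL. The pLI thresholds of 0.9 and 0.1 are conventional cut-offs assumed here (the protocol itself only discusses scores of 0.00 and 1.00), the default allele frequency cut-off is the upper end of the 0.005 to 0.0001 range quoted above, and the function names are hypothetical. The CADD helper relies on the PHRED-like scaling of CADD scores, under which a score of 10 corresponds to the top 10% and a score of 20 to the top 1%.

```python
def pli_category(pli, intolerant_at=0.9, tolerant_at=0.1):
    """Bin a pLI score. Thresholds are assumed conventions, not protocol values."""
    if pli >= intolerant_at:
        return "LOF-intolerant"   # haploinsufficiency is plausible
    if pli <= tolerant_at:
        return "LOF-tolerant"     # haploinsufficiency unlikely; GOF or
                                  # dominant negative mechanisms not excluded
    return "indeterminate"


def too_common_for_recessive(allele_frequency, max_af=0.005):
    """True if the variant is more frequent than the chosen recessive cut-off.

    max_af is a judgment call; the protocol quotes a range of 0.005 to 0.0001.
    """
    return allele_frequency > max_af


def cadd_top_fraction(phred_score):
    """Fraction of variants ranked at least this deleterious (PHRED-like scale)."""
    return 10 ** (-phred_score / 10)


if __name__ == "__main__":
    print(pli_category(1.00), pli_category(0.00))
    print(too_common_for_recessive(0.01), too_common_for_recessive(0.00005))
    print(cadd_top_fraction(10), cadd_top_fraction(20))
```

These helpers only flag values for closer review. They do not replace the caveats in the notes above, such as incomplete penetrance, late-onset disease in control cohorts, or low-confidence calls.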

3. How to use model organism data in MARRVEL

  1. Use the Gene Function Table to obtain the following information for eight species including human (human, rat, mouse, zebrafish, Drosophila, C. elegans, budding yeast and fission yeast):
    1. Gene name: Since each gene name is hyperlinked to gene pages on respective model organism databases, click on these links to find out more about the phenotypic information and resources available for each model organism. For example on FlyBase25 (http://flybase.org/), there will be a list of all alleles that have been generated, their respective phenotypes and the availability of each allele from public stock centers.
    2. PubMed link: Click on the PubMed link to go to a list of publications that relates to the gene of interest in each organism. Without using these links, searching for the human gene directly in PubMed may lead to missing some publications that used an old gene alias to refer to the human gene. Similarly, model organism gene names may have fluctuated historically.
    3. DIOPT9 score: Check this column for a score of how many ortholog prediction algorithms predict the gene is likely to be an ortholog of the human gene of interest. One may use a DIOPT score of 3 or above as a reasonable cut-off to identify solid ortholog candidates. However, there are cases where genuine orthologs only have a DIOPT score of 1 due to limited homology. At the top of the gene function table, un-check the "Show only best DIOPT score gene" box to display all candidates that typically include homologous genes that are not necessarily orthologs.
    4. Expression: Check this column for the list of tissues where the gene or protein of interest has been reported to be expressed in human or model organism databases. Human gene and protein expression data are from GTEx26 (https://gtexportal.org/) and Human Protein Atlas27 (https://www.proteinatlas.org/), respectively. Some entries, such as those for human and fly, have a button with pop-up links that display the expression pattern using a heat map, whereas others are hyperlinked to the respective model organism database pages.
    5. Gene Ontology28 (GO) terms: Check this column for GO terms, which are filtered by experimental evidence codes and obtained from the respective human or model organism databases. GO terms based on "computational analysis evidence codes" and "electronic annotation evidence codes" (predictions) are not displayed. Please visit each model organism website to gather this information if necessary.
    6. Other links such as Monarch Initiative29 (https://monarchinitiative.org/) and IMPC30 (http://www.mousephenotype.org/): Use the Monarch Initiative hyperlink to navigate to the Phenogrid page for the specific human gene, a chart that provides a quick comparison between the phenotypes associated with the gene of interest to known human diseases and model organism mutants that have phenotypic overlaps. If a mouse gene has a knockout mouse made or planned by the International Mouse Phenotyping Consortium (IMPC), the "IMPC" links to the page that details the phenotype of the knockout mouse and its availability from public stock centers.
  2. Human Protein Domains: Use the human gene protein domains box to obtain predicted protein domains of the human gene. The data are derived from DIOPT, which uses Pfam (https://pfam.xfam.org/) and CDD (Conserved Domains Database, https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml). A single residue may be annotated more than once due to some overlap in domains annotated in the two sources.
  3. Use the Multiple Protein Alignment box to obtain the amino acid multiple alignment generated by DIOPT9, which includes human (hs), rat (rn), mouse (mm), zebrafish (dr), fruit fly (dm), worm (ce), and yeasts (sc and sp). To highlight the amino acid of interest, scroll down to the bottom of the box and enter the amino acid numbers; the amino acids of interest will be highlighted in teal. The alignment is provided by DIOPT and uses the MAFFT aligner (Multiple alignment program for amino acid or nucleotide sequences, https://mafft.cbrc.jp/alignment/software/)31.
    NOTE: If the amino acid that is highlighted based on the number is not the one expected, it may be due to different splicing isoforms used for the alignment. In principle, DIOPT uses the longest isoform to display in this box. Also, for segments of genes that are not well conserved, alignment of multi-species sequences using default parameters may not be optimal. We recommend using other websites and software like Clustal Omega and ClustalW/X (http://www.clustal.org/)32 to optimize the alignment parameters and matrices accordingly.
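
The conservation check in step 3.3 amounts to finding the alignment column that holds a given human residue and reading the same column in the other species. The Python sketch below illustrates this on a made-up toy alignment. The sequences, the function names, and the use of "-" as the gap character are assumptions for illustration, and this is not the code behind the MARRVEL alignment box. Only the two-letter species labels (hs, mm, dm, sc) follow the protocol.

```python
def column_for_residue(aligned_seq, residue_number, gap="-"):
    """Return the 0-based alignment column holding the given 1-based residue."""
    seen = 0
    for column, char in enumerate(aligned_seq):
        if char != gap:
            seen += 1
            if seen == residue_number:
                return column
    raise ValueError(f"sequence has fewer than {residue_number} residues")


def residues_at(alignment, reference, residue_number):
    """Map each species label to its character in the reference residue's column."""
    column = column_for_residue(alignment[reference], residue_number)
    return {species: seq[column] for species, seq in alignment.items()}


# Toy alignment with invented sequences; all rows must have equal length.
toy_alignment = {
    "hs": "MA-RKL",
    "mm": "MA-RKL",
    "dm": "MSTRRL",
    "sc": "M--RAL",
}

if __name__ == "__main__":
    # Human residue 3 (R) sits in column 3 because of the gap before it.
    print(residues_at(toy_alignment, "hs", 3))
    # Human residue 4 (K) is not identical in every species.
    print(residues_at(toy_alignment, "hs", 4))
```

If the residue returned for the reference species is not the expected amino acid, the isoform caveat in the note above is the first thing to check.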

Results

Human geneticists and model organism scientists each use MARRVEL in distinct ways, each with different desired outcomes. Below are three vignettes of possible uses for MARRVEL.

Evaluating pathogenicity of a variant in a dominant disease
Most of the users that visit MARRVEL use this website to analyze the likelihood that a rare human variant may cause a certain disease. For example, a missense (17:59477596 G>A, p.R20Q) variant in TBX2 was found to segregate i...

Discussion

Critical steps in this protocol include the initial input (steps 1.1-1.3) and subsequent interpretation of the output. The most common reason why search results are negative is because of the many ways that a gene and/or variant can be described. While MARRVEL is updated on a scheduled basis, these updates may cause disconnects between the different databases that MARRVEL links to. Thus, the first step in troubleshooting is invariably checking to see if alternative names of the gene or variant will lead to a successful s...

Disclosures

The authors have nothing to disclose.

Acknowledgements

We thank Drs. Rami Al-Ouran, Seon-Young Kim, Yanhui (Claire) Hu, Ying-Wooi Wan, Naveen Manoharan, Sasidhar Pasupuleti, Aram Comjean, Dongxue Mao, Michael Wangler, Hsiao-Tuan Chao, Stephanie Mohr, and Norbert Perrimon for their support in the development and maintenance of MARRVEL. We are grateful to Samantha L. Deal and J. Michael Harnish for their input on this manuscript.  

The initial development of MARRVEL was supported in part by the Undiagnosed Diseases Network Model Organisms Screening Center through the NIH Common Fund (U54NS093793) and through the NIH Office of Research Infrastructure Programs (ORIP) (R24OD022005). JW is supported by the NIH Eunice Kennedy Shriver National Institute of Child Health & Human Development (F30HD094503) and The Robert and Janice McNair Foundation McNair MD/PhD Student Scholar Program at BCM. HJB is further supported by the NIH National Institute of General Medical Sciences (R01GM067858) and is an Investigator of the Howard Hughes Medical Institute. ZL is supported by the NIH National Institute of General Medical Sciences (R01GM120033), National Institute on Aging (R01AG057339), and the Huffington Foundation. SY received additional support from the NIH National Institute on Deafness and Other Communication Disorders (R01DC014932), the Simons Foundation (SFARI Award: 368479), the Alzheimer’s Association (New Investigator Research Grant: 15-364099), Naman Family Fund for Basic Research and Caroline Wiess Law Fund for Research in Molecular Medicine.

Materials

Name | Company | Catalog Number | Comments
Human Genetics | ClinVar | PMID: 29165669 | https://www.ncbi.nlm.nih.gov/clinvar/
Human Genetics | DECIPHER | PMID: 19344873 | https://decipher.sanger.ac.uk/
Human Genetics | DGV | PMID: 24174537 | http://dgv.tcag.ca/dgv/app/home
Orthology Prediction | DIOPT | PMID: 21880147 | https://www.flyrnai.org/cgi-bin/DRSC_orthologs.pl
Human Gene/Transcript Nomenclature | Ensembl | PMID: 29155950 | https://useast.ensembl.org/
Human Genetics | ExAC | PMID: 27535533 | http://exac.broadinstitute.org/
Primary Model Organism Databases | FlyBase (Drosophila) | PMID: 26467478 | http://flybase.org
Model Organism Database Integration Tools | Gene2Function | PMID: 28663344 | http://www.gene2function.org/search/
Human Genetics | Geno2MP | N/A | http://geno2mp.gs.washington.edu/Geno2MP/
Human Genetics | gnomAD | PMID: 27535533 | http://gnomad.broadinstitute.org/
Gene Ontology | GO Central | PMID: 10802651, 25428369 | http://www.geneontology.org/
Human Gene/Protein Expression | GTEx | PMID: 29019975, 23715323 | https://gtexportal.org/home/
Human Gene Nomenclature | HGNC | PMID: 27799471 | https://www.genenames.org/
Primary Model Organism Databases | IMPC (mouse) | PMID: 27626380 | http://www.mousephenotype.org/
Primary Model Organism Databases | MGI (mouse) | PMID: 25348401 | http://www.informatics.jax.org/
Model Organism Database Integration Tools | Monarch Initiative | PMID: 27899636 | https://monarchinitiative.org/
Human Variant Nomenclature | Mutalyzer | PMID: 18000842 | https://mutalyzer.nl/
Human Genetics | OMIM | PMID: 28654725 | https://omim.org/
Primary Model Organism Databases | PomBase (fission yeast) | PMID: 22039153 | https://www.pombase.org/
Literature | PubMed | N/A | https://www.ncbi.nlm.nih.gov/pubmed/
Primary Model Organism Databases | RGD (rat) | PMID: 25355511 | https://rgd.mcw.edu/
Primary Model Organism Databases | SGD (budding yeast) | PMID: 22110037 | https://www.yeastgenome.org/
Human Gene/Protein Expression | The Human Protein Atlas | PMID: 21752111 | https://www.proteinatlas.org/
Primary Model Organism Databases | WormBase (C. elegans) | PMID: 26578572 | http://wormbase.org
Primary Model Organism Databases | ZFIN (zebrafish) | PMID: 26097180 | https://zfin.org/



Copyright © 2025 MyJoVE Corporation. All rights reserved