Method Article
Here, we present a protocol to access and analyze many human and model organism databases efficiently. This protocol demonstrates the use of MARRVEL to analyze candidate disease-causing variants identified from next-generation sequencing efforts.
Through whole-exome/genome sequencing, human geneticists identify rare variants that segregate with disease phenotypes. To assess whether a specific variant is pathogenic, one must query many databases to determine whether the gene of interest is linked to a genetic disease, whether the specific variant has been reported before, and what functional data are available in model organism databases that may provide clues about the gene’s function in humans. MARRVEL (Model organism Aggregated Resources for Rare Variant ExpLoration) is a one-stop data collection tool for human genes and variants and their orthologous genes in seven model organisms: mouse, rat, zebrafish, fruit fly, nematode worm, fission yeast, and budding yeast. In this protocol, we provide an overview of what MARRVEL can be used for and discuss how different datasets can be used to assess whether a variant of unknown significance (VUS) in a known disease-causing gene or a variant in a gene of uncertain significance (GUS) may be pathogenic. This protocol guides a user through searching multiple human databases simultaneously, starting with a human gene with or without a variant of interest. We also discuss how to utilize data from OMIM, ExAC/gnomAD, ClinVar, Geno2MP, DGV, and DECIPHER. Moreover, we illustrate how to interpret a list of ortholog candidate genes, expression patterns, and GO terms in model organisms associated with each human gene. Furthermore, we discuss the value of the protein structural domain annotations provided and explain how to use the multiple species protein alignment feature to assess whether a variant of interest affects an evolutionarily conserved domain or amino acid. Finally, we discuss three different use cases of this website. MARRVEL is an easily accessible, open-access website designed for both clinical and basic researchers and serves as a starting point to design experiments for functional studies.
The use of next-generation sequencing technology is expanding in both research and clinical genetic laboratories1. Whole-exome (WES) and whole-genome sequencing (WGS) analyses reveal numerous rare variants of unknown significance (VUS) in known disease-causing genes as well as variants in genes that are yet to be associated with a Mendelian disease (GUS: genes of uncertain significance). Presented with a list of genes and variants in a clinical sequence report, medical geneticists must manually visit multiple online resources to assess which variant may be responsible for a certain phenotype seen in the patient of interest. This process is time-consuming, and its efficacy is highly dependent on the expertise of the individual. Although several guideline papers have been published2,3, interpretation of WES and WGS data requires manual curation since there is not yet a standardized methodology for variant analysis. For the interpretation of a VUS, knowledge of the previously reported genotype-phenotype relationship, mode of inheritance, and allele frequencies in the general population is valuable. In addition, knowing whether the variant affects a critical protein domain or an evolutionarily conserved residue may increase or decrease the likelihood of pathogenicity. To gather all of this information, one typically needs to navigate through 10-20 human and model organism databases since the information is scattered across the World Wide Web.
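As an illustration of the allele frequency evidence mentioned above, population databases commonly report an allele count (AC) and an allele number (AN) for each variant, and the frequency is their ratio. The following is a minimal sketch only; the class name, function names, and the default cutoff of 1e-4 are assumptions made for this example and are not values recommended by this protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PopulationCount:
    """Allele counts for one variant in a reference population."""

    allele_count: int   # AC: number of observed alternate alleles
    allele_number: int  # AN: total number of genotyped alleles


def allele_frequency(count: PopulationCount) -> float:
    """Return AC / AN, or 0.0 when no alleles were genotyped."""
    if count.allele_number <= 0:
        return 0.0
    return count.allele_count / count.allele_number


def is_rare(count: PopulationCount, max_frequency: float = 1e-4) -> bool:
    """True when the frequency is at or below the cutoff.

    The default cutoff is an assumption for illustration; an appropriate
    value depends on the disease, its prevalence, and its inheritance mode.
    """
    return allele_frequency(count) <= max_frequency
```

A variant seen 3 times among 120,000 alleles would pass this illustrative filter, whereas one seen 500 times would not. Frequency alone does not establish or exclude pathogenicity.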
Similarly, model organism scientists who work on specific genes and pathways are often interested in connecting their findings to human disease mechanisms and wish to take advantage of the knowledge that is being generated in the human genomics field. However, due to the rapid expansion and evolution of data sets regarding the human genome, it has been challenging to identify databases that provide useful information. In addition, since most model organism databases are designed for researchers who work with the specific organism on a daily basis, it is very difficult, for example, for a mouse researcher to search for specific information in a Drosophila database and vice versa. Similar to the variant interpretation searches performed by medical geneticists, identifying useful human and other model organism information is time-consuming and heavily dependent on the background of the model organism researcher. MARRVEL (Model organism Aggregated Resources for Rare Variant ExpLoration)4 is a tool designed for both groups of users to streamline their workflow.
MARRVEL (http://marrvel.org) was designed as a centralized search engine that collects data systematically in an efficient and consistent manner for clinicians and researchers. With information from 20 or more publicly available databases, this program allows users to quickly gather information and access a large number of human and model organism databases without reiterative searches. The search result pages also contain hyperlinks to the original sources of information, allowing individuals to access the raw data and gather additional information provided by the sources.
In contrast to many variant prioritization tools, which require large sequencing data input in the form of VCF or BAM files and installation of often proprietary or commercial software, MARRVEL operates in any web browser. It can be used at no cost and is compatible with portable devices (e.g., smartphones, tablets) as long as one is connected to the internet. We chose this format because many clinicians and researchers typically need to search one or a few genes and variants at a time. Note that we are developing batch-download and API (application programming interface) features for MARRVEL that will eventually allow users to curate hundreds of genes and variants at a time through customized query tools if necessary.
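When preparing more than a handful of variants for searching, it can help to check that each description is well formed first. The sketch below assumes coordinates written in the "chromosome:position ref>alt" style used for the TBX2 example later in this article (17:59477596 G>A). The pattern and all names are illustrative assumptions; they do not describe the input formats that MARRVEL itself accepts.

```python
import re
from typing import NamedTuple


class GenomicVariant(NamedTuple):
    chromosome: str
    position: int
    reference: str
    alternate: str


# Accepts forms such as "17:59477596 G>A" or "chr17:59477596G>A".
# This pattern is an assumption for the example, not a MARRVEL specification.
_VARIANT_PATTERN = re.compile(
    r"^(?:chr)?(?P<chrom>[0-9]{1,2}|X|Y|MT?)"
    r":(?P<pos>[0-9]+)\s*"
    r"(?P<ref>[ACGT]+)>(?P<alt>[ACGT]+)$",
    re.IGNORECASE,
)


def parse_variant(text: str) -> GenomicVariant:
    """Split a simple substitution description into its four components."""
    match = _VARIANT_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognized variant description: {text!r}")
    return GenomicVariant(
        chromosome=match["chrom"].upper(),
        position=int(match["pos"]),
        reference=match["ref"].upper(),
        alternate=match["alt"].upper(),
    )
```

Rejecting malformed entries before searching makes it easier to tell a typing error apart from a variant that is genuinely absent from a database.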
Because MARRVEL has a wide range of applications, this protocol describes a broadly encompassing approach to navigating the different datasets that MARRVEL displays. More targeted examples tailored towards specific users’ needs are described in the Representative Results section. It is important to note that extracting valuable information from the output of MARRVEL still requires a certain level of background knowledge in either human genetics or model organisms. We refer readers to the table that lists the primary papers describing the function of each of the original databases curated by MARRVEL (Table 1). The following protocol is divided into three sections: (1) how to begin a search, (2) how to interpret MARRVEL human genetics outputs, and (3) how to make use of model organism data in MARRVEL. MARRVEL is being actively updated, so please refer to the current website’s FAQ page for details about data sources. We strongly recommend that users of MARRVEL sign up for update notifications through the e-mail submission form at the bottom of the MARRVEL home page.
1. How to begin a search
2. How to interpret MARRVEL human genetics outputs for a gene and variant search
NOTE: The results page displays seven human databases (Table 1, Figure 1). Each output box has an External link button (small box with a diagonal arrow) in the upper right-hand corner that links to the original database for more details.
3. How to use model organism data in MARRVEL
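The abstract notes that the multiple species protein alignment can be used to assess whether a variant affects an evolutionarily conserved amino acid. One simple way to express that idea is the fraction of other species that share the human residue at a given alignment column. The sketch below is illustrative only: the function names are hypothetical, and the sequences are made-up toy strings, not real protein sequences or MARRVEL output.

```python
from typing import Dict


def conservation_fraction(
    alignment: Dict[str, str], reference_species: str, column: int
) -> float:
    """Fraction of non-reference species matching the reference residue.

    `alignment` maps a species name to its aligned sequence ("-" marks a
    gap), and `column` is a 0-based index into the alignment.
    """
    lengths = {len(sequence) for sequence in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("All aligned sequences must have the same length")

    reference_residue = alignment[reference_species][column]
    others = [
        sequence[column]
        for species, sequence in alignment.items()
        if species != reference_species
    ]
    if not others:
        return 0.0
    matches = sum(residue == reference_residue for residue in others)
    return matches / len(others)


# Made-up toy alignment, for illustration only.
toy_alignment = {
    "human": "MKRLA",
    "mouse": "MKRLA",
    "zebrafish": "MKRIA",
    "fruit fly": "MQR-A",
}
```

In this toy alignment, column 2 is identical in every species, while column 3 matches the human residue in only one of the three other species. Note that the column index refers to the alignment, which may differ from the residue number in the human protein when gaps are present.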
Human geneticists and model organism scientists each use MARRVEL in distinct ways, each with different desired outcomes. Below are three vignettes of possible uses for MARRVEL.
Evaluating pathogenicity of a variant in a dominant disease
Most of the users that visit MARRVEL use this website to analyze the likelihood that a rare human variant may cause a certain disease. For example, a missense (17:59477596 G>A, p.R20Q) variant in TBX2 was found to segregate i...
Critical steps in this protocol include the initial input (steps 1.1-1.3) and subsequent interpretation of the output. The most common reason for negative search results is that a gene and/or variant can be described in many ways. Although MARRVEL is updated on a scheduled basis, these updates may cause disconnects between the different databases that MARRVEL links to. Thus, the first step in troubleshooting is invariably to check whether alternative names of the gene or variant will lead to a successful s...
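One common source of alternative descriptions is amino acid notation: the same missense change can be written with one-letter codes (p.R20Q) or three-letter codes (p.Arg20Gln). The sketch below converts the short form to the long form using the standard amino acid codes. It handles only simple missense changes, and the function name is a hypothetical helper introduced for this example rather than part of MARRVEL.

```python
import re

# Standard one-letter to three-letter amino acid codes.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Matches simple missense changes such as "p.R20Q" (20 standard residues).
_SHORT_MISSENSE = re.compile(
    r"^p\.([ACDEFGHIKLMNPQRSTVWY])([0-9]+)([ACDEFGHIKLMNPQRSTVWY])$"
)


def to_three_letter(protein_change: str) -> str:
    """Rewrite a one-letter missense description in three-letter form."""
    match = _SHORT_MISSENSE.match(protein_change.strip())
    if match is None:
        raise ValueError(f"Not a simple missense change: {protein_change!r}")
    reference, position, alternate = match.groups()
    return f"p.{ONE_TO_THREE[reference]}{position}{ONE_TO_THREE[alternate]}"
```

If a search with one notation returns nothing, trying the other form is a quick check before concluding that a variant has not been reported.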
The authors have nothing to disclose.
We thank Drs. Rami Al-Ouran, Seon-Young Kim, Yanhui (Claire) Hu, Ying-Wooi Wan, Naveen Manoharan, Sasidhar Pasupuleti, Aram Comjean, Dongxue Mao, Michael Wangler, Hsiao-Tuan Chao, Stephanie Mohr, and Norbert Perrimon for their support in the development and maintenance of MARRVEL. We are grateful to Samantha L. Deal and J. Michael Harnish for their input on this manuscript.
The initial development of MARRVEL was supported in part by the Undiagnosed Diseases Network Model Organisms Screening Center through the NIH Common Fund (U54NS093793) and through the NIH Office of Research Infrastructure Programs (ORIP) (R24OD022005). JW is supported by the NIH Eunice Kennedy Shriver National Institute of Child Health & Human Development (F30HD094503) and The Robert and Janice McNair Foundation McNair MD/PhD Student Scholar Program at BCM. HJB is further supported by the NIH National Institute of General Medical Sciences (R01GM067858) and is an Investigator of the Howard Hughes Medical Institute. ZL is supported by the NIH National Institute of General Medical Sciences (R01GM120033), the National Institute on Aging (R01AG057339), and the Huffington Foundation. SY received additional support from the NIH National Institute on Deafness and Other Communication Disorders (R01DC014932), the Simons Foundation (SFARI Award: 368479), the Alzheimer’s Association (New Investigator Research Grant: 15-364099), the Naman Family Fund for Basic Research, and the Caroline Wiess Law Fund for Research in Molecular Medicine.
Category | Database | Reference | URL |
Human Genetics | ClinVar | PMID: 29165669 | https://www.ncbi.nlm.nih.gov/clinvar/ |
Human Genetics | DECIPHER | PMID: 19344873 | https://decipher.sanger.ac.uk/ |
Human Genetics | DGV | PMID: 24174537 | http://dgv.tcag.ca/dgv/app/home |
Orthology Prediction | DIOPT | PMID: 21880147 | https://www.flyrnai.org/cgi-bin/DRSC_orthologs.pl |
Human Gene/Transcript Nomenclature | Ensembl | PMID: 29155950 | https://useast.ensembl.org/ |
Human Genetics | ExAC | PMID: 27535533 | http://exac.broadinstitute.org/ |
Primary Model Organism Databases | FlyBase (Drosophila) | PMID: 26467478 | http://flybase.org |
Model Organism Database Integration Tools | Gene2Function | PMID: 28663344 | http://www.gene2function.org/search/ |
Human Genetics | Geno2MP | N/A | http://geno2mp.gs.washington.edu/Geno2MP/ |
Human Genetics | gnomAD | PMID: 27535533 | http://gnomad.broadinstitute.org/ |
Gene Ontology | GO Central | PMID: 10802651, 25428369 | http://www.geneontology.org/ |
Human Gene/Protein Expression | GTEx | PMID: 29019975, 23715323 | https://gtexportal.org/home/ |
Human Gene Nomenclature | HGNC | PMID: 27799471 | https://www.genenames.org/ |
Primary Model Organism Databases | IMPC (mouse) | PMID: 27626380 | http://www.mousephenotype.org/ |
Primary Model Organism Databases | MGI (mouse) | PMID: 25348401 | http://www.informatics.jax.org/ |
Model Organism Database Integration Tools | Monarch Initiative | PMID: 27899636 | https://monarchinitiative.org/ |
Human Variant Nomenclature | Mutalyzer | PMID: 18000842 | https://mutalyzer.nl/ |
Human Genetics | OMIM | PMID: 28654725 | https://omim.org/ |
Primary Model Organism Databases | PomBase (fission yeast) | PMID: 22039153 | https://www.pombase.org/ |
Literature | PubMed | N/A | https://www.ncbi.nlm.nih.gov/pubmed/ |
Primary Model Organism Databases | RGD (rat) | PMID: 25355511 | https://rgd.mcw.edu/ |
Primary Model Organism Databases | SGD (budding yeast) | PMID: 22110037 | https://www.yeastgenome.org/ |
Human Gene/Protein Expression | The Human Protein Atlas | PMID: 21752111 | https://www.proteinatlas.org/ |
Primary Model Organism Databases | WormBase (C. elegans) | PMID: 26578572 | http://wormbase.org |
Primary Model Organism Databases | ZFIN (zebrafish) | PMID: 26097180 | https://zfin.org/ |
Copyright © 2020 MyJove Corporation. All rights reserved.