Summary

Amino acid-level signal-to-noise analysis determines the prevalence of genetic variation at a given amino acid position, normalized to the background genetic variation of a given population. This allows identification of variant "hotspots" within a protein sequence (signal) that rise above the frequency of rare variants found in a population (noise).
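The normalization described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the article's exact formula: the per-residue frequency is taken as the number of variant observations at a residue divided by cohort size, the signal-to-noise value is the ratio of case frequency to control (background) frequency, and the pseudocount `pseudo` and hotspot `threshold` are hypothetical parameters chosen only for illustration.

```python
# Minimal per-residue signal-to-noise sketch (assumed formulation, not the article's exact method)


def residue_frequencies(positions: list[int], protein_length: int, cohort_size: int) -> list[float]:
    """Variant observations at each residue (1-based positions) divided by cohort size."""
    counts = [0] * protein_length
    for pos in positions:
        if not 1 <= pos <= protein_length:
            raise ValueError(f"residue {pos} is outside 1..{protein_length}")
        counts[pos - 1] += 1
    return [c / cohort_size for c in counts]


def signal_to_noise(case_freq: list[float], control_freq: list[float], pseudo: float = 1e-6) -> list[float]:
    """Case frequency over background (control) frequency; the hypothetical pseudocount avoids division by zero."""
    return [(c + pseudo) / (b + pseudo) for c, b in zip(case_freq, control_freq)]


def hotspots(ratios: list[float], threshold: float = 2.0) -> list[int]:
    """1-based residues whose ratio exceeds a hypothetical threshold."""
    return [i + 1 for i, r in enumerate(ratios) if r > threshold]


case = residue_frequencies([2, 2, 3], protein_length=5, cohort_size=10)
control = residue_frequencies([2, 4], protein_length=5, cohort_size=100)
print(hotspots(signal_to_noise(case, control)))  # [2, 3]
```

In this toy example, residue 3 has no background variation, so its ratio is driven entirely by the pseudocount. How such zero-background positions are handled is a design choice that materially affects which residues are called hotspots.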

Abstract

Reductions in the cost and increases in the speed of next-generation genetic sequencing have generated an explosion of clinical whole-exome and whole-genome testing. While this has led to increased identification of likely pathogenic mutations associated with genetic syndromes, it has also dramatically increased the number of incidentally found genetic variants of unknown significance (VUS). Determining the clinical significance of these variants is a major challenge for both scientists and clinicians. One approach to assist in determining the likelihood of pathogenicity is signal-to-noise analysis at the protein sequence level. This protocol describes a method for amino acid-level signal-to-noise analysis that leverages variant frequency at each amino acid position of a protein with known topology to identify areas of the primary sequence with an elevated likelihood of pathologic variation (relative to population "background" variation). This method can identify "hotspots" of high pathologic signal at specific amino acid residues, which can be used to refine the diagnostic weight of VUSs, such as those identified by next-generation genetic testing.
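Because the method relies on known protein topology, per-residue counts can also be summarized by topological region. The sketch below assumes hypothetical region names and boundaries (`REGIONS`) for a toy 10-residue protein, and the helper `region_signal_to_noise` is likewise hypothetical; real boundaries would come from a topology resource such as the NCBI Domain and Structure Database listed under Materials. Case and control counts are summed over the same residues, so region length cancels out of the ratio.

```python
# Region-level signal-to-noise sketch; REGIONS and all counts are hypothetical toy values

REGIONS: dict[str, tuple[int, int]] = {
    # name: (first residue, last residue), 1-based and inclusive
    "N-terminus": (1, 3),
    "transmembrane": (4, 7),
    "C-terminus": (8, 10),
}


def region_signal_to_noise(
    case_counts: list[int],
    control_counts: list[int],
    case_size: int,
    control_size: int,
    regions: dict[str, tuple[int, int]],
    pseudo: float = 1e-6,
) -> dict[str, float]:
    """Ratio of case to control variant frequency, summed over each region."""
    if len(case_counts) != len(control_counts):
        raise ValueError("case and control counts must cover the same residues")
    result = {}
    for name, (start, end) in regions.items():
        # Convert 1-based inclusive bounds to a Python slice
        case_freq = sum(case_counts[start - 1:end]) / case_size
        control_freq = sum(control_counts[start - 1:end]) / control_size
        result[name] = (case_freq + pseudo) / (control_freq + pseudo)
    return result


case_counts = [0, 0, 1, 3, 2, 0, 1, 0, 0, 0]
control_counts = [2, 1, 1, 1, 0, 1, 0, 2, 2, 1]
ratios = region_signal_to_noise(case_counts, control_counts, 20, 200, REGIONS)
print(max(ratios, key=ratios.get))  # transmembrane
```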

Introduction

The rapid improvement in genetic sequencing platforms has revolutionized the accessibility and role of genetics in medicine. Clinical sequencing was once confined to a single gene or a handful of genes; the reduced cost and increased speed of next-generation sequencing have since led to routine sequencing of the entire coding sequence of the genome (whole exome sequencing, WES) and of the entire genome (whole genome sequencing, WGS) in the clinical setting. WES and WGS have been used frequently in critically ill neonates and children with a suspected genetic syndrome, where they are proven diagnostic tools that can change clinical management1

Protocol

1. Identify the Gene and Specific Splice Isoform of Interest

NOTE: Here, we demonstrate the use of Ensembl15 to identify the consensus sequence for the gene of interest, that is, the gene associated with the pathogenesis of the disease of interest (e.g., KCNQ1 mutations are associated with long QT syndrome, LQTS). Alternatives to Ensembl include RefSeq via the National Center for Biotechnology Information (NCBI)16 and the University of California, Santa Cruz (UCSC) Hu.......
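Choosing a single isoform matters because residue numbering can differ between splice isoforms, so a variant reported against another transcript may point at the wrong residue. A minimal consistency check, sketched below under stated assumptions, parses simple missense changes written in HGVS protein notation (for example `p.Arg4Gln`) and confirms that the reference residue matches the chosen isoform sequence. The toy sequence `MAARLE` and the helper `check_against_isoform` are hypothetical; other HGVS forms (nonsense, frameshift, in-frame indels) are deliberately rejected rather than parsed.

```python
import re

# Standard three-letter to one-letter amino acid codes
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Only simple missense descriptions such as "p.Arg4Gln" are accepted
MISSENSE = re.compile(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")


def check_against_isoform(hgvs_p: str, isoform_seq: str) -> int:
    """Return the 1-based residue if the variant is consistent with the isoform, else raise ValueError."""
    match = MISSENSE.fullmatch(hgvs_p)
    if match is None:
        raise ValueError(f"not a simple missense change: {hgvs_p}")
    ref, pos_text, alt = match.groups()
    if ref not in THREE_TO_ONE or alt not in THREE_TO_ONE:
        raise ValueError(f"unrecognized amino acid code in {hgvs_p}")
    pos = int(pos_text)
    if not 1 <= pos <= len(isoform_seq):
        raise ValueError(f"residue {pos} is outside this isoform (length {len(isoform_seq)})")
    if isoform_seq[pos - 1] != THREE_TO_ONE[ref]:
        raise ValueError(f"{hgvs_p}: isoform has {isoform_seq[pos - 1]} at residue {pos}")
    return pos


print(check_against_isoform("p.Arg4Gln", "MAARLE"))  # 4
```

Rejecting anything that fails the check, rather than silently skipping it, makes isoform mismatches visible before variant frequencies are tallied.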

Representative Results

A representative result of amino acid-level signal-to-noise analysis for KCNQ1 is shown in Figure 6. In this example, rare variants identified in the gnomAD cohort (control cohort), incidentally identified WES variants (experimental cohort #1), and LQTS case-associated variants deemed likely disease-associated (experimental cohort #2) are depicted. Further, the signal-to-noise analysis comparing the WES and LQTS cohort variant frequency normalized against.......
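The control cohort in this example consists of rare gnomAD variants, and the article's keywords include minor allele frequency and rolling average. The sketch below illustrates both steps under assumptions: control variants are given as hypothetical (residue, MAF) pairs, the MAF cutoff is a user-chosen parameter (no value is given in this excerpt), and smoothing uses a centered window whose size is likewise a hypothetical choice. The helpers `rare_positions` and `rolling_mean` are illustrative, not part of the published protocol.

```python
# Rare-variant filtering and rolling-average smoothing sketch; cutoff, window and data are hypothetical


def rare_positions(variants: list[tuple[int, float]], maf_cutoff: float) -> list[int]:
    """Residue positions of variants whose minor allele frequency is below the cutoff."""
    return [pos for pos, maf in variants if maf < maf_cutoff]


def rolling_mean(values: list[float], window: int) -> list[float]:
    """Centered moving average; the window is truncated at both protein termini."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


control_variants = [(2, 0.00005), (3, 0.02), (4, 0.00001)]
print(rare_positions(control_variants, maf_cutoff=0.0001))  # [2, 4]
print(rolling_mean([0, 3, 0, 0, 3], window=3))  # [1.5, 1.0, 1.0, 1.0, 1.5]
```

Truncating the window at the termini, rather than padding with zeros, keeps edge values on the same scale as interior ones, at the cost of averaging over fewer residues there.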

Discussion

High-throughput genetic testing has advanced dramatically in its application and availability over the past decade. However, in many diseases with well-established genetic underpinnings, such as cardiomyopathies, expanded testing has failed to improve diagnostic yield21. Further, there is significant uncertainty regarding the diagnostic utility of many identified variants. This is partially due to a growing number of incidentally identified rare variants discovered on WES and WGS, which can lead t.......

Acknowledgements

APL is supported by the National Institutes of Health K08-HL136839.

....

Materials

Name | Company | Catalog Number | Comments
--- | --- | --- | ---
1000 Genomes Project | N/A | www.internationalgenome.org |
ClinVar | N/A | www.ncbi.nlm.nih.gov/clinvar |
Ensembl Genome Browser | N/A | uswest.ensembl.org/index.html |
Excel | Microsoft | office.microsoft.com/excel/ | Used for all example formulas and functions
Exome Aggregation Consortium | N/A | www.exac.broadinstitute.org |
Genome Aggregation Database | N/A | www.gnomad.broadinstitute.org |
National Center for Biotechnology Information Domain and Structure Database | N/A | www.ncbi.nlm.nih.gov/guide/domains-structures/ |
National Center for Biotechnology Information Gene Database | N/A | www.ncbi.nlm.nih.gov/gene/ |
National Center for Biotechnology Information Protein Database | N/A | www.ncbi.nlm.nih.gov/protein/ |
National Heart, Lung, and Blood Institute GO Exome Sequencing Project | N/A | www.evs.gs.washington.edu/EVS/ |
SnapGene | GSL Biotech LLC | www.snapgene.com |
University of California, Santa Cruz Human Genome Browser | N/A | www.genome.ucsc.edu |

References

  1. Yang, Y., et al. Clinical whole-exome sequencing for the diagnosis of mendelian disorders. New England Journal of Medicine. 369 (16), 1502-1511 (2013).
  2. Meng, L., et al.

Keywords

Amino Acid, Signal-to-noise Analysis, Genetic Variation, Variant Pathogenicity, Disease-associated Mutations, Population-based Exome/Genome Studies, Ensembl, Gene ID, NCBI Protein Database, Minor Allele Frequency, Rolling Average


Copyright © 2024 MyJoVE Corporation. All rights reserved