Amino acid-level signal-to-noise analysis provides a measure of the likelihood that a genetic variant is associated with a disease state or is part of natural genetic variation within the population. This technique leverages two large genetic resources: disease-associated mutations available in the literature or in the public domain, and population-based exome and genome studies that identify rare genetic variants. To identify a specific gene and splice isoform of interest, open the Ensembl homepage and select the species from the dropdown menu.
Enter the symbol for the gene of interest and click Go. Select the link corresponding to the gene of interest, then select the transcript ID of interest from the Transcript table. Note the transcript-specific RNA transcript and protein product identification numbers in the reference sequence (RefSeq) column of the Transcript table for future reference.
Select the link associated with the protein product ID number to open the corresponding entry in the National Center for Biotechnology Information (NCBI) Protein database, and scroll down to the Origin section to obtain the primary protein sequence for the transcript of interest. Then scroll up to the Features section to obtain a list of the protein features. To calculate a minor allele frequency for each amino acid position with a control variant, open a spreadsheet program capable of graphing and create a column of the positions of all of the control variants.
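If the NCBI Protein record is saved in its GenBank flat file (text) format instead of being copied from the web page, the Origin section can be extracted with a short script. The following is a minimal Python sketch, assuming the standard layout in which an ORIGIN line is followed by numbered lines of space-separated residues and the record ends with //. The function name parse_origin and the toy sequence are illustrative only.

```python
import re


def parse_origin(flatfile_text: str) -> str:
    """Return the sequence in the ORIGIN section of a GenBank-style flat file.

    Assumes an 'ORIGIN' line followed by lines of the form
    '<residue number> <blocks of residues>', terminated by '//'.
    """
    residues = []
    in_origin = False
    for line in flatfile_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if in_origin:
            if line.startswith("//"):
                break
            # Drop the leading residue number and all whitespace.
            residues.append(re.sub(r"[^A-Za-z]", "", line))
    return "".join(residues).upper()


# Toy record (not a real protein), for illustration only.
example = """LOCUS       EXAMPLE
ORIGIN
        1 acdefghikl mnpqrstvwy
       21 acd
//
"""
sequence = parse_origin(example)
print(sequence, len(sequence))
```

Because the flat file numbers residues from 1, residue n of the protein is sequence[n - 1] in Python.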
Remove the variant text to leave only the variant positions, and sort the positions in ascending order to identify which positions have more than one associated variant. For each position, sum the minor allele frequencies of all of the variants associated with it. Next, to calculate a minor allele frequency for each amino acid position with an experimental variant, create a column of the amino acid positions that have experimental variants and, for each position, sum the minor allele frequencies of all of the variants associated with it.
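The position-stripping and per-position summing steps can be sketched in Python as follows. This assumes variants are written in a single-residue protein notation such as p.A10T or p.Ala10Thr, so that the first run of digits is the amino acid position. The helper names and the variant/frequency pairs are hypothetical, and the same function applies to both the control and the experimental variant lists.

```python
import re
from collections import defaultdict


def position_of(variant: str) -> int:
    """Strip the amino acid text from a protein-level variant, keeping its position.

    Assumes notation such as 'p.A10T' or 'p.Ala10Thr': the first run of
    digits is taken as the amino acid position.
    """
    match = re.search(r"\d+", variant)
    if match is None:
        raise ValueError(f"no amino acid position found in {variant!r}")
    return int(match.group())


def maf_by_position(variants):
    """Sum the minor allele frequencies of all variants that share a position.

    `variants` is an iterable of (variant_text, maf) pairs; the result maps
    each position to its summed MAF, in ascending position order.
    """
    totals = defaultdict(float)
    for text, maf in variants:
        totals[position_of(text)] += maf
    return dict(sorted(totals.items()))


# Hypothetical variants and MAFs, for illustration only.
control = [("p.A10T", 0.25), ("p.A10V", 0.5), ("p.G25S", 0.125)]
print(maf_by_position(control))  # {10: 0.75, 25: 0.125}
```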
To create a rolling average of the minor allele frequencies for both the experimental and control variants, create a column containing all of the amino acid positions in the protein of interest, and enter a minor allele frequency of zero at every position without a variant, in both the control and the experimental data sets. For each of the experimental and control prevalence columns, create an adjacent rolling-average column, and in each cell place the average of the respective minor allele frequencies for the five positions N-terminal and the five positions C-terminal to that position. To calculate the cohort minimum frequency, divide the lowest minor allele frequency identified by two, and enter this value in any cell with a control minor allele frequency of zero.
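A minimal Python sketch of the zero-filling and rolling-average steps is shown below. It assumes the window is centered, includes the position itself (2 × 5 + 1 = 11 residues by default), and is truncated at the N and C termini rather than padded; the text does not specify these details, so adjust them to match the intended analysis. The function names are illustrative.

```python
def zero_filled(maf_by_pos, protein_length):
    """List in which index i holds the summed MAF of residue i + 1 (0.0 if no variant)."""
    return [maf_by_pos.get(pos, 0.0) for pos in range(1, protein_length + 1)]


def rolling_average(values, flank=5):
    """Centered rolling average over each position and `flank` residues on each side.

    Assumptions: the window includes the position itself (2 * flank + 1 residues)
    and is truncated at the N and C termini instead of being padded with zeros.
    """
    n = len(values)
    averaged = []
    for i in range(n):
        window = values[max(0, i - flank): min(n, i + flank + 1)]
        averaged.append(sum(window) / len(window))
    return averaged


# Toy 5-residue protein with a single variant position, using a 1-residue flank.
column = zero_filled({2: 3.0}, protein_length=5)
print(column)                            # [0.0, 3.0, 0.0, 0.0, 0.0]
print(rolling_average(column, flank=1))  # [1.5, 1.0, 1.0, 0.0, 0.0]
```

Truncating the window at the termini mirrors what a spreadsheet AVERAGE over a shortened cell range does for the first and last few residues.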
This avoids dividing by zero when calculating the signal-to-noise ratio. To calculate the amino acid-level signal-to-noise ratio, divide the experimental rolling average at each amino acid position by the respective control rolling average, and graph this ratio versus amino acid position. To identify the consensus amino acid locations of functional domains, features, and areas of post-translational modification in the protein of interest, open the NCBI Protein webpage.
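The zero-substitution and ratio steps can be sketched in Python as below. The cohort minimum frequency is taken here, as an assumption, to be half of the lowest nonzero control minor allele frequency; the function name and the toy values are illustrative.

```python
def signal_to_noise(experimental_avg, control_avg, control_mafs):
    """Ratio of the experimental to the control rolling-average MAF at each position.

    Zero control values are replaced by the cohort minimum frequency, taken here
    (an assumption) as half of the lowest nonzero control MAF, so that the
    ratio never divides by zero.
    """
    if len(experimental_avg) != len(control_avg):
        raise ValueError("experimental and control columns must have the same length")
    floor = min(m for m in control_mafs if m > 0) / 2
    return [e / (c if c > 0 else floor) for e, c in zip(experimental_avg, control_avg)]


# Toy rolling averages for a 3-residue stretch; the lowest control MAF is 0.25.
ratios = signal_to_noise([1.0, 0.0, 0.5], [0.5, 0.0, 0.0], control_mafs=[0.5, 0.25])
print(ratios)  # [2.0, 0.0, 4.0]
```

The resulting list lines up with the amino acid position column and can be graphed directly against it.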
Enter the protein product ID of the protein of interest into the search field, and identify the known protein domains and features under Features. Note the name and type of each domain and its amino acid positions, and select the link corresponding to each feature to visualize the region on the primary sequence of the protein of interest. Create a column next to the signal-to-noise column so that the amino acid position column can be referenced, and identify the cells corresponding to the N- and C-terminal boundaries of each domain and feature.
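When the NCBI Protein record is saved in GenBank flat file format, the domain names and boundaries listed under Features can also be extracted programmatically. The Python sketch below assumes the standard feature-table layout (feature keys indented five spaces, qualifiers such as /region_name= or /site_type= on the following lines). It is deliberately simplified: it keeps only Region and Site features with a plain start..end or single-position location and skips compound locations such as order(...). The function name and the toy record are illustrative.

```python
import re

# Plain "start..end" or single-position locations, allowing partial markers < and >.
SIMPLE_LOCATION = re.compile(r"^<?(\d+)(?:\.\.>?(\d+))?$")
NAME_QUALIFIERS = ("/region_name=", "/site_type=")


def parse_features(flatfile_text, keys=("Region", "Site")):
    """Return (key, name, start, end) tuples from a GenBank-style FEATURES table."""
    features = []
    current = None
    in_table = False
    for line in flatfile_text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table:
            continue
        if line.startswith(("ORIGIN", "//")):
            break
        if line[5:6].strip():  # a new feature key begins in column 6
            parts = line.split(maxsplit=1)
            match = SIMPLE_LOCATION.match(parts[1].strip()) if len(parts) == 2 else None
            if parts[0] in keys and match:
                start = int(match.group(1))
                end = int(match.group(2) or start)
                current = [parts[0], None, start, end]
                features.append(current)
            else:
                current = None  # unwanted key or compound location: skip it
        elif current is not None:
            qualifier = line.strip()
            for tag in NAME_QUALIFIERS:
                if qualifier.startswith(tag):
                    current[1] = qualifier[len(tag):].strip('"')
    return [tuple(f) for f in features]


# Toy feature table (not a real protein), for illustration only.
example = """FEATURES             Location/Qualifiers
     source          1..120
                     /organism="Homo sapiens"
     Region          10..40
                     /region_name="Example_domain"
     Site            27
                     /site_type="phosphorylation"
     Site            order(50,52,55)
                     /site_type="other"
ORIGIN
//
"""
print(parse_features(example))
# [('Region', 'Example_domain', 10, 40), ('Site', 'phosphorylation', 27, 27)]
```

Compound Site locations are skipped rather than guessed at, so those positions should still be read from the Features section by hand.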
Then place a one in each of these cells, create a graph with this boundary column on the y-axis and the amino acid position on the x-axis, and overlay this graph on the signal-to-noise graph. To map individual variant positions for overlay on the signal-to-noise ratio and protein domain topology graphs, create a column next to the domain feature column so that its rows correspond to the amino acid positions, and place a one in each cell corresponding to a position that contains a variant from the respective data set. Then create a graph with this column on the y-axis and the amino acid positions on the x-axis, and overlay it on the signal-to-noise and protein domain topology graphs.
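The two marker columns can also be generated outside the spreadsheet. The Python sketch below builds a 0/1 column with a one at each domain or feature boundary and another with a one at each variant position, then prints them tab-separated so they can be pasted next to the signal-to-noise column and plotted against amino acid position. The function names and the toy protein are illustrative assumptions.

```python
def indicator_column(positions, protein_length):
    """Column of 0/1 values with a 1 at every listed residue position (1-based)."""
    column = [0] * protein_length
    for pos in positions:
        if 1 <= pos <= protein_length:
            column[pos - 1] = 1
    return column


def boundary_positions(features):
    """N- and C-terminal boundary positions from (start, end) pairs."""
    return [pos for start, end in features for pos in (start, end)]


# Toy 6-residue protein: one domain spanning residues 2-4, variants at 3 and 5.
length = 6
domains = indicator_column(boundary_positions([(2, 4)]), length)
variants = indicator_column([3, 3, 5], length)
print("position\tdomain_boundary\tvariant")
for pos, (d, v) in enumerate(zip(domains, variants), start=1):
    print(f"{pos}\t{d}\t{v}")
```

A position carrying several variants still receives a single one, matching the marker-style columns described above.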
Here, a representative result of an amino acid-level signal-to-noise analysis for the potassium voltage-gated channel subfamily Q member 1 (KCNQ1) gene is depicted. Rare variants identified in the control cohort, experimental variants incidentally identified by whole exome sequencing (WES), and long QT syndrome (LQTS) case-associated variants deemed likely to be disease associated are shown. The signal-to-noise analyses comparing the WES and LQTS cohort variant frequencies, each normalized against the control cohort variant frequency, are also represented.
In this experiment, the long QT syndrome-associated variants demonstrated high signal-to-noise ratios in domains corresponding to the channel pore, the selectivity filter, and the potassium voltage-gated channel subfamily E member 1 (KCNE1) binding domain. In contrast, variants incidentally identified in the whole exome sequencing cohort did not clearly localize to specific regions of elevated signal-to-noise, suggesting that these variants reflect background genetic variation. This methodology could be applied to gauge the diagnostic weight of variants of unknown significance that arise during clinical genetic testing.