The overall goal of targeted next-generation sequencing is to elucidate the genetic determinants of various constitutional diseases by focusing on genomic regions of particular interest. This method can help us answer key questions regarding a disease's genetic etiology, particularly when there are previously known genetic associations. This technique is highly efficient.
It produces millions of reads in a short period of time, at a relatively low cost and with a relatively low computational burden, especially when compared with other next-generation sequencing approaches. The implications of the technique extend toward the diagnosis of neurodegenerative diseases, which are phenotypically and genetically heterogeneous but have many known associated genetic loci. Though this method is particularly directed towards neurodegenerative diseases, it could also be applied to various other constitutional diseases with previously identified genomic regions of interest.
Generally, individuals new to this method will struggle because the bioinformatic processing required for final rare-variant analysis can be computationally intensive and can introduce many sources of error. In this procedure, human blood samples are collected in three 4-milliliter EDTA K2 tubes to provide a total volume of about 12 milliliters. Centrifuge the blood samples at 750 times gravity for 20 minutes.
This will fractionate each sample into an upper phase of plasma, a thin middle phase of leukocytes, and a bottom phase of erythrocytes. Pipette the plasma off the top of the blood sample using a disposable transfer pipette. Dispense into multiple 500 microliter aliquots and store at negative 80 degrees Celsius for future biochemical analysis.
Extract DNA from the blood sample with a blood extraction kit according to the manufacturer's instructions. The extracted DNA is then used to prepare a sequencing library for next-generation sequencing. Once the sequencing run is complete, on a computer, find the files within the cloud-based computing environment by selecting Runs on the navigation panel.
Select the appropriate sequencing run to navigate to the run summary page. Select Download to obtain data from the cloud. From the dialog box that appears, select the FASTQ files as the file type to download and click Download.
From the Run Summary page of the cloud-based computing environment, navigate to Charts to analyze the quality of the sequencing run with the various figures produced by the computing environment. From the Run Charts page, find the figure labeled Data By Cycle, under Chart select Intensity, and under Channel select All Channels to produce the signal intensity plot. Within the Run Navigation panel, select the Indexing QC tab to find the Indexing Quality Control histogram, which is on the right-hand side of the page.
From the Run Summary page of the cloud-based computing environment, click Metrics within the Run navigation panel to navigate to the quality metrics. Under Density (K/mm²), where K denotes thousands of clusters, ensure the cluster density of the sequencing run is within the range recommended by the enrichment kit being used. In this case, that range is 1,200 to 1,400 K/mm².
Under the Total Percent greater than or equal to Q30, ensure that the value is greater than or equal to 85%, reflecting the quality of the sequencing reads. Under Aligned, ensure that the value is similar to the percentage of positive control that was included in the sequencing run. For example, if 1% positive control was used, the expected aligned percent would be approximately 1 to 5%, and variations within a few decimal points are acceptable.
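The three run-level checks just described can be summarized in a short Python sketch. This is only an illustration and not part of the cloud-based environment or the sequencing software. The names `RunMetrics` and `check_run_quality` are hypothetical, the default thresholds are the ones quoted above for this particular enrichment kit and a 1% positive control, and the values would need to be read off the Metrics page by hand.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Values read manually from the run's Metrics page (hypothetical container)."""
    cluster_density_k_per_mm2: float  # Density (K/mm^2)
    percent_q30: float                # Total % >= Q30
    percent_aligned: float            # Aligned (%)


def check_run_quality(
    metrics,
    density_range=(1200.0, 1400.0),  # range quoted for the kit used here
    min_q30=85.0,                    # minimum Total % >= Q30
    aligned_range=(1.0, 5.0),        # expected Aligned % for a 1% positive control
):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []

    low, high = density_range
    if not (low <= metrics.cluster_density_k_per_mm2 <= high):
        problems.append(
            f"cluster density {metrics.cluster_density_k_per_mm2} K/mm^2 "
            f"is outside {low}-{high}"
        )

    if metrics.percent_q30 < min_q30:
        problems.append(f"%>=Q30 is {metrics.percent_q30}, below {min_q30}")

    a_low, a_high = aligned_range
    if not (a_low <= metrics.percent_aligned <= a_high):
        problems.append(
            f"aligned % is {metrics.percent_aligned}, outside {a_low}-{a_high}"
        )

    return problems


if __name__ == "__main__":
    run = RunMetrics(cluster_density_k_per_mm2=1310, percent_q30=91.2, percent_aligned=1.3)
    print(check_run_quality(run) or "run passes all three checks")
```

The ranges are parameters rather than constants because, as noted above, the recommended cluster density depends on the enrichment kit and the expected aligned percentage depends on how much positive control was spiked in.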
Begin this process by importing FASTQ sequencing reads into the data processing software. Within the navigation area, right click and select New Folder. Name the folder such that there is clarity as to the sequencing run that was performed.
From the toolbar at the top, select Import and from the dropdown list, choose the platform with which the sequencing was performed. For the purposes of ONDRISeq, Illumina is chosen. In the dialog box, navigate to and select the FASTQ files from the sequencing run that is being processed.
From the General options of the dialog box, click the box beside Paired reads if sequencing used paired end chemistries. From the Paired read information of the dialog box, select Paired-end if the forward read FASTQ file appears before the reverse read in the file list. Set the Paired read minimum distance to one and maximum distance to 1000.
From the Illumina options of the dialog box, select Remove failed reads. From the Quality score drop down list, select the NGS pipeline that was utilized for sequencing. Select Next at the bottom of the dialog box.
Select Save and Create subfolders per batch unit. Select Next at the bottom of the dialog box. Choose the folder that was created earlier.
This is where the FASTQ files will be imported. Select Finish at the bottom of the dialog box and wait until the FASTQ files are imported. Click the Processes tab to see the status of the file import.
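Because the Paired-end option depends on the forward read file being listed before the reverse read file, it can help to check the file list before importing. The Python sketch below is an illustration only: it assumes the files carry `_R1` and `_R2` tags in their names, which is a common naming convention but is not stated in this protocol, and the function name `pair_fastq_files` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming convention: "_R1" or "_R2" followed by "_" or "."
READ_TAG = re.compile(r"_R([12])(?=[_.])")


def pair_fastq_files(filenames):
    """Group FASTQ file names into (forward, reverse) pairs.

    Returns (pairs, unpaired), where each pair lists the forward read first.
    """
    groups = defaultdict(dict)
    unpaired = []

    for name in filenames:
        match = READ_TAG.search(name)
        if match is None:
            # No recognizable read tag, so the file cannot be paired.
            unpaired.append(name)
            continue
        # The name with the read tag removed identifies the sample and lane.
        sample_key = name[:match.start()] + name[match.end():]
        groups[sample_key][match.group(1)] = name

    pairs = []
    for key in sorted(groups):
        reads = groups[key]
        if "1" in reads and "2" in reads:
            pairs.append((reads["1"], reads["2"]))
        else:
            unpaired.extend(reads.values())

    return pairs, unpaired


if __name__ == "__main__":
    # Hypothetical file names for illustration.
    files = [
        "sampleA_L001_R2_001.fastq.gz",
        "sampleA_L001_R1_001.fastq.gz",
        "sampleB_L001_R1_001.fastq.gz",
    ]
    pairs, unpaired = pair_fastq_files(files)
    print("pairs:", pairs)
    print("unpaired:", unpaired)
```

Any file reported as unpaired is worth investigating before the import, since a missing mate would prevent that sample from being treated as paired-end data.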
Next, design a workflow within the software to perform resequencing and variant calling according to the manufacturer's instructions. Designing the resequencing and variant calling workflow is the most difficult aspect of this procedure. Our team researched best practices and used trial and error to come up with the most robust workflow to fit our needs.
To run the imported FASTQ sequencing read files through the customized bioinformatics workflow, start by identifying the workflow in the software's toolbox and double-clicking it. Within the dialog box that appears, locate the folders of FASTQ files that were imported within the navigation area. Highlight all folders by selecting them within the navigation area and then click the box beside Batch.
Use the right-facing arrow to move the files to selected elements. Click Next at the bottom of the dialog box. Within the dialog box, review the batch overview to ensure the correct FASTQ files were selected and then click Next.
Review the steps of the workflow within the dialog box to ensure the correct files and export locations were selected when designing the workflow. These steps include mapping reads to the reference sequence, removing duplicate mapped reads, creating statistics for target regions, exporting BAM files, exporting tab-delimited text, filtering based on overlap, and exporting VCF files. Within the final step in the dialog box, Result handling, select the option Save in input folder.
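To make the overlap-filtering step in the list above more concrete, the following Python sketch keeps only variants that fall inside a set of target regions. It is a generic illustration and not the data processing software's implementation, which may differ. The function names and the example coordinates are hypothetical. The sketch assumes regions are given as BED-style 0-based, half-open intervals and variant positions as VCF-style 1-based coordinates.

```python
from bisect import bisect_right


def build_target_index(regions):
    """Build a per-chromosome list of merged, sorted (start, end) intervals.

    regions: iterable of (chrom, start, end), 0-based half-open (BED-style).
    """
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = merged
    return index


def in_target(index, chrom, pos):
    """Return True if a 1-based position lies inside any target interval."""
    intervals = index.get(chrom, [])
    zero_based = pos - 1
    # Find the last interval whose start is <= the position.
    i = bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]


def filter_variants(variants, index):
    """Keep (chrom, pos, ref, alt) tuples whose position overlaps a target."""
    return [v for v in variants if in_target(index, v[0], v[1])]


if __name__ == "__main__":
    # Hypothetical target regions and variants, for illustration only.
    targets = build_target_index([("chr1", 100, 200), ("chr1", 150, 250)])
    calls = [("chr1", 101, "A", "G"), ("chr1", 300, "C", "T"), ("chr2", 50, "G", "A")]
    print(filter_variants(calls, targets))
```

Merging overlapping regions first lets each lookup use a single binary search, which matters little for a small gene panel but keeps the logic simple and predictable.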
Click Finish at the bottom of the dialog box. The final step is to perform variant annotation on the VCF file of each sample as described in the text protocol. The methodologies demonstrated in this video were applied to 528 participant DNA samples from individuals who have been enrolled in ONDRI.
Samples were run on the ONDRISeq panel in 22 runs of 24 samples per run. Overall, sequencing data were determined to be of high quality, with a mean sample coverage of 78x. A mean of 95.6% of reads were mapped to the reference sequence, and all ONDRISeq runs had greater than 90% of reads mapped.
Of the mapped reads, 92.0% had a Phred score greater than or equal to Q30. To demonstrate the utility of this targeted NGS workflow, the example of a 68-year-old male Parkinson's disease patient is presented, showing a reduced ANNOVAR output. Annotated variants are curated to identify those that are most likely to be of clinical significance, as denoted by the red boxes.
When performing this procedure, it is important to remember to tailor the steps to the sequencing platform, to the enrichment kit that was used, and to the needs of the research. After its development, this technique paved the way for researchers in the field of genetics to obtain high-quality, region-specific data while remaining less expensive than its whole-exome and whole-genome counterparts.