We use virus genome sequencing to try to understand where viruses come from and how they spread. To use these data in real time, for instance during an outbreak, we have to understand exactly how accurate the data we provide are. The main advantages of this technique are that it is relatively cheap, it is fast, and it enables real-time base calling, which can be used for in-field sequencing.
This method can be used for the timely generation of sequence data, which can be used, for example, to determine the origin of a multitude of pathogens. This technique can be complex for first-time users, and basic Unix knowledge is required, especially for installing the different software packages.
The visualization of this method will help users to determine the error rate of their specific sequencing protocols and to answer their research questions. The sequencing run is started on a MinION connected to the MinIT.
After performing a sequencing run on the Nanopore platform, load the data into Porechop. To prevent contamination and to enhance accuracy, use the --require_two_barcodes flag and enter the command as indicated. After demultiplexing, use cutadapt to remove the primer sequences and any reads shorter than 75 nucleotides, using the command as indicated.
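The length cut-off applied by cutadapt can be illustrated in plain Python. This is a minimal sketch, assuming standard four-line FASTQ records held in a string; it only mimics the "discard reads shorter than 75 nucleotides" rule and is not a replacement for cutadapt, which also handles primer trimming. The names MIN_LENGTH, parse_fastq and filter_by_length are hypothetical.

```python
from typing import Iterator, List, Tuple

# Hypothetical constant mirroring the 75-nucleotide cut-off described above.
MIN_LENGTH = 75

# A FASTQ record as (header, sequence, separator, qualities).
Record = Tuple[str, str, str, str]


def parse_fastq(text: str) -> Iterator[Record]:
    """Yield four-line FASTQ records (multi-line records are not supported)."""
    lines = text.strip("\n").splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain complete four-line records")
    for i in range(0, len(lines), 4):
        header, seq, sep, qual = lines[i:i + 4]
        # Quality strings may start with '@', so records are split by position,
        # and only the header and separator lines are checked.
        if not header.startswith("@") or not sep.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        yield header, seq, sep, qual


def filter_by_length(text: str, min_length: int = MIN_LENGTH) -> List[Record]:
    """Keep only reads whose sequence is at least min_length nucleotides long."""
    return [rec for rec in parse_fastq(text) if len(rec[1]) >= min_length]
```

In practice cutadapt does this in the same pass as primer removal; the sketch only makes the filtering rule explicit.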
Use Minimap2 to map the demultiplexed sequence reads against the panel of distinct reference strains, and use Samtools to generate a consensus sequence. For reference-based alignments, it is essential to perform a BLAST search with the generated consensus sequence to identify the closest reference strain.
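The idea behind a consensus sequence can be shown with a simple majority-rule caller. This is a minimal sketch, assuming the reads have already been aligned to the same reference coordinates and padded to equal length, with '-' marking a deletion and a space marking no coverage. It is a simplified illustration, not the algorithm Samtools uses, and majority_consensus and min_depth are hypothetical names.

```python
from collections import Counter
from typing import List


def majority_consensus(aligned_reads: List[str], min_depth: int = 3) -> str:
    """Call the most common base in each column of pre-aligned reads.

    Columns covered by fewer than min_depth reads are called 'N'. If a deletion
    ('-') is the majority call, the column is left out of the consensus.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    consensus = []
    for col in range(width):
        # Count calls in this column, ignoring reads that do not cover it.
        counts = Counter(read[col] for read in aligned_reads if read[col] != " ")
        if sum(counts.values()) < min_depth:
            consensus.append("N")
            continue
        base, _ = counts.most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)
```

Low-coverage positions become 'N' rather than a guessed base, which keeps uncertain positions visible when the consensus is later used as a BLAST query.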
Then repeat the reference-based alignment using the closest reference strain as the reference. To determine the read coverage required to compensate for the error profile of Nanopore sequencing, select the reference sequence corresponding to a single amplicon and use Minimap2 to map the Nanopore reads against this amplicon. Use Samtools to select only the reads mapping to Amplicon26 and to convert the BAM file into FASTQ, using the commands as indicated.
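One possible way to express the read selection and FASTQ conversion is to build the Samtools calls from Python. This is a sketch under stated assumptions, not the commands used in the protocol: it assumes that "reads mapping to Amplicon26" means all reads that are not flagged as unmapped (flag 0x4) after mapping against the single amplicon sequence, and the file names and the helper names amplicon_read_commands and run are hypothetical.

```python
import shlex
import subprocess
from typing import List


def amplicon_read_commands(alignment_bam: str, mapped_bam: str) -> List[List[str]]:
    """Return samtools argument lists that keep mapped reads and export them as FASTQ."""
    return [
        # -b: write BAM output; -F 4: drop reads carrying the 'unmapped' flag (0x4).
        ["samtools", "view", "-b", "-F", "4", "-o", mapped_bam, alignment_bam],
        # samtools fastq writes the reads to standard output.
        ["samtools", "fastq", mapped_bam],
    ]


def run(commands: List[List[str]], fastq_path: str) -> None:
    """Execute the commands; requires samtools on PATH."""
    subprocess.run(commands[0], check=True)
    with open(fastq_path, "w") as out:
        subprocess.run(commands[1], stdout=out, check=True)


if __name__ == "__main__":
    # Hypothetical file names; print the commands instead of running them.
    for cmd in amplicon_read_commands("amplicon26_aln.bam", "amplicon26_mapped.bam"):
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings avoids quoting problems, and printing them first makes it easy to check against the commands shown in the protocol.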
Randomly select subsets of, for example, 200 sequence reads, 1,000 times. All randomly selected sequence reads are aligned to Amplicon26. Use KMA to map the sequence reads and to immediately generate a consensus sequence, using the settings optimized for Nanopore sequencing that are enabled by the -bcNano flag, as indicated.
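The repeated random subsampling can be sketched with Python's standard random module. This is a minimal sketch, assuming the read identifiers (or records) are already in a list; subsample and its parameter names are hypothetical, and the seed is fixed only so that a run can be reproduced. Each subset would then be written to its own FASTQ file and passed to KMA.

```python
import random
from typing import List, Sequence


def subsample(reads: Sequence[str], subset_size: int = 200,
              replicates: int = 1000, seed: int = 1) -> List[List[str]]:
    """Draw `replicates` random subsets of `subset_size` reads.

    Reads are drawn without replacement within a subset, so no read appears
    twice in the same subset, but the same read can occur in different subsets.
    """
    pool = list(reads)
    if subset_size > len(pool):
        raise ValueError("subset_size exceeds the number of available reads")
    rng = random.Random(seed)  # independent, seeded generator for reproducibility
    return [rng.sample(pool, subset_size) for _ in range(replicates)]
```

Changing subset_size (for example to give roughly 10, 20, 30 or 40 times coverage of the amplicon) is how different read coverages can be compared with the same procedure.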
To inspect the generated consensus sequences, use the commands as indicated. The error rate is reported in stats.txt under the heading "error rate (mismatches / bases mapped)"; to display it, use the command as indicated. The number of indels is reported under the heading "indels per cycle"; to display it, use the command as indicated.
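The two values can also be read from the samtools stats output programmatically. This is a minimal sketch, assuming the tab-separated samtools stats layout in which the error rate is on an "SN" line labelled "error rate:" and indels per cycle are on "IC" lines with the columns cycle, insertions (fwd), insertions (rev), deletions (fwd), deletions (rev). The function names are hypothetical, and the sample text in the usage note is made up for illustration.

```python
from typing import Dict, Tuple

IndelCounts = Tuple[int, ...]  # (ins_fwd, ins_rev, del_fwd, del_rev)


def parse_samtools_stats(text: str) -> Tuple[float, Dict[int, IndelCounts]]:
    """Extract the error rate (SN section) and indels per cycle (IC section)."""
    error_rate = None
    indels: Dict[int, IndelCounts] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "SN" and len(fields) > 2 and fields[1] == "error rate:":
            error_rate = float(fields[2])
        elif fields[0] == "IC" and len(fields) >= 6:
            indels[int(fields[1])] = tuple(int(x) for x in fields[2:6])
    if error_rate is None:
        raise ValueError("no 'error rate:' line found in the SN section")
    return error_rate, indels


def total_indels(indels: Dict[int, IndelCounts]) -> int:
    """Sum insertions and deletions over all cycles and both strands."""
    return sum(sum(counts) for counts in indels.values())
```

For example, a made-up excerpt such as "SN\terror rate:\t1.7e-04\t# mismatches / bases mapped (cigar)" followed by two IC lines can be parsed with error_rate, ic = parse_samtools_stats(text).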
Using the newer flow cell in combination with the flip-flop base caller, a read coverage of 40 times gives results identical to those of Illumina sequencing. A read coverage of 30 times results in an error rate of 0.02%, which corresponds to one error in every 585,000 nucleotides sequenced, while a read coverage of 20 times results in one error in every 63,529 nucleotides sequenced.
A read coverage of 10 times results in one error in every 3,312 nucleotides sequenced, meaning that more than 3 nucleotides per Usutu virus genome are called incorrectly. With a read coverage above 30 times, no indels are observed.
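The step from "one error in every N nucleotides" to errors per genome is a single division. This worked sketch uses the per-coverage figures reported above and assumes an approximate Usutu virus genome length of 11,000 nucleotides; that length and the names used are assumptions for illustration, so substitute the exact length of your genome.

```python
# One error per N nucleotides sequenced, as reported above for each read coverage.
ONE_ERROR_PER_NT = {30: 585_000, 20: 63_529, 10: 3_312}

# Assumed approximate Usutu virus genome length (~11 kb); adjust for your virus.
GENOME_LENGTH_NT = 11_000


def expected_errors_per_genome(one_error_per_nt: int,
                               genome_length: int = GENOME_LENGTH_NT) -> float:
    """Expected number of miscalled nucleotides in one consensus genome."""
    return genome_length / one_error_per_nt


for coverage, n in sorted(ONE_ERROR_PER_NT.items(), reverse=True):
    print(f"{coverage}x: {expected_errors_per_genome(n):.3f} expected errors per genome")
```

At 10 times coverage this gives about 3.3 errors per genome, in line with the statement that more than 3 nucleotides per genome are called incorrectly.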
A read coverage of 20 times results in the detection of one indel position, while a read coverage of 10 times results in indels at 29 positions. The most important thing to remember is that, although not always apparent, software updates occur frequently and can influence the results. There are several software options to choose from.
We have demonstrated the most commonly used programs, but different software and different versions of the same software may give different outcomes. We've shown you an example of how we check the quality of the sequence data that we generate.
And that's critical for reliable outbreak support.