Hybrid De Novo Genome Assembly for the Generation of Complete Genomes of Urinary Bacteria using Short- and Long-read Sequencing Technologies

4,438 Views • 12:08 min • August 20th, 2021

DOI: 10.3791/62872-v

Belle M. Sharon¹, Neha V. Hulyalkar¹, Vivian H. Nguyen¹, Philippe E. Zimmern², Kelli L. Palmer¹, Nicole J. De Nisco¹

¹Department of Biological Sciences, University of Texas at Dallas, ²Department of Urology, University of Texas Southwestern Medical Center

Transcript

This comprehensive protocol explains how to generate closed genomes of diverse urinary bacteria and enables readers with no coding experience to successfully implement terminal-based genome assembly and analysis pipelines. Complete genome assemblies enable the discovery of genes contributing to colonization and virulence among commensal and uropathogenic bacteria. Closed genomes provide insight into genome architecture and mobile genetic elements.

The hybrid genome assembly method can be applied to any culturable bacteria isolated from other anatomical sites or environmental niches to provide similar insights into genome content and architecture.

Follow the protocol and the manufacturer's guide to prepare a double-stranded DNA pool. Proceed to cleanup by adding 2.5 times the volume of paramagnetic beads to the DNA pool, then gently flick the tube to mix the contents. Briefly spin down the sample at 2,000 x g and place the sample on a rotator for five minutes at room temperature, then pellet the beads on the magnet.

Next, wash the pellet twice with 250 microliters of freshly prepared 70% ethanol in nuclease-free water without disturbing the pellet. Leave the tube on the magnet during the washes. Aspirate the ethanol after each wash, then briefly spin down the sample at 2,000 x g before returning it to the magnet.

Pipette off residual ethanol and allow the pellet to dry for 30 seconds. Once dried, resuspend the pellet in 60 to 70 microliters of nuclease-free water and incubate the sample at room temperature for two minutes, then pellet the beads on the magnet until the eluate is clear. Transfer the eluate into a clean 1.5 milliliter microcentrifuge tube.

Quantify the concentrated double-stranded DNA pool using a fluorometer, then prepare an aliquot of 700 nanograms of the sample in a 65 microliter final volume for the adapter ligation step. Retain the remainder of the pool at four degrees Celsius for a second run after the first run is finished. Continue with adapter ligation according to the manufacturer's instructions. To proceed with sequencing, mix the priming buffers and prime the flow cell.
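The aliquot arithmetic for this step can be sketched as below. This is a minimal illustration, not part of the published protocol: the function name and the example concentration are hypothetical, and it assumes the aliquot is brought to volume with nuclease-free water.

```python
def aliquot_volumes(conc_ng_per_ul, target_ng=700.0, final_ul=65.0):
    """Return (dna_ul, diluent_ul) needed to reach target_ng in final_ul.

    conc_ng_per_ul is the fluorometer reading for the cleaned DNA pool.
    The defaults mirror the 700 ng in 65 microliters stated in the transcript.
    """
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    dna_ul = target_ng / conc_ng_per_ul
    if dna_ul > final_ul:
        # The pool is too dilute to fit the target mass into the final volume.
        raise ValueError("pool too dilute for the requested mass and volume")
    return dna_ul, final_ul - dna_ul


# Hypothetical reading of 20 ng/uL: 35 uL of pool plus 30 uL of diluent.
dna_ul, diluent_ul = aliquot_volumes(20.0)
```

A pool below roughly 10.8 nanograms per microliter (700 / 65) cannot supply 700 nanograms in 65 microliters, which is why the sketch raises an error in that case.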

Prepare the DNA library solution and load the sample dropwise onto the loading port. Start the sequencing run. Open the operating software for sequencing and click on the start button, then input a name for the experiment.

Click Continue to Kit Selection, select the library prep kit and barcode expansion pack that were used, and then click Continue to Run Options. If a second run is planned with the remaining library, adjust the run length from the default 72 hours to 48 hours and click Continue to Basecalling. Check the basecalling option "Config: Fast basecalling".

Set barcoding to enabled so that the output FASTQ files are trimmed of barcode sequences and demultiplexed into separate directories by barcode. Click Continue to Output. Next, choose where to save the output sequencing data.

Expect approximately 30 to 50 gigabytes of data if only saving FASTQ output and more than 500 gigabytes of data if saving FAST5 output. If planning to proceed with manual filtering after sequencing, uncheck the filtering option "Qscore: 7, Read length: Unfiltered". Otherwise, leave the option checked.

Click Continue to Run Setup to review all the settings. If the settings are correct, click on the start button. Otherwise, click on back to make any necessary adjustments.

In the long reads directory, create new directories for each barcode used in the run. Copy all fastq files that correspond to each barcode into the appropriate folder. Combine all fastq files for each barcode from every run.

Once the files are combined, open the terminal and navigate to the barcode directories within the long reads directory using the cd command. To concatenate all fastq files per barcode into a single fastq file, execute the command and repeat the procedure for each barcode. Use NanoStat to assess the read quality of the sample by executing the command and then record the results by copying the output into a text or Word file for future reference.
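The exact commands appear on screen in the video and are not part of this transcript. As a rough sketch of the concatenation step, the Python below does the same job as running `cat *.fastq > combined.fastq` inside a barcode directory, and builds a NanoStat call for the result. The function names and the output name `combined.fastq` are hypothetical, and the inputs are assumed to be uncompressed `.fastq` files.

```python
import shutil
from pathlib import Path


def concatenate_fastq(barcode_dir, combined_name="combined.fastq"):
    """Join every .fastq file in barcode_dir into one file and return its path."""
    barcode_dir = Path(barcode_dir)
    out_path = barcode_dir / combined_name
    # List inputs first, skipping an output file left over from an earlier run.
    inputs = sorted(p for p in barcode_dir.glob("*.fastq") if p != out_path)
    with open(out_path, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return out_path


def nanostat_command(fastq_path):
    """Argument list for summarising read quality with NanoStat."""
    return ["NanoStat", "--fastq", str(fastq_path)]
```

The argument list can be passed to `subprocess.run(..., check=True)` once NanoStat is installed. Redirecting its output to a text file is one way to keep the record that the transcript asks for.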

After the results are saved, use NanoFilt to filter MinION reads, discarding reads with Q less than seven and length less than 200 by executing the command. On the generated trimmed file, run NanoStat with the command. Record the results by copying the output into a text or Word file and compare to the previous results to ensure that the filtering was successful.
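The filtering call can be sketched as follows. The thresholds come from the transcript (quality 7, length 200); the function and file names are hypothetical. NanoFilt reads FASTQ from standard input and writes the retained reads to standard output, so the sketch wires the two files to those streams.

```python
import subprocess


def nanofilt_command(min_quality=7, min_length=200):
    """Argument list for NanoFilt: minimum average quality and minimum length."""
    return ["NanoFilt", "-q", str(min_quality), "-l", str(min_length)]


def run_nanofilt(in_fastq, out_fastq, min_quality=7, min_length=200):
    """Filter in_fastq into out_fastq. Requires NanoFilt on the PATH."""
    with open(in_fastq, "rb") as src, open(out_fastq, "wb") as dst:
        subprocess.run(
            nanofilt_command(min_quality, min_length),
            stdin=src,
            stdout=dst,
            check=True,  # raise if NanoFilt exits with an error
        )
```

For example, `run_nanofilt("combined.fastq", "trimmed.fastq")` would produce the trimmed file that NanoStat is then run on; both file names are placeholders.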

Repeat the steps for each barcode used in the sequencing run. For generating the hybrid genome assembly, organize the short read files and long read files into a single directory named trimmed_reads. The directory should contain the previously generated fastq.gz file for trimmed long reads and two fastq.gz files, R1 and R2, for trimmed short reads.

Navigate to the trimmed_reads directory that stores the read files using the cd command in the terminal.

Once in the correct directory, zip the two short read files to store them in the fastq.gz format by executing the command. Repeat the step for both R1 and R2. Ensure all the read files are in the fastq.gz format and verify that all the files match the same isolate.

Begin the hybrid assembly using Unicycler by running the command. When the run is complete, review the unicycler.log file to ensure there are no errors. Record the number, size, and complete or incomplete status of the contigs generated. If incomplete contigs are identified in the Unicycler log, rerun Unicycler in bold mode by adding the flag to the command.
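A hedged sketch of the Unicycler invocation follows, using Unicycler's documented options `-1` and `-2` for the paired short reads, `-l` for the long reads, `-o` for the output directory, and `--mode bold` for the rerun. The function name and the file names in the example are hypothetical. The `.fastq.gz` check mirrors the transcript's instruction rather than a requirement of Unicycler itself.

```python
def unicycler_command(short_r1, short_r2, long_reads, out_dir, bold=False):
    """Argument list for a hybrid Unicycler assembly of one isolate."""
    for path in (short_r1, short_r2, long_reads):
        # The protocol asks for every read file to be gzip-compressed FASTQ.
        if not str(path).endswith(".fastq.gz"):
            raise ValueError(f"expected a .fastq.gz file, got {path}")
    cmd = [
        "unicycler",
        "-1", str(short_r1),
        "-2", str(short_r2),
        "-l", str(long_reads),
        "-o", str(out_dir),
    ]
    if bold:
        # Rerun option for assemblies that came out incomplete.
        cmd += ["--mode", "bold"]
    return cmd


# Placeholder file names for a single isolate.
first_pass = unicycler_command(
    "isolate_R1.fastq.gz", "isolate_R2.fastq.gz",
    "isolate_long.fastq.gz", "unicycler_output_directory",
)
```

Passing the list to `subprocess.run(first_pass, check=True)` would start the assembly. Calling the function again with `bold=True` gives the rerun command.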

Open Bandage and click on file, then choose load graph and select the assembly.gfa file that was saved earlier to the unicycler_output_directory generated by Unicycler. Once the file is loaded, click on the draw graph button on the left-hand toolbar and evaluate whether the assembly is complete by examining how the contigs are connected and organized.

Navigate within the terminal to the folder that stores the Unicycler output using the cd command. Then run QUAST by executing the command. Review the reports generated by QUAST in the output directory quast_output_directory.
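As a minimal sketch, the QUAST call can be built like this. It assumes the assembly to evaluate is the `assembly.fasta` file that Unicycler writes to its output directory, and that QUAST is installed under its usual script name `quast.py`. The function name is hypothetical.

```python
def quast_command(assembly_fasta="assembly.fasta",
                  out_dir="quast_output_directory"):
    """Argument list for a basic QUAST quality report on one assembly."""
    return ["quast.py", str(assembly_fasta), "-o", str(out_dir)]


# Run from inside the Unicycler output folder, for example with
# subprocess.run(quast_command(), check=True).
quast_cmd = quast_command()
```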

Navigate within the terminal to the folder that stores the Unicycler output using the cd command, then run Prokka by executing the command. Review the annotations by opening the tsv table and by uploading the generated gff file into the sequence analysis software. Visualize and analyze the annotations in the software.
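The annotation step can be sketched in the same way, using Prokka's `--outdir` and `--prefix` options. The output directory name and function name are hypothetical. The prefix shown is the isolate name KoPF10 from the results, used here only as an example.

```python
def prokka_command(assembly_fasta, out_dir, prefix):
    """Argument list for annotating one assembly with Prokka.

    Prokka names its outputs after the prefix, for example <prefix>.tsv
    and <prefix>.gff inside out_dir.
    """
    return [
        "prokka",
        "--outdir", str(out_dir),
        "--prefix", str(prefix),
        str(assembly_fasta),
    ]


prokka_cmd = prokka_command("assembly.fasta", "prokka_output", "KoPF10")
```

Prokka stops if the output directory already exists, so a fresh directory name per isolate avoids overwriting earlier annotations.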

The representative gel images depict genomic DNA (gDNA) extraction outcomes. Successful extraction and separation of intact gDNA can be observed. Unsuccessful extractions, showing fragmented gDNA and RNA contamination, are also depicted.

The genome assembly graphs of bacteria were generated by Unicycler hybrid de novo assembly and visualized by Bandage. The complete genome of Klebsiella oxytoca KoPF10 demonstrated a single closed chromosome and the complete genome of Klebsiella pneumoniae KpPF25 consisted of a closed chromosome and five closed plasmids. The incomplete chromosome of Klebsiella pneumoniae KpPF46 consisted of two interconnected contigs.

The assembly maps generated by Geneious Prime for the complete genomes of Klebsiella oxytoca KoPF10 and Klebsiella pneumoniae KpPF25 show annotated genes denoted by colored arrows along plasmid backbones.

Ensure that flow cell loading is successful, filter long reads using the recommended settings, and ensure all files are in the fastq.gz format. Run the correct Unicycler command using files belonging to the same sample.

Keywords: Hybrid Genome Assembly, Urinary Bacteria, Short read Sequencing, Long read Sequencing, Complete Genomes, Genome Architecture, Mobile Genetic Elements, Culturable Bacteria, Genome Content