JoVE Logo
Faculty Resource Center

Sign In

Summary

Abstract

Introduction

Protocol

Representative Results

Discussion

Acknowledgements

Materials

References

Genetics

A Virtual Machine Platform for Non-Computer Professionals for Using Deep Learning to Classify Biological Sequences of Metagenomic Data

Published: September 25th, 2021

DOI:

10.3791/62250

1Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, 2Center for Quantitative Biology, Peking University, 3State Key Laboratory of Organ Failure Research, Southern Medical University

This tutorial describes a simple method to construct a deep learning algorithm for performing 2-class sequence classification of metagenomic data.

A variety of biological sequence classification tasks, such as species classification, gene function classification and viral host classification, are expected processes in many metagenomic data analyses. Since metagenomic data contain a large number of novel species and genes, high-performing classification algorithms are needed in many studies. Biologists often encounter challenges in finding suitable sequence classification and annotation tools for a specific task and are often not able to construct a corresponding algorithm on their own because of a lack of the necessary mathematical and computational knowledge. Deep learning techniques have recently become a popular topic and show strong advantages in many classification tasks. To date, many highly packaged deep learning packages, which make it possible for biologists to construct deep learning frameworks according to their own needs without in-depth knowledge of the algorithm details, have been developed. In this tutorial, we provide a guideline for constructing an easy-to-use deep learning framework for sequence classification without the need for sufficient mathematical knowledge or programming skills. All the code is optimized in a virtual machine so that users can directly run the code using their own data.

The metagenomic sequencing technique bypasses the strain isolation process and directly sequences the total DNA in an environmental sample. Thus, metagenomic data contain DNA from different organisms, and most biological sequences are from novel organisms that are not present in the current database. According to different research purposes, biologists need to classify these sequences from different perspectives, such as taxonomic classification1, virus-bacteria classification2,3,4, chromosome-plasmid classification3,....

Log in or to access full content. Learn more about your institution’s access to JoVE content here

1. The installation of the virtual machine

  1. Download the virtual machine file from (https://github.com/zhenchengfang/DL-VM).
  2. Download the VirtualBox software from https://www.virtualbox.org.
  3. Decompress the ".7z" file using related software, such as "7-Zip", "WinRAR" or "WinZip".
  4. Install the VirtualBox software by clicking the Next button in each step.
  5. Open the VirtualBox software and click the New but.......

Log in or to access full content. Learn more about your institution’s access to JoVE content here

In our previous work, we developed a series of sequence classification tools for metagenomic data using an approach similar to this tutorial3,11,12. As an example, we deposited the sequence files of the subset of training set and test set from our previous work3,11 in the virtual machine.

Fang & Zhou11 aimed to iden.......

Log in or to access full content. Learn more about your institution’s access to JoVE content here

This tutorial provides an overview for biologists and algorithm design beginners on how to construct an easy-to-use deep learning framework for biological sequence classification in metagenomic data. This tutorial aims to provide intuitive understanding of deep learning and address the challenge that beginners often have difficulty installing the deep learning package and writing the code for the algorithm. For some simple classification tasks, users can use the framework to perform the classification tasks.

Log in or to access full content. Learn more about your institution’s access to JoVE content here

This investigation was financially supported by the National Natural Science Foundation of China (81925026, 82002201, 81800746, 82102508).

....

Log in or to access full content. Learn more about your institution’s access to JoVE content here

Name Company Catalog Number Comments
PC or server NA NA Suggested memory: >6GB
VirtualBox software NA NA Link: https://www.virtualbox.org

  1. Liang, Q., Bible, P. W., Liu, Y., Zou, B., Wei, L. DeepMicrobes: taxonomic classification for metagenomics with deep learning. NAR Genomics and Bioinformatics. 2 (1), (2020).
  2. Ren, J., et al. VirFinder: a novel k -mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome. 5 (1), 69 (2017).
  3. Fang, Z., et al. PPR-Meta: a tool for identifying phages and plasmids from metagenomic fragments using deep learning. GigaScience. 8 (6), (2019).
  4. Ren, J., et al. Identifying viruses from metagenomic data using deep learning. Quantitative Biology. 8 (1), 64-77 (2020).
  5. Zhou, F., Xu, Y. cBar: a computer program to distinguish plasmid-derived from chromosome-derived sequence fragments in metagenomics data. Bioinformatics. 26 (16), 2051-2052 (2010).
  6. Krawczyk, P. S., Lipinski, L., Dziembowski, A. PlasFlow: predicting plasmid sequences in metagenomic data using genome signatures. Nucleic Acids Research. 46 (6), (2018).
  7. Pellow, D., Mizrahi, I., Shamir, R. PlasClass improves plasmid sequence classification. PLOS Computational Biology. 16 (4), (2020).
  8. Arango-Argoty, G., et al. DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data. Microbiome. 6 (1), 1-15 (2018).
  9. Zheng, D., Pang, G., Liu, B., Chen, L., Yang, J. Learning transferable deep convolutional neural networks for the classification of bacterial virulence factors. Bioinformatics. 36 (12), 3693-3702 (2020).
  10. LeCun, Y., Bengio, Y., Hinton, G. Deep learning. Nature. 521 (7553), 436-444 (2015).
  11. Fang, Z., Zhou, H. VirionFinder: Identification of Complete and Partial Prokaryote Virus Virion Protein From Virome Data Using the Sequence and Biochemical Properties of Amino Acids. Frontiers in Microbiology. 12, 615711 (2021).
  12. Fang, Z., Zhou, H. Identification of the conjugative and mobilizable plasmid fragments in the plasmidome using sequence signatures. Microbial Genomics. 6 (11), (2020).
  13. Richter, D. C., Ott, F., Auch, A. F., Schmid, R., Huson, D. H. MetaSim-a sequencing simulator for genomics and metagenomics. PLoS One. 3 (10), 3373 (2008).
  14. Zhang, M., et al. Prediction of virus-host infectious association by supervised learning methods. BMC Bioinformatics. 18 (3), 143-154 (2017).

This article has been published

Video Coming Soon

JoVE Logo

Privacy

Terms of Use

Policies

Research

Education

ABOUT JoVE

Copyright © 2024 MyJoVE Corporation. All rights reserved