Summary

This tutorial describes a simple method for constructing a deep learning algorithm that performs two-class sequence classification of metagenomic data.

Abstract

A variety of biological sequence classification tasks, such as species classification, gene function classification, and viral host classification, are routine steps in many metagenomic data analyses. Because metagenomic data contain a large number of novel species and genes, many studies need high-performing classification algorithms. Biologists often struggle to find suitable sequence classification and annotation tools for a specific task, and often cannot construct a corresponding algorithm themselves because they lack the necessary mathematical and computational knowledge. Deep learning techniques have recently become popular and show strong advantages in many classification tasks. Many highly encapsulated deep learning packages have now been developed, which allow biologists to construct deep learning frameworks according to their own needs without in-depth knowledge of the algorithmic details. In this tutorial, we provide a guideline for constructing an easy-to-use deep learning framework for sequence classification that does not require extensive mathematical knowledge or programming skills. All the code is set up in a virtual machine so that users can run it directly on their own data.

Introduction

Metagenomic sequencing bypasses the strain isolation process and directly sequences the total DNA in an environmental sample. Metagenomic data therefore contain DNA from different organisms, and most of the sequences come from novel organisms that are not present in current databases. Depending on the research purpose, biologists need to classify these sequences from different perspectives, such as taxonomic classification1, virus-bacteria classification2,3,4, chromosome-plasmid classification3,....

Protocol

1. The installation of the virtual machine

  1. Download the virtual machine file from https://github.com/zhenchengfang/DL-VM.
  2. Download the VirtualBox software from https://www.virtualbox.org.
  3. Decompress the ".7z" file using a suitable archive tool, such as "7-Zip", "WinRAR" or "WinZip".
  4. Install the VirtualBox software by clicking the Next button in each step.
  5. Open the VirtualBox software and click the New button to create a virtual machine.
  6. Enter the specified virtual machine name in the "Name" field, select Linux

Representative Results

In our previous work, we developed a series of sequence classification tools for metagenomic data using an approach similar to the one in this tutorial3,11,12. As an example, we deposited sequence files containing subsets of the training and test sets from our previous work3,11 in the virtual machine.
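As a rough illustration of how two sets of sequence files could be turned into labeled examples for a two-class task, the sketch below reads sequences and attaches a class label to each. It assumes the files are in FASTA format, which the text above does not state; the function names, the in-memory example data, and the 0/1 label convention are all hypothetical and are not taken from the code in the virtual machine.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped; join them and normalize case.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def label_sequences(handle, label):
    """Return a list of (sequence, label) pairs for one class."""
    return [(seq, label) for _, seq in read_fasta(handle)]


# Hypothetical in-memory stand-ins for the two class files (assumption:
# one FASTA file per class; real data would be opened with open(path)).
class_a = StringIO(">seq1\nACGT\nacgt\n>seq2\nTTGA\n")
class_b = StringIO(">seq3\nGGCC\n")

training_set = label_sequences(class_a, 0) + label_sequences(class_b, 1)
print(training_set)
```

Keeping one file per class makes the label implicit in the file, so no separate annotation table is needed for a two-class problem.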

Fang & Zhou11 aimed to iden.......

Discussion

This tutorial gives biologists and beginners in algorithm design an overview of how to construct an easy-to-use deep learning framework for biological sequence classification in metagenomic data. It aims to provide an intuitive understanding of deep learning and to address a common obstacle: beginners often have difficulty installing a deep learning package and writing the code for the algorithm. For some simple classification tasks, users can apply the framework directly.
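To give a concrete sense of what a neural network for sequence classification takes as input, the sketch below converts a DNA sequence into a numeric matrix using one-hot encoding. One-hot encoding is a common choice for this kind of task, but it is an assumption here: the visible text does not say which encoding the tutorial's code uses. The function name, the fixed length, and the handling of ambiguous bases are illustrative only.

```python
BASES = "ACGT"  # column order of the one-hot matrix (assumed convention)


def one_hot_encode(sequence, length):
    """Encode a DNA sequence as a list of `length` rows with 4 columns each.

    Each row has a single 1 in the column of its base (A, C, G, T).
    Ambiguous bases such as N become all-zero rows. Sequences longer
    than `length` are truncated; shorter ones are padded with zero rows.
    """
    matrix = []
    for base in sequence.upper()[:length]:
        row = [0, 0, 0, 0]
        index = BASES.find(base)
        if index >= 0:
            row[index] = 1
        matrix.append(row)
    # Pad so that every encoded sequence has the same shape.
    while len(matrix) < length:
        matrix.append([0, 0, 0, 0])
    return matrix


encoded = one_hot_encode("ACGTN", 6)
for row in encoded:
    print(row)
```

A fixed output shape matters because most network layers expect every input in a batch to have the same dimensions; padding and truncation are one simple way to achieve that.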

Disclosures

The authors declare that there are no conflicts of interest.

Acknowledgements

This investigation was financially supported by the National Natural Science Foundation of China (81925026, 82002201, 81800746, 82102508).

....

Materials

Name | Company | Catalog Number | Comments
PC or server | NA | NA | Suggested memory: >6 GB
VirtualBox software | NA | NA | Link: https://www.virtualbox.org

References

  1. Liang, Q., Bible, P. W., Liu, Y., Zou, B., Wei, L. DeepMicrobes: taxonomic classification for metagenomics with deep learning. NAR Genomics and Bioinformatics. 2 (1), (2020).
  2. Ren, J., et al.

Keywords

Virtual Machine, Deep Learning, Biological Sequence Classification, Metagenomic Data, Species Classification, Gene Function Classification, Host Classification, Deep Learning Framework, Sequence Classification, VirtualBox, Ubuntu

This article has been published

Copyright © 2025 MyJoVE Corporation. All rights reserved