Summary

Here, we introduce a protocol for converting transcriptomic data into an mqTrans view, enabling the identification of dark biomarkers. Although these biomarkers are not differentially expressed in conventional transcriptomic analyses, they exhibit differential expression in the mqTrans view. The approach complements traditional methods by unveiling previously overlooked biomarkers.

Abstract

The transcriptome represents the expression levels of many genes in a sample and has been widely used in biological research and clinical practice. Researchers usually focus on transcriptomic biomarkers with differential representations between a phenotype group and a control group of samples. This study presents a multitask graph-attention network (GAT) learning framework to learn the complex inter-genic interactions of the reference samples. A demonstrative reference model was pre-trained on the healthy samples (HealthModel) and can be directly used to generate the model-based quantitative transcriptional regulation (mqTrans) view of independent test transcriptomes. The generated mqTrans view of transcriptomes is demonstrated by prediction tasks and dark biomarker detection. The coined term "dark biomarker" stems from its definition: a dark biomarker shows differential representation in the mqTrans view but no differential expression at its original expression level. Dark biomarkers are overlooked in traditional biomarker detection studies because they lack differential expression. The source code and the manual of the pipeline HealthModelPipe can be downloaded from http://www.healthinformaticslab.org/supp/resources.php.
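The definition of a dark biomarker can be illustrated with a minimal sketch. It assumes two per-gene tables for the same samples (original expression and mqTrans values) and a binary phenotype label. The function names, the toy gene names, the 0.05 cutoff, and the use of Welch's t statistic with a normal approximation to the p-value are all assumptions made for illustration; they are not taken from HealthModelPipe.

```python
import math
from statistics import NormalDist, mean, variance


def welch_p_value(group_a, group_b):
    """Two-sided p-value for Welch's t statistic (normal approximation)."""
    se2 = variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    if se2 == 0:
        # Two constant groups: no spread, so only the means can be compared.
        return 1.0 if mean(group_a) == mean(group_b) else 0.0
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2)
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def split_by_label(values, labels):
    """Split one feature into (phenotype, control) value lists."""
    case = [v for v, y in zip(values, labels) if y == 1]
    ctrl = [v for v, y in zip(values, labels) if y == 0]
    return case, ctrl


def find_dark_biomarkers(expr, mqtrans, labels, alpha=0.05):
    """Genes that differ in the mqTrans view but not in original expression.

    expr, mqtrans: dict of gene name -> list of values, one per sample.
    labels: list of 0 (control) / 1 (phenotype), aligned with the samples.
    alpha: hypothetical significance cutoff, not a value from the article.
    """
    dark = []
    for gene in expr:
        if gene not in mqtrans:
            continue
        p_expr = welch_p_value(*split_by_label(expr[gene], labels))
        p_mq = welch_p_value(*split_by_label(mqtrans[gene], labels))
        if p_mq < alpha and p_expr >= alpha:
            dark.append(gene)
    return dark


# Toy data (invented): GENE_A is dark, GENE_B is a conventional biomarker.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expr = {
    "GENE_A": [5.0, 5.2, 4.9, 5.1, 5.1, 4.9, 5.2, 5.0],
    "GENE_B": [1.0, 1.1, 0.9, 1.0, 3.0, 3.1, 2.9, 3.0],
}
mqtrans = {
    "GENE_A": [0.1, 0.0, -0.1, 0.0, 1.0, 1.1, 0.9, 1.0],
    "GENE_B": [0.1, 0.0, -0.1, 0.0, 1.0, 1.1, 0.9, 1.0],
}
print(find_dark_biomarkers(expr, mqtrans, labels))  # prints ['GENE_A']
```

For small groups, an exact t distribution or a permutation test would be more appropriate than the normal approximation used here, and a multiple-testing correction would normally be applied across genes.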

Introduction

The transcriptome consists of the expression levels of all the genes in a sample and may be profiled by high-throughput technologies such as microarray and RNA-seq1. The expression levels of one gene in a dataset are called a transcriptomic feature, and the differential representation of a transcriptomic feature between the phenotype and control groups defines this gene as a biomarker of this phenotype2,3. Transcriptomic biomarkers have been extensively utilized in investigations of disease diagnosis4, biological mechanism5, a....

Protocol

NOTE: The following protocol describes the details of the informatics analytic procedure and the Python commands of the major modules. Figure 2 illustrates the three major steps with example commands utilized in this protocol. Refer to previously published works26,38 for more technical details. Run the following protocol under a normal user account on a computer system, and avoid using the administrator or root account. This is a comput.......

Representative Results

Evaluation of the mqTrans view of the transcriptomic dataset
The test code uses eleven feature selection (FS) algorithms and seven classifiers to evaluate how the generated mqTrans view of the transcriptomic dataset contributes to the classification task (Figure 6). The test dataset consists of 317 colon adenocarcinoma (COAD) samples from The Cancer Genome Atlas (TCGA) database29. The COAD patients at stages I or II are regarded as the negative samples,.......
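The general shape of such an evaluation (select features, train a classifier, score it on held-out samples) can be sketched as follows. This is not the article's test code: a single t-statistic filter and a nearest-centroid classifier stand in for the eleven FS algorithms and seven classifiers, leave-one-out validation is an assumed evaluation scheme, and all function names and the toy matrix are hypothetical.

```python
import math
from statistics import mean, variance


def t_score(values, labels):
    """Absolute Welch t statistic of one feature between the two classes."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    se2 = variance(pos) / len(pos) + variance(neg) / len(neg)
    diff = abs(mean(pos) - mean(neg))
    if se2 == 0:
        return math.inf if diff > 0 else 0.0
    return diff / math.sqrt(se2)


def select_top_k(X, y, k):
    """Indices of the k features with the largest t_score."""
    n_features = len(X[0])
    scores = [t_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def nearest_centroid_predict(X_train, y_train, x, features):
    """Predict the class whose centroid over `features` is closest to x."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(y_train)):
        rows = [r for r, t in zip(X_train, y_train) if t == label]
        centroid = [mean(r[j] for r in rows) for j in features]
        dist = sum((x[j] - c) ** 2 for j, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy; features are re-selected inside every fold."""
    hits = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        features = select_top_k(X_train, y_train, k)
        pred = nearest_centroid_predict(X_train, y_train, X[i], features)
        hits += int(pred == y[i])
    return hits / len(X)


# Toy feature view (invented values): rows are samples, columns are features.
y = [0, 0, 0, 0, 1, 1, 1, 1]
view = [
    [0.0, 5.0], [0.1, 5.2], [-0.1, 4.9], [0.0, 5.1],
    [1.0, 5.1], [1.1, 4.9], [0.9, 5.2], [1.0, 5.0],
]
print("selected feature indices:", select_top_k(view, y, 1))
print("leave-one-out accuracy:", loo_accuracy(view, y, 1))
```

Running the same function on the original expression matrix and on the mqTrans matrix, with identical labels and `k`, gives two accuracies that can be compared directly. Selecting features inside each fold avoids leaking the held-out sample into the feature ranking.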

Discussion

Section 2 (Use the pre-trained HealthModel to generate the mqTrans features) is the most critical step of this protocol. After the computational working environment is prepared in section 1, section 2 generates the mqTrans view of a transcriptomic dataset based on the pre-trained large reference model. Section 3 is a demonstrative example of selecting the generated mqTrans features for biomarker detection and prediction tasks. Users can conduct other transcriptomic analyses on this mqTrans dataset .......

Acknowledgements

This work was supported by the Senior and Junior Technological Innovation Team (20210509055RQ), Guizhou Provincial Science and Technology Projects (ZK2023-297), the Science and Technology Foundation of Health Commission of Guizhou Province (gzwkj2023-565), Science and Technology Project of Education Department of Jilin Province (JJKH20220245KJ and JJKH20220226SK), the National Natural Science Foundation of China (U19A2061), the Jilin Provincial Key Laboratory of Big Data Intelligent Computing (20180622002JC), and the Fundamental Research Funds for the Central Universities, JLU. We extend our sincerest appreciation to the review editor and the three anonymous reviewers....

Materials

Name | Company | Catalog Number | Comments
Anaconda | Anaconda | version 2020.11 | Python programming platform
Computer | N/A | N/A | Any general-purpose computers satisfy the requirement
GPU card | N/A | N/A | Any general-purpose GPU cards with the CUDA computing library
pytorch | Pytorch | version 1.13.1 | Software
torch-geometric | Pytorch | version 2.2.0 | Software
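
The software versions listed above can be checked against the active Python environment with a short script. This is a sketch rather than part of the published pipeline: the distribution names `torch` and `torch-geometric` are assumptions about how the packages were installed, and `check_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions come from the Materials table; the distribution names
# ("torch", "torch-geometric") are assumptions about the installation.
REQUIRED = {"torch": "1.13.1", "torch-geometric": "2.2.0"}


def check_versions(required):
    """Map each package to (installed_version_or_None, matches_required)."""
    report = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local build tags such as "1.13.1+cu117" still count as a match.
        ok = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(REQUIRED).items():
        status = "OK" if ok else "MISMATCH or missing"
        print(f"{name}: installed={installed} -> {status}")
```

A mismatch does not necessarily mean the pipeline will fail, only that the environment differs from the one listed in the table.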

References

  1. Mutz, K.-O., Heilkenbrinker, A., Lönne, M., Walter, J.-G., Stahl, F. Transcriptome analysis using next-generation sequencing. Curr Opin Biotechnol. 24 (1), 22-30 (2013).
  2. Meng, G., Tang, W., Huang, E., Li, Z., Feng, H.

This article has been published

Copyright © 2024 MyJoVE Corporation. All rights reserved