Generating the Transcriptional Regulation View of Transcriptomic Features for Prediction Task and Dark Biomarker Detection on Small Datasets

03:37 min

•

March 1st, 2024

DOI: 10.3791/66030-v

Kewei Li¹, Yusi Fan¹, Yaqing Liu¹, Hongmei Liu², Gongyou Zhang², Meiyu Duan¹, Lan Huang¹, Fengfeng Zhou¹

¹College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, ²School of Biology and Engineering, Guizhou Medical University

Transcript

Our research focuses on the interactions between genes in disease. We found that many genes have complex, intertwined relationships with each other. By exploring these relationships, we aim to make diagnosis more precise and to improve disease treatment and management.

We found that many dark biomarkers, which traditional computational methods ignore, are supported by the medical literature. This indicates that our approach can help biologists shorten the biomarker screening cycle. The biggest challenge during the experiment was the small sample size available for the different disease types.

To overcome this difficulty, we collected as many healthy samples as possible to train a reference model, which we call HealthModel.

To begin, create a new virtual environment named HealthModel with Python version 3.7. On a Slurm cluster supercomputing platform, run the module load anaconda command first.

Once the command is executed, a confirmation prompt appears on the screen. Enter Y to proceed and wait for the process to complete. Then activate the virtual environment following the platform-specific instructions.

Next, run the command to install PyTorch 1.13.1. Install the additional packages required by torch_geometric, namely torch_scatter, torch_sparse, torch_cluster, and torch_spline_conv, following the installation guidelines. Then install the torch_geometric package, version 2.2.0.
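As a quick sanity check after installation, the environment can be inspected from Python. The following is a minimal sketch, not part of the published protocol: the function name check_environment and the report format are made up for illustration, and it assumes each package exposes a `__version__` attribute. Local build suffixes such as `+cu117` are ignored when comparing versions.

```python
import importlib

# Versions named in the transcript; None means "any version is accepted".
EXPECTED = {
    "torch": "1.13.1",
    "torch_scatter": None,
    "torch_sparse": None,
    "torch_cluster": None,
    "torch_spline_conv": None,
    "torch_geometric": "2.2.0",
}


def check_environment(expected=EXPECTED):
    """Return a dict mapping each package name to a short status string."""
    report = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        found = getattr(module, "__version__", "unknown")
        # Drop a local build suffix such as "+cu117" before comparing.
        if wanted is None or found.split("+")[0] == wanted:
            report[name] = "ok ({})".format(found)
        else:
            report[name] = "version mismatch: found {}, expected {}".format(found, wanted)
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print("{:20s} {}".format(package, status))
```

Any package reported as missing or mismatched should be reinstalled before continuing.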

Download the code and the pre-trained HealthModel from the Health Informatics Lab website. Decompress the file to a desired path. Then change the working directory in the command line to the HealthModel mqTrans folder.

Execute the command to generate the mqTrans features and obtain the outputs. The mqTrans features are saved in the output folder as the mq targets CSV file, and the label file is saved there as the label CSV file. Additionally, the original expression values of the mRNA genes are extracted to the test targets CSV file in the same folder.
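To inspect the generated CSV files, a small loader along the following lines may help. It is only a sketch under an assumed layout: one row per sample, a sample ID in the first column, and one numeric feature per remaining column. The real output files may be organized differently, and load_feature_table is a hypothetical helper, not a function from the HealthModel code.

```python
import csv


def load_feature_table(path):
    """Read a CSV into (sample_ids, columns).

    Assumed layout (not confirmed by the protocol): a header row, a sample ID
    in the first column, and numeric feature values in the other columns.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        feature_names = header[1:]
        sample_ids = []
        columns = {name: [] for name in feature_names}
        for row in reader:
            if not row:
                continue  # skip blank lines
            sample_ids.append(row[0])
            for name, value in zip(feature_names, row[1:]):
                columns[name].append(float(value))
    return sample_ids, columns
```

The returned dictionary maps each feature name to its list of per-sample values, which is a convenient shape for per-feature statistics.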

Next, use the feature selection algorithm to select mqTrans features. To select mqTrans features or original features without combining them, set combine to false. Select 800 original features and split the dataset 0.8 to 0.2 into training and testing sets.
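The selection and splitting steps can be illustrated with a short standard-library sketch. It is not the feature selection algorithm shipped with the protocol: ranking by the absolute Welch t-statistic is a stand-in chosen for illustration, and build_candidates, select_top_k, split_indices, and the use_mqtrans argument are hypothetical names. Only the combine switch, the 0.8 to 0.2 split, and the idea of keeping the top k features come from the transcript.

```python
import random
from statistics import mean, variance


def build_candidates(mqtrans, original, combine, use_mqtrans=True):
    """Pick the candidate features: both views if combine is true, else one view."""
    if combine:
        merged = {"mq:" + name: values for name, values in mqtrans.items()}
        merged.update({"expr:" + name: values for name, values in original.items()})
        return merged
    return dict(mqtrans if use_mqtrans else original)


def welch_t(a, b):
    """Welch t-statistic of two samples (each needs at least two values)."""
    denom = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return 0.0 if denom == 0 else (mean(a) - mean(b)) / denom


def select_top_k(features, labels, k):
    """Rank features by |t| between label 1 and label 0 samples; keep the top k."""
    scores = {}
    for name, values in features.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        scores[name] = abs(welch_t(pos, neg))
    return sorted(scores, key=scores.get, reverse=True)[:k]


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into training and testing parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]
```

In practice the ranking would be computed on the training samples only, so that the held-out 20% does not influence which features are kept.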

To combine mqTrans features with the original expression values for feature selection, set combine to true. Dark biomarkers, which have differential mqTrans values but non-differential mRNA expression, were identified. Among 3,062 features, 221 dark biomarkers were detected.
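The definition of a dark biomarker can be written as a simple filter over per-gene p-values. The sketch below assumes the p-values have already been computed by some differential test for both views; the 0.05 threshold, the function name find_dark_biomarkers, and the gene labels in the usage note are illustrative assumptions rather than values taken from the protocol.

```python
def find_dark_biomarkers(p_mqtrans, p_expression, alpha=0.05):
    """Return genes that are differential in the mqTrans view only.

    p_mqtrans and p_expression map gene names to p-values from a differential
    test on the mqTrans values and on the original expression, respectively.
    alpha is an assumed significance threshold.
    """
    dark = []
    for gene, p_mq in p_mqtrans.items():
        p_expr = p_expression.get(gene)
        if p_expr is None:
            continue  # gene missing from the expression view
        if p_mq < alpha and p_expr >= alpha:
            dark.append(gene)
    return sorted(dark)
```

For example, with made-up values `{"GENE_A": 0.001, "GENE_B": 0.2}` for mqTrans and `{"GENE_A": 0.4, "GENE_B": 0.3}` for expression, only GENE_A satisfies both conditions.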

A general scarcity of dark biomarkers was observed in comparison to traditional biomarkers across most cancer types except BRCA, MESO, and TGCT.

Explore More Videos

Transcriptional Regulation

Transcriptomic Features

Multitask Graph Attention Network

HealthModel