DeepOmicsAE is a workflow centered on the application of a deep learning method (i.e., an autoencoder) to reduce the dimensionality of multi-omics data, providing a foundation for predictive models and signaling modules representing multiple layers of omics data.
Large omics datasets are becoming increasingly available for research into human health. This paper presents DeepOmicsAE, a workflow optimized for the analysis of multi-omics datasets, including proteomics, metabolomics, and clinical data. This workflow employs a type of neural network called an autoencoder to extract a concise set of features from the high-dimensional multi-omics input data. Furthermore, the workflow provides a method to optimize the key parameters needed to implement the autoencoder. To showcase this workflow, clinical data were analyzed from a cohort of 142 individuals who were either healthy or diagnosed with Alzheimer's disease, along with the proteome and metabolome of their postmortem brain samples. The features extracted from the latent layer of the autoencoder retain the biological information that separates healthy and diseased patients. In addition, the individual extracted features represent distinct molecular signaling modules, each of which interacts uniquely with the individuals' clinical features, providing a means to integrate the proteomics, metabolomics, and clinical data.
An increasingly large proportion of the population is aging, and the burden of age-related diseases, such as neurodegeneration, is expected to increase sharply in the coming decades1. Alzheimer's disease is the most common type of neurodegenerative disease2. Progress in finding a treatment has been slow given our poor understanding of the fundamental molecular mechanisms driving the onset and progression of the disease. Most information on Alzheimer's disease is gained postmortem from the examination of brain tissue, which has made distinguishing causes from consequences a difficult task
NOTE: The data used here were ROSMAP data downloaded from the AD Knowledge Portal. Informed consent is not needed to download and reuse the data. The protocol presented herein uses deep learning to analyze multi-omics data and identify signaling modules that distinguish specific patient or sample groups based, for example, on their diagnosis. The protocol also delivers a small set of extracted features that summarize the original large-scale data and can be used for further analysis such as training a predictive mode.......
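The feature-extraction idea described in the note can be illustrated with a minimal sketch. This is not the DeepOmicsAE implementation: it assumes a single tanh latent layer trained by plain full-batch gradient descent in NumPy, and it runs on synthetic data standing in for the 142 samples. The names `train_autoencoder` and `encode`, the latent size, the learning rate, and the number of epochs are hypothetical choices made for illustration only.

```python
import numpy as np


def train_autoencoder(X, n_latent=2, epochs=2000, lr=0.005, seed=0):
    """Fit a one-hidden-layer autoencoder by full-batch gradient descent.

    Returns the learned parameters and the reconstruction loss per epoch.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W_enc = rng.normal(0.0, 0.1, size=(n_features, n_latent))
    b_enc = np.zeros(n_latent)
    W_dec = rng.normal(0.0, 0.1, size=(n_latent, n_features))
    b_dec = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        Z = np.tanh(X @ W_enc + b_enc)   # latent layer (extracted features)
        X_hat = Z @ W_dec + b_dec        # linear reconstruction of the input
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared reconstruction error.
        g_out = 2.0 * err / err.size
        g_pre = (g_out @ W_dec.T) * (1.0 - Z ** 2)
        W_dec -= lr * (Z.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        W_enc -= lr * (X.T @ g_pre)
        b_enc -= lr * g_pre.sum(axis=0)
    return (W_enc, b_enc, W_dec, b_dec), losses


def encode(X, params):
    """Project samples onto the latent layer of a trained autoencoder."""
    W_enc, b_enc, _, _ = params
    return np.tanh(X @ W_enc + b_enc)


# Synthetic stand-in data (assumption): 142 samples, 20 features driven by
# two hidden factors plus noise, standardized per feature.
rng = np.random.default_rng(1)
factors = rng.normal(size=(142, 2))
loadings = rng.normal(size=(2, 20))
X = factors @ loadings + 0.1 * rng.normal(size=(142, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)

params, losses = train_autoencoder(X)
latent = encode(X, params)
print("latent shape:", latent.shape)
print("loss at first and last epoch:", losses[0], losses[-1])
```

The `latent` array plays the role of the small set of extracted features: one row per sample and one column per latent unit, which could then be passed to a downstream classifier. A real analysis of thousands of proteins and metabolites would more plausibly rely on a deep learning framework and a deeper network than this toy example.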
To showcase the protocol, we analyzed a dataset comprising the proteome, metabolome, and clinical information derived from postmortem brains of 142 individuals who were either healthy or diagnosed with Alzheimer's disease.
After protocol section 1 was performed to preprocess the data, the dataset included 6,497 proteins, 443 metabolites, and three clinical features (sex, age at death, and education). The target feature is clinical consensus diagnosis of cognitive status at ti.......
The structure of the dataset is critical to the success of the protocol and should be checked carefully. The data should be formatted as indicated in protocol section 1. Proteomics and metabolomics data are preprocessed differently, and feature selection is conducted separately for each because of the different nature of the data. It is therefore essential to assign column positions correctly in protocol steps 1.5, 2.3, and 3.3.
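One way to guard against misassigned column positions is to validate the ranges before any preprocessing runs. The sketch below is an illustration, not part of the published notebooks: the helper `split_columns`, the toy column names, and the column order are all hypothetical, and the actual layout required by protocol section 1 should be taken from the protocol itself.

```python
def split_columns(header, blocks):
    """Return {block_name: column names} after checking the position ranges.

    `blocks` maps a block name to a half-open (start, stop) pair of 0-based
    column positions, with the same meaning as a Python slice.
    """
    ordered = sorted(blocks.items(), key=lambda item: item[1][0])
    previous_stop = 0
    for name, (start, stop) in ordered:
        if not 0 <= start < stop <= len(header):
            raise ValueError(f"{name}: range ({start}, {stop}) is outside the table")
        if start < previous_stop:
            raise ValueError(f"{name}: range overlaps the preceding block")
        previous_stop = stop
    return {name: header[start:stop] for name, (start, stop) in blocks.items()}


# Hypothetical toy layout: ID, 3 clinical features, 5 proteins,
# 3 metabolites, and the target column.
header = (
    ["individualID", "sex", "age_at_death", "education"]
    + [f"protein_{i}" for i in range(5)]
    + [f"metabolite_{i}" for i in range(3)]
    + ["diagnosis"]
)
blocks = {"clinical": (1, 4), "proteomics": (4, 9), "metabolomics": (9, 12)}

groups = split_columns(header, blocks)
for name, columns in groups.items():
    print(name, columns[0], "...", columns[-1], f"({len(columns)} columns)")
```

Printing the first and last column name of each block makes an off-by-one error visible immediately, for example a metabolite column that has slipped into the proteomics block.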
This work was supported by NIH grant CA201402 and the Cornell Center for Vertebrate Genomics (CVG) Distinguished Scholar Award. The results published here are in whole or in part based on data obtained from the AD Knowledge Portal (https://adknowledgeportal.org). Study data were provided through the Accelerating Medicine Partnership for AD (U01AG046161 and U01AG061357) based on samples provided by the Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago. Data collection was supported through funding by NIA grants P30AG10161, R01AG15819, R01AG17917, R01AG30146, R01AG36836, U01AG32984, U01AG46152, the Illinois Department of Public Health, and t....
Name | Company | Catalog Number | Comments |
Computer | Apple | Mac Studio | Apple M1 Ultra with 20-core CPU, 48-core GPU, 32-core Neural Engine; 64 GB unified memory |
Conda v23.3.1 | Anaconda, Inc. | N/A | package management system and environment manager |
conda environment DeepOmicsAE | N/A | DeepOmicsAE_env.yml | contains packages necessary to run the workflow |
github repository DeepOmicsAE | Microsoft | https://github.com/elepan84/DeepOmicsAE/ | provides scripts, Jupyter notebooks, and the conda environment file |
Jupyter notebook v6.5.4 | Project Jupyter | N/A | a platform for interactive data science and scientific computing |
DT01-metabolomics data | N/A | ROSMAP_Metabolon_HD4_Brain514_assay_data.csv | These data were used to generate the Results reported in the article. Specifically, DT01-DT04 were merged by matching them on the individualID. The column final consensus diagnosis (cogdx) was filtered to keep only patients classified as healthy or AD. Clinical features were filtered to keep the following: age at death, sex, and education. Finally, age reported as 90+ was set to 91, and the age column was then converted to float64. The data are available at https://adknowledgeportal.synapse.org |
DT02-TMT proteomics data | N/A | C2.median_polish_corrected_log2(abundanceRatioCenteredOnMedianOfBatchMediansPerProtein)-8817x400.csv | |
DT03-clinical data | N/A | ROSMAP_clinical.csv | |
DT04-biospecimen metadata | N/A | ROSMAP_biospecimen_metadata.csv | |
Python 3.11.3 | Python Software Foundation | N/A | programming language |
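The data preparation summarized in the DT01 comment (merge on individualID, filter on cogdx, keep three clinical features, recode age 90+ as 91, convert age to float) can be sketched as follows. This is a minimal illustration on toy records using only the standard library, not the authors' code. The column names `age_death`, `msex`, and `educ`, and the diagnosis codes 1 (healthy) and 4 (AD) are assumptions that should be checked against the ROSMAP data dictionary; the function names are hypothetical.

```python
def clean_age(value):
    """Convert an age entry to float, recoding the censored '90+' as 91."""
    text = str(value).strip()
    return 91.0 if text == "90+" else float(text)


def prepare_clinical(rows, keep_cogdx=(1, 4)):
    """Keep rows with a wanted diagnosis code and only the needed columns.

    The codes in `keep_cogdx` are an assumption (1 = healthy, 4 = AD).
    Rows with an empty cogdx entry are dropped.
    """
    prepared = []
    for row in rows:
        code = str(row.get("cogdx", "")).strip()
        if not code or int(float(code)) not in keep_cogdx:
            continue
        prepared.append({
            "individualID": row["individualID"],
            "cogdx": int(float(code)),
            "age_death": clean_age(row["age_death"]),
            "msex": int(row["msex"]),
            "educ": float(row["educ"]),
        })
    return prepared


def merge_on_id(left, right, key="individualID"):
    """Inner-join two lists of records on a shared key."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


# Toy records standing in for the clinical and omics tables (made-up values).
clinical = [
    {"individualID": "R001", "cogdx": "1", "age_death": "85.2", "msex": "0", "educ": "16"},
    {"individualID": "R002", "cogdx": "4", "age_death": "90+", "msex": "1", "educ": "12"},
    {"individualID": "R003", "cogdx": "2", "age_death": "88.0", "msex": "1", "educ": "14"},
    {"individualID": "R004", "cogdx": "", "age_death": "79.5", "msex": "0", "educ": "18"},
]
proteomics = [
    {"individualID": "R001", "protein_A": 0.12},
    {"individualID": "R002", "protein_A": -0.40},
    {"individualID": "R003", "protein_A": 0.05},
]

merged = merge_on_id(prepare_clinical(clinical), proteomics)
for record in merged:
    print(record)
```

With tables of the size used in the article, a dataframe library would be the more practical tool for the same merge and filter steps; the plain-Python version is shown only to make each step explicit.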
This article has been published
Copyright © 2024 MyJoVE Corporation. All rights reserved