Sign In

A subscription to JoVE is required to view this content. Sign in or start your free trial.

In This Article

  • Summary
  • Abstract
  • Introduction
  • Protocol
  • Representative Results
  • Discussion
  • Acknowledgements
  • Materials
  • References
  • Reprints and Permissions

Summary

DeepOmicsAE is a workflow centered on the application of a deep learning method (i.e., an autoencoder) to reduce the dimensionality of multi-omics data, providing a foundation for predictive models and signaling modules representing multiple layers of omics data.

Abstract

Large omics datasets are becoming increasingly available for research into human health. This paper presents DeepOmicsAE, a workflow optimized for the analysis of multi-omics datasets, including proteomics, metabolomics, and clinical data. This workflow employs a type of neural network called autoencoder, to extract a concise set of features from the high-dimensional multi-omics input data. Furthermore, the workflow provides a method to optimize the key parameters needed to implement the autoencoder. To showcase this workflow, clinical data were analyzed from a cohort of 142 individuals who were either healthy or diagnosed with Alzheimer's disease, along with the proteome and metabolome of their postmortem brain samples. The features extracted from the latent layer of the autoencoder retain the biological information that separates healthy and diseased patients. In addition, the individual extracted features represent distinct molecular signaling modules, each of which interacts uniquely with the individuals' clinical features, providing for a mean to integrate the proteomics, metabolomics, and clinical data.

Introduction

An increasingly large proportion of the population is aging and the burden of age-related diseases, such as neurodegeneration, is expected to sharply increase in the coming decades1. Alzheimer's disease is the most common type of neurodegenerative disease2. Progress in finding a treatment has been slow given our poor understanding of the fundamental molecular mechanisms driving the onset and progress of the disease. The majority of information on Alzheimer's disease is gained postmortem from the examination of brain tissue, which has made distinguishing causes and consequences a difficult task

Protocol

NOTE: The data used here were ROSMAP data downloaded from the AD Knowledge portal. Informed consent is not needed to download and reuse the data. The protocol presented herein utilizes deep learning to analyze multi-omics data and identify signaling modules that distinguish specific patient or sample groups based, for example, on their diagnosis. The protocol also delivers a small set of extracted features that summarize the original large-scale data and can be used for further analysis such as training a predictive mode.......

Representative Results

To showcase the protocol, we analyzed a dataset comprising the proteome, metabolome, and clinical information derived from postmortem brains of 142 individuals who were either healthy or diagnosed with Alzheimer's disease.

After performing the protocol section 1 to preprocess the data, the dataset included 6,497 proteins, 443 metabolites, and three clinical features (sex, age at death, and education). The target feature is clinical consensus diagnosis of cognitive status at ti.......

Discussion

The structure of the dataset is critical to the success of the protocol and should be carefully checked. The data should be formatted as indicated in protocol section 1. The correct assignment of column positions is also critical to the success of the method. Proteomics and metabolomics data are preprocessed differently and feature selection is conducted separately due to the different nature of the data. Therefore, it is critical to assign column positions correctly in protocol steps 1.5, 2.3, and 3.3.

Acknowledgements

This work was supported by NIH grant CA201402 and the Cornell Center for Vertebrate Genomics (CVG) Distinguished Scholar Award. The results published here are in whole or in part based on data obtained from the AD Knowledge Portal (https://adknowledgeportal.org). Study data were provided through the Accelerating Medicine Partnership for AD (U01AG046161 and U01AG061357) based on samples provided by the Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago. Data collection was supported through funding by NIA grants P30AG10161, R01AG15819, R01AG17917, R01AG30146, R01AG36836, U01AG32984, U01AG46152, the Illinois Department of Public Health, and t....

Materials

NameCompanyCatalog NumberComments
ComputerAppleMac StudioApple M1 Ultra with 20-core CPU, 48-core GPU, 32-core Neural Engine; 64 GB unified memory
Conda v23.3.1Anaconda, Inc.N/Apackage management system and environment manager
conda environment
DeepOmicsAE
N/ADeepOmicsAE_env.ymlcontains packages necessary to run the worflow
github repository DeepOmicsAEMicrosofthttps://github.com/elepan84/DeepOmicsAE/provides scripts, Jupyter notebooks, and the conda environment file
Jupyter notebook v6.5.4Project JupyterN/Aa platform for interactive data science and scientific computing
DT01-metabolomics dataN/AROSMAP_Metabolon_HD4_Brain
514_assay_data.csv
This data was used to generate the Results reported in the article. Specifically, DT01-DT04 were merged by matching them based on the individualID. The column final consensus diagnosis (cogdx) was filtered to keep only patients classified as healthy or AD. Climnical features were filtered to keep the following: age at death, sex and education. Finally, age reported as 90+ was set to 91, then the age column was transformed to float64.
The data is available at https://adknowledgeportal.synapse.org
DT02-TMT proteomics dataN/AC2.median_polish_corrected_log2
(abundanceRatioCenteredOn
MedianOfBatchMediansPer
Protein)-8817x400.csv
DT03-clinical dataN/AROSMAP_clinical.csv
DT04-biospecimen metadataN/AROSMAP_biospecimen_metadata
.csv
Python 3.11.3 Python Software FoundationN/Aprogramming language

References

  1. Hou, Y., et al. Ageing as a risk factor for neurodegenerative disease. Nature Reviews Neurology. 15 (10), 565-581 (2019).
  2. Scheltens, P., et al. Alzheimer’s disease. The Lancet. 397 (10284), 1577-1590 (2021).
  3. Brei....

Explore More Articles

Alzheimer s DiseaseMulti omics DataDeep LearningAutoencoderProteomicsMetabolomicsClinical DataFeature ImportanceMolecular PathwaysRisk FactorsPersonalized Medicine

This article has been published

Video Coming Soon

JoVE Logo

Privacy

Terms of Use

Policies

Research

Education

ABOUT JoVE

Copyright © 2024 MyJoVE Corporation. All rights reserved