Name: Video: Data Preprocessing for DeepOmicsAE Workflow
Uploaded: 2023-12-15T14:00:00
Description: 67 Views. Data Preprocessing for DeepOmicsAE Workflow

Analyzing Signaling Modules in Alzheimer's Disease with Deep Learning

An increasingly large proportion of the population is aging, and the burden of age-related diseases, such as neurodegeneration, is expected to increase sharply in the coming decades [1]. Alzheimer's disease is the most common type of neurodegenerative disease [2]. Progress in finding a treatment has been slow given our poor understanding of the fundamental molecular mechanisms driving the onset and progression of the disease. Most information on Alzheimer's disease is gained postmortem from the examination of brain tissue, which has made distinguishing causes from consequences a difficult task [3]. The Religious Orders Study/Memory and Aging Project (ROSMAP) is an ambitious effort to gain a broader understanding of neurodegeneration, which involves the study of thousands of individuals who have committed to undergo medical and psychological examinations yearly and to contribute their brains for research after their demise [4]. The study focuses on the transition from the normal functioning of the brain to Alzheimer's disease [2]. Within the project, postmortem brain samples were analyzed with a plethora of omics approaches, including genomics, epigenomics, transcriptomics, proteomics [5], and metabolomics.
Omics technologies that offer functional readouts of cellular states (i.e., proteomics and metabolomics) [6,7] are key to interpreting disease [8,9,10,11,12], due to the direct relationship between protein and metabolite abundance and cellular activities. Proteins are the primary executors of cellular processes, while metabolites are the substrates and products of biochemical reactions. Multi-omics data analysis offers the possibility to understand the complex relationships between proteomics and metabolomics data instead of appreciating them in isolation. Multi-omics is a discipline that studies multiple layers of high-dimensional biological data, including molecular data (genome sequence and mutations, transcriptome, proteome, metabolome), clinical imaging data, and clinical features. In particular, multi-omics data analysis aims to integrate such layers of biological data, understand their reciprocal regulation and interaction dynamics, and deliver a holistic understanding of disease onset and progression. However, methods to integrate multi-omics data remain in the early stages of development [13].
Autoencoders, a type of unsupervised neural network [14], are a powerful tool for multi-omics data integration. Unlike supervised neural networks, autoencoders do not map samples to specific target values (such as healthy or diseased), nor are they used to predict outcomes. One of their primary applications lies in dimensionality reduction. However, autoencoders offer several advantages over simpler dimensionality reduction methods such as principal component analysis (PCA), t-distributed stochastic neighbor embedding (tSNE), or uniform manifold approximation and projection (UMAP). Unlike PCA, autoencoders can capture non-linear relationships within the data. Unlike tSNE and UMAP, they can detect hierarchical and multi-modal relationships within the data, since they rely on multiple layers of computational units, each containing non-linear activation functions. Therefore, they represent attractive models to capture the complexity of multi-omics data. Finally, while the primary application of PCA, tSNE, and UMAP is that of clustering the data, autoencoders compress the input data into extracted features that are well-suited for downstream predictive tasks [15,16].
Briefly, neural networks comprise several layers, each containing multiple computational units or "neurons." The first and last layers are referred to as the input and output layers, respectively. Autoencoders are neural networks with an hourglass structure, consisting of an input layer, followed by one to three hidden layers and a small "latent" layer typically containing between two and six neurons. The first half of this structure is known as the encoder and is combined with a decoder that mirrors the encoder. The decoder ends with an output layer containing the same number of neurons as the input layer. Autoencoders pass the input through the bottleneck and reconstruct it in the output layer, with the goal of generating an output that mirrors the original information as closely as possible. This is achieved by mathematically minimizing a quantity termed "reconstruction loss." The input consists of a set of features, which in the application showcased herein will be protein and metabolite abundances, and clinical characteristics (i.e., sex, education, and age at death). The latent layer contains a compressed and information-rich representation of the input, which can be used for subsequent applications such as predictive models [17,18].
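The hourglass structure and the reconstruction loss described above can be illustrated with a minimal sketch. This is not the DeepOmicsAE implementation: it is a toy NumPy autoencoder trained by plain gradient descent, and the layer sizes (8 -> 4 -> 2 -> 4 -> 8), the tanh activations, the learning rate, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic input (assumption): 60 samples x 8 features driven by 2 hidden factors,
# so a 2-neuron latent layer can plausibly capture most of the signal.
factors = rng.normal(size=(60, 2))
mixing = rng.normal(size=(2, 8))
X = np.tanh(factors @ mixing) + 0.05 * rng.normal(size=(60, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # put every feature on the same scale


def init_layer(n_in, n_out):
    """Small random weights and zero biases for one fully connected layer."""
    return rng.normal(scale=np.sqrt(1.0 / n_in), size=(n_in, n_out)), np.zeros(n_out)


# Hourglass: input 8 -> hidden 4 -> latent 2 -> hidden 4 -> output 8
sizes = [8, 4, 2, 4, 8]
params = [init_layer(a, b) for a, b in zip(sizes[:-1], sizes[1:])]


def forward(X, params):
    """Return the activations of every layer; the output layer is linear."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(z if i == len(params) - 1 else np.tanh(z))
    return acts


def reconstruction_loss(X, params):
    """Mean squared error between the input and its reconstruction."""
    return float(np.mean((forward(X, params)[-1] - X) ** 2))


def train_step(X, params, lr=0.1):
    """One full-batch gradient descent step on the reconstruction loss."""
    acts = forward(X, params)
    grad = 2.0 * (acts[-1] - X) / X.size  # d(loss)/d(output)
    updated = []
    for i in reversed(range(len(params))):
        W, b = params[i]
        if i != len(params) - 1:
            grad = grad * (1.0 - acts[i + 1] ** 2)  # derivative of tanh
        grad_W = acts[i].T @ grad
        grad_b = grad.sum(axis=0)
        grad = grad @ W.T  # propagate to the previous layer
        updated.append((W - lr * grad_W, b - lr * grad_b))
    return updated[::-1]


initial_loss = reconstruction_loss(X, params)
for _ in range(2000):
    params = train_step(X, params)
final_loss = reconstruction_loss(X, params)

# Extracted features: activations of the 2-neuron latent layer (index 2 in acts).
latent = forward(X, params)[2]
```

Here `latent` plays the role of the compressed representation: one row per sample and one column per latent neuron. In practice a deep learning framework would handle the gradients; the manual backpropagation is written out only to keep the sketch dependency-free.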
This protocol presents a workflow, DeepOmicsAE, which involves: 1) preprocessing of proteomics, metabolomics, and clinical data (i.e., normalization, scaling, outlier removal) to obtain data with a consistent scale for machine learning analysis; 2) selecting appropriate autoencoder input features, since feature overload may obscure relevant disease patterns; 3) optimizing and training the autoencoder, including determining the optimal number of proteins and metabolites to select, and of neurons for the latent layer; 4) extracting features from the latent layer; and 5) utilizing the extracted features for biological interpretation by identifying molecular signaling modules and their relationship with clinical features.
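Step 1 of the workflow (normalization, scaling, outlier removal) can be sketched as follows. The source does not spell out the exact operations at this point, so the choices below are assumptions: a log2 transform, per-sample median centering, removal of samples whose profile lies far from the median profile (robust z-score based on the median absolute deviation), and per-feature z-scoring. The function name `preprocess` and the cutoff value are hypothetical.

```python
import numpy as np


def preprocess(abundance, z_cutoff=3.5):
    """Normalize, filter, and scale a samples x features matrix of positive abundances.

    Returns the scaled matrix (outlier samples removed) and a boolean mask
    marking which of the original samples were kept.
    """
    logged = np.log2(abundance)

    # Normalization: center each sample on its own median to correct
    # for differences in overall signal between samples.
    normalized = logged - np.median(logged, axis=1, keepdims=True)

    # Outlier removal: distance of each sample from the feature-wise median
    # profile, converted to a robust z-score using the MAD.
    profile = np.median(normalized, axis=0)
    distance = np.linalg.norm(normalized - profile, axis=1)
    mad = np.median(np.abs(distance - np.median(distance)))
    robust_z = (distance - np.median(distance)) / (1.4826 * mad)
    keep = robust_z <= z_cutoff

    # Scaling: z-score each feature so all features share a consistent scale.
    # (A constant feature would give a zero standard deviation; not handled here.)
    kept = normalized[keep]
    scaled = (kept - kept.mean(axis=0)) / kept.std(axis=0)
    return scaled, keep


# Synthetic example (assumption): 30 samples x 20 features, one corrupted sample.
rng = np.random.default_rng(1)
base = rng.uniform(8, 12, size=20)  # typical log2 abundance per feature
log_data = base + rng.normal(scale=0.3, size=(30, 20))
log_data[5] += rng.normal(scale=3.0, size=20)  # sample 5 is deliberately corrupted
abundance = 2.0 ** log_data

scaled, keep = preprocess(abundance)
```

After this step every retained feature has zero mean and unit variance, which is the "consistent scale" that the autoencoder input requires. The corrupted sample is flagged because its distance from the median profile is far larger than that of the other samples.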
This protocol aims to be simple and accessible to biologists with limited computational experience who have a basic understanding of programming with Python. The protocol focuses on analyzing multi-omics data, including proteomics, metabolomics, and clinical features, but its use can be extended to other types of molecular expression data, including transcriptomics. One important novel application introduced by this protocol is mapping the importance scores of original features onto individual neurons in the latent layer. As a result, each neuron in the latent layer represents a signaling module, detailing the interactions between specific molecular alterations and the patients' clinical characteristics. Biological interpretation of the molecular signaling modules is obtained by using MetaboAnalyst, a publicly available tool that integrates gene/protein and metabolite data to derive enriched metabolic and cell signaling pathways [17].
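The idea of mapping feature importance onto individual latent neurons can be illustrated with a small sketch. The scoring method is not specified in this passage, so permutation importance is used here purely as a stand-in: each input feature is shuffled in turn, and the mean absolute change in each latent neuron's activation is recorded. The encoder `toy_encoder` and its weights are hypothetical and replace a trained encoder.

```python
import numpy as np


def toy_encoder(X):
    """Hypothetical stand-in for a trained encoder: 4 features -> 2 latent neurons."""
    W = np.array([[1.0, 0.0],   # feature 0 drives neuron 0
                  [0.5, 0.0],   # feature 1 drives neuron 0 more weakly
                  [0.0, 1.0],   # feature 2 drives neuron 1
                  [0.0, 0.0]])  # feature 3 is ignored by both neurons
    return np.tanh(X @ W)


def permutation_importance(encoder, X, n_repeats=10, seed=0):
    """Return a features x latent-neurons matrix of importance scores."""
    rng = np.random.default_rng(seed)
    baseline = encoder(X)
    scores = np.zeros((X.shape[1], baseline.shape[1]))
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X[:, j])  # break feature j's link to the samples
            scores[j] += np.abs(encoder(X_perm) - baseline).mean(axis=0)
    return scores / n_repeats


rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))  # synthetic, already-scaled input (assumption)
importance = permutation_importance(toy_encoder, X)

# For each latent neuron, the index of the feature that influences it most.
top_feature_per_neuron = importance.argmax(axis=0)
```

Each column of `importance` ranks the original features for one latent neuron. The top-ranked proteins and metabolites of a column are the kind of list that could then be submitted to a pathway tool such as MetaboAnalyst to interpret that neuron as a signaling module.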

Author Spotlight: Advancing Alzheimer's Research – Exploring Early Detection and Multi-Omics Approaches

DeepOmicsAE: Representing Signaling Modules in Alzheimer's Disease with Deep Learning Analysis of Proteomics, Metabolomics, and Clinical Data

Research

JoVE Journal

Biology

Large omics datasets are becoming increasingly available for research into human health. This paper presents DeepOmicsAE, a workflow optimized for the analysis of multi-omics datasets, including proteomics, metabolomics, and clinical data. This workflow employs a type of neural network called an autoencoder to extract a concise set of features from the high-dimensional multi-omics input data. Furthermore, the workflow provides a method to optimize the key parameters needed to implement the autoencoder. To showcase this workflow, clinical data were analyzed from a cohort of 142 individuals who were either healthy or diagnosed with Alzheimer's disease, along with the proteome and metabolome of their postmortem brain samples. The features extracted from the latent layer of the autoencoder retain the biological information that separates healthy and diseased individuals. In addition, the individual extracted features represent distinct molecular signaling modules, each of which interacts uniquely with the individuals' clinical features, providing a means to integrate the proteomics, metabolomics, and clinical data.

DeepOmicsAE is a workflow centered on the application of a deep learning method (i.e., an autoencoder) to reduce the dimensionality of multi-omics data, providing a foundation for predictive models and signaling modules representing multiple layers of omics data.
