DeepOmicsAE: Representing Signaling Modules in Alzheimer's Disease with Deep Learning Analysis of Proteomics, Metabolomics, and Clinical Data

09:47 min

•

December 15th, 2023

DOI: 10.3791/65910-v


Elena Panizza¹

¹Department of Molecular Medicine, Cornell University

Transcript

Alzheimer's disease is thought to begin decades before symptoms emerge. Recent studies suggest that phenotypic traits such as obesity and hypertension, as well as the level of education and social engagement, can act as risk factors. Our goal is to decipher their contributions and their relation to the molecular drivers of disease, so that we can learn how to intervene early and in a personalized manner.

Multi-omic data analysis can be used to integrate various layers of biological data, such as proteomics, transcriptomics, metabolomics, and phenotypic traits, to comprehensively understand a disease state. Autoencoder models use deep learning to reduce the dimensionality of multi-omics datasets, effectively summarizing the crucial information. However, it is challenging to interpret how important individual features in the original data are with respect to the summarized output.

In DeepOmicsAE, we built in an algorithm that derives the importance of individual multi-omics features relative to the learned, reduced-dimensionality representation. With this approach, we can identify molecular signaling modules and their association with patients' phenotypic traits. DeepOmicsAE helps relate patients' phenotypic traits to the molecular makeup of disease.

For example, you can use it to ask: What are the molecular pathways that are most implicated in Alzheimer's disease in older patients, and which are most implicated in younger patients? Which pathways are implicated in developing the disease in less educated patients versus more educated patients?

To begin, initiate a new Jupyter Notebook session by opening a new terminal window, typing jupyter notebook, and pressing Enter.

On the Jupyter Notebook homepage, select the notebook titled M01 expression data pre-processing.ipynb to open it in a new browser tab. This notebook will normalize and scale the input data, handle missing data, and remove outliers.
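The three operations that the notebook performs can be illustrated with a minimal sketch. This is not the code from the M01 notebook, which is not shown in the transcript. The function name preprocess, the median imputation, the z-score scaling, and the z_cutoff outlier rule are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, z_cutoff: float = 4.0) -> pd.DataFrame:
    """Impute, scale, and drop outlier samples (illustrative assumptions only)."""
    # Handle missing data: fill each column with its own median.
    filled = df.fillna(df.median())
    # Scale: convert every column to zero mean and unit variance.
    std = filled.std(ddof=0).replace(0, 1.0)  # guard against constant columns
    scaled = (filled - filled.mean()) / std
    # Remove outliers: drop any sample (row) with an extreme z-score.
    keep = (scaled.abs() <= z_cutoff).all(axis=1)
    return scaled.loc[keep]


# Toy data standing in for an expression matrix (rows are samples).
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(30, 3)), columns=["prot_A", "prot_B", "met_X"])
toy.iloc[0, 0] = 100.0   # an obvious outlier
toy.iloc[1, 1] = np.nan  # a missing value
clean = preprocess(toy, z_cutoff=3.0)
print(clean.shape)
```

The cutoff is a judgment call: a stricter value removes more samples, which matters in small cohorts.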

In the second cell of the notebook, replace the placeholder your_dataset_name.csv with the actual name of the dataset file. In the last cell of the notebook, replace M01_output_data.csv with the preferred name for the output data file.

For each data type, such as proteomics, metabolomics, continuous clinical data, and binary clinical data, use the command in the fourth cell to determine the indices corresponding to the first and last columns. Check the column names to locate the columns corresponding to the proteomics data, metabolomics data, and clinical data.
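Looking up column indices can be sketched as follows. The command in the fourth cell is not shown in the transcript, so this is only one plausible way to do it with pandas. The column names are made up, and the helper column_positions is hypothetical. The sketch also assumes that col_end refers to the last column of a block, inclusive.

```python
import pandas as pd


def column_positions(df, names):
    """Return the zero-based position of each named column."""
    return {name: df.columns.get_loc(name) for name in names}


# Hypothetical column layout: an ID, two proteins, two metabolites, two clinical traits.
df = pd.DataFrame(
    columns=["sample_id", "prot_A", "prot_B", "met_X", "met_Y", "age", "sex"]
)

# Print every column with its index to find where each data type starts and ends.
for index, name in enumerate(df.columns):
    print(index, name)

pos = column_positions(df, ["prot_A", "prot_B"])
col_start, col_end = pos["prot_A"], pos["prot_B"]

# iloc slicing excludes the stop index, so add 1 to include the last column.
proteomics = df.iloc[:, col_start:col_end + 1]
```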

Specify the column positions for the different data types in the fifth cell by replacing col_start and col_end with the first and last column indices for each data type. Select Cell, then Run All from the Jupyter menu bar to create the output data file in the specified folder.

To begin, on the Jupyter Notebook homepage, click on the M02 DeepOmicsAE model optimization.ipynb notebook to open it in a new tab. In the second cell of the notebook, type the name of the output file generated upon data pre-processing in place of M01_output_data.csv. In the fifth cell, specify the column positions for different data types, such as proteomics data, metabolomics data, clinical data, and all-molecular expression data.

Replace col_start and col_end with the appropriate column indices for each data type. Specify the name of the column containing the target variable as y_label, in place of y_column_name. In the sixth cell, define the number of rounds for model optimization by assigning a value to n_comb.
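The role of n_comb can be pictured with a small random-search sketch. The transcript does not say how the notebook samples parameter combinations, so random search is an assumption here. The function random_search, the grid values, the key names k_prot, k_met, and latent, and the toy scoring function are all hypothetical.

```python
import random


def random_search(score_fn, grid, n_comb, seed=0):
    """Try n_comb random parameter combinations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_comb):
        # Draw one value per parameter to form a candidate combination.
        params = {name: rng.choice(values) for name, values in grid.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space; the real ranges are defined in the notebook.
grid = {
    "k_prot": [50, 100, 200, 400],
    "k_met": [10, 25, 50],
    "latent": [4, 8, 16],
}


def toy_score(params):
    # Stand-in for "train the model and measure group separation".
    return -(params["k_prot"] + params["k_met"])


best, score = random_search(toy_score, grid, n_comb=20)
print(best, score)
```

Each round costs one full model training, which is why a larger n_comb trades processing time for a better chance of finding good parameters.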

More optimization rounds will help fine-tune the model parameters and improve model performance, but will also increase processing time. Execute the notebook by selecting Cell, then Run All from the menu bar.

To implement the workflow, click on the M03a DeepOmicsAE implementation with custom optimized parameters.ipynb notebook on the Jupyter Notebook homepage. In the second cell of the notebook, type the name of the output file generated upon data pre-processing in place of M01_output_data.csv. In the fifth cell, specify the column positions for different data types, such as proteomics data, metabolomics data, clinical data, and all-molecular expression data.

Replace col_start and col_end with the appropriate column indices for each data type. Specify the name of the column containing the target variable as y_label, in place of y_column_name. Select Cell, followed by Run All from the menu bar.

The PCA plots and the distribution of feature importance scores will be automatically saved in the local folder. Lists of important features for each identified signaling module will also be stored as text files in the local folder with the names module_n.txt.

To implement the workflow with preset parameters, click on the M03b DeepOmicsAE implementation with preset parameters.ipynb notebook on the Jupyter Notebook homepage, then follow the same procedure. Note that the parameters K prot, K met, and latent in the seventh cell of the notebooks are computed automatically within the script based on the results of previous optimization rounds.

Proteome, metabolome, and clinical data from 142 postmortem human brain samples, derived from individuals who were either healthy or diagnosed with Alzheimer's disease, were analyzed using this workflow. The workflow is based on a deep learning autoencoder model that extracts a concise set of features from the high-dimensional multi-omics input data.

The results from model parameter optimization showed that selecting a small number of proteomic and metabolomic features as input for the model provides a higher degree of separation between healthy individuals and Alzheimer's disease patients, whereas the number of neurons in the latent layer did not have a major impact on the performance of the model. Using the optimal parameters, a small set of features summarizing the input data, called extracted features, was obtained from the autoencoder model's latent layer.
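The idea of taking extracted features from a latent layer can be shown with a toy autoencoder. This is not the DeepOmicsAE architecture, which the transcript does not describe. It is a one-hidden-layer model written in plain NumPy, and the function name, learning rate, number of epochs, and synthetic data are all assumptions.

```python
import numpy as np


def train_autoencoder(X, latent=2, epochs=2000, lr=0.01, seed=0):
    """Train a tiny tanh-encoder / linear-decoder autoencoder by gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, latent))
    b_enc = np.zeros(latent)
    W_dec = rng.normal(scale=0.1, size=(latent, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        Z = np.tanh(X @ W_enc + b_enc)   # latent layer activations
        X_hat = Z @ W_dec + b_dec        # reconstruction of the input
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the squared reconstruction error, averaged over samples.
        g_out = 2.0 * err / n
        g_W_dec = Z.T @ g_out
        g_b_dec = g_out.sum(axis=0)
        g_pre = (g_out @ W_dec.T) * (1.0 - Z ** 2)  # derivative of tanh
        g_W_enc = X.T @ g_pre
        g_b_enc = g_pre.sum(axis=0)
        W_enc -= lr * g_W_enc
        b_enc -= lr * g_b_enc
        W_dec -= lr * g_W_dec
        b_dec -= lr * g_b_dec

    def encode(A):
        """Map input rows to the latent layer (the 'extracted features')."""
        return np.tanh(A @ W_enc + b_enc)

    return encode, losses


# Synthetic data: 100 samples, 6 features driven by 2 hidden factors.
rng = np.random.default_rng(1)
factors = rng.normal(size=(100, 2))
X = factors @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(100, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)

encode, losses = train_autoencoder(X, latent=2)
extracted = encode(X)
print(extracted.shape)
```

The latent argument plays the part of the number of neurons in the latent layer: each sample is summarized by that many extracted features.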

PCA analysis showed that the diagnostic groups were separated by the extracted features. However, the groups were not distinguished well by the original features, indicating that the extracted features capture the information crucial for determining the disease state.

To begin, open a web browser and navigate to the link to access the joint pathway analysis functionality on the MetaboAnalyst website.

Access the folder where the output files from implementing DeepOmicsAE are saved, and open the text files named module_n.txt for each signaling module generated. From the text files, locate the list of proteins, and copy them.
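If there are many modules, the files can also be gathered programmatically before copying. The sketch below assumes that each module_n.txt file holds one feature identifier per line. The transcript does not describe the file layout, so check the actual files first. The helper read_modules is hypothetical.

```python
from pathlib import Path


def read_modules(folder):
    """Collect the non-empty lines of every module_*.txt file in a folder."""
    modules = {}
    for path in sorted(Path(folder).glob("module_*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        # Assumption: one identifier per line; blank lines are skipped.
        modules[path.stem] = [line.strip() for line in lines if line.strip()]
    return modules


def as_paste_text(features):
    """Join identifiers one per line, ready to paste into a web form."""
    return "\n".join(features)
```

For example, as_paste_text(read_modules(".")["module_1"]) would return the contents of module_1.txt from the current folder as a single string, assuming that file exists.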

On the MetaboAnalyst webpage, paste the copied list of proteins into the Genes/proteins with optional fold changes box. Repeat for metabolites, pasting them into the Compound list with optional fold changes box on the same webpage. Choose the relevant organism and the ID type, then submit the information by clicking Submit at the bottom of the page.

On the following page, verify the ID mapping to ensure that the identifiers are correctly recognized, and then click Proceed. On the parameter setting page, select Metabolic pathways (integrated) or All pathways (integrated) to visualize the contribution of the input to metabolic pathways or to all signaling pathways. In the algorithm selection panel, choose the hypergeometric test for enrichment analysis, degree centrality for the topology measure, and combine p-values (pathway level) for the integration method.

Click on Submit at the bottom of the page. The final page, Result View, will display the outcomes of the enrichment analysis. This includes plots of enriched pathways based on their impact and significance, and a tabular list of pathways.
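The significance reported for each pathway comes from the hypergeometric test chosen earlier. The sketch below computes the textbook over-representation p-value. It does not reproduce MetaboAnalyst's internal implementation, which may differ in its background set and corrections. The function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_pvalue(hits, module_size, pathway_size, universe_size):
    """Probability of at least `hits` pathway members in a random module.

    The module is modeled as module_size features drawn without replacement
    from a universe of universe_size features, pathway_size of which belong
    to the pathway.
    """
    total = comb(universe_size, module_size)
    upper = min(module_size, pathway_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, module_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 6 of 20 module features fall in a 50-member pathway,
# against a background of 2000 measured features.
p_value = hypergeom_pvalue(hits=6, module_size=20, pathway_size=50, universe_size=2000)
print(p_value)
```

A small p-value means the overlap between the module and the pathway is larger than expected by chance for a module of that size.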

MetaboAnalyst interpretation of the signaling modules obtained by applying DeepOmicsAE to a dataset from brain samples of healthy individuals or Alzheimer's patients revealed that distinct metabolic and signaling pathways were enriched for each signaling module. Moreover, the characterization of interactions between clinical features and signaling modules showed that the altered glycerolipid metabolism in the Alzheimer's patients was associated with their sex and age at death. Conversely, alterations of synapse and axon functionality tend to occur across Alzheimer's patients irrespective of their sex, education level, and longevity.

Summary

DeepOmicsAE is a workflow centered on the application of a deep learning method (i.e., an autoencoder) to reduce the dimensionality of multi-omics data, providing a foundation for predictive models and signaling modules representing multiple layers of omics data.

Explore More Videos

DeepOmicsAE

Alzheimer's Disease

Deep Learning

Multi-omic Data Analysis

Data Dimensionality Reduction

Molecular Pathways

Patient Phenotypes