To begin, on the Jupyter Notebook homepage, click on the M02-DeepOmicsAE model optimization.ipynb notebook to open it in a new tab. In the second cell of the notebook, type the name of the output file generated during data pre-processing in place of M01_output_data.csv.
In the fifth cell, specify the column positions for different data types, such as proteomics data, metabolomics data, clinical data, and all molecular expression data. Replace col_start and col_end with the appropriate column indices for each data type. Specify the name of the column containing the target variable in place of y_column_name as y_label.
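As a rough illustration of what specifying column positions means, the sketch below uses plain Python on a made-up header row. The column names, the variable names clin_cols, prot_cols, met_cols, and mol_cols, and the half-open start/end convention are all assumptions for illustration; check the notebook itself for the variable names it expects and for whether col_end is inclusive.

```python
# Hypothetical header row; a real pre-processed file has many more columns.
header = ["sample_id", "age", "sex", "diagnosis",
          "prot_A", "prot_B", "prot_C",
          "met_X", "met_Y"]

# Column positions as (col_start, col_end) pairs, written here as Python
# slices. ASSUMPTION: half-open ranges, so col_end itself is excluded.
clin_cols = slice(1, 3)   # clinical data: age, sex
prot_cols = slice(4, 7)   # proteomics data
met_cols = slice(7, 9)    # metabolomics data
mol_cols = slice(4, 9)    # all molecular expression data

# Name of the column holding the target variable.
y_label = "diagnosis"

# Print each selection so the positions can be checked by eye before running.
for name, cols in [("clinical", clin_cols), ("proteomics", prot_cols),
                   ("metabolomics", met_cols), ("molecular", mol_cols)]:
    print(name, header[cols])
```

Printing the selected column names before running the full notebook is a quick way to catch an off-by-one error in the indices.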
In the sixth cell, define the number of rounds for model optimization by assigning a value to n_comb. More optimization rounds will help fine-tune the model parameters and improve model performance, but will also increase processing time. Execute the notebook by selecting Cell, then Run All from the menu bar.
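To make the trade-off behind n_comb concrete, here is a minimal sketch of one way such an optimization loop could look. It is not the notebook's implementation: the random sampling strategy, the candidate values, and the placeholder_score function are all assumptions, and only the names n_comb, kprot, kmet, and latent come from the protocol. Each round stands in for one model training, which is why processing time grows with n_comb.

```python
import random

def sample_combinations(n_comb, seed=0):
    """Draw n_comb random parameter combinations (hypothetical candidate values)."""
    rng = random.Random(seed)
    combos = []
    for _ in range(n_comb):
        combos.append({
            "kprot": rng.choice([50, 100, 200, 400]),   # proteomic features kept
            "kmet": rng.choice([50, 100, 200]),         # metabolomic features kept
            "latent": rng.choice([5, 10, 20, 40]),      # neurons in the latent layer
        })
    return combos

def placeholder_score(params):
    """Arbitrary stand-in for training the model and scoring group separation."""
    return -(params["kprot"] + params["kmet"])

n_comb = 10  # number of optimization rounds
combos = sample_combinations(n_comb)

# One evaluation per round: doubling n_comb doubles the number of evaluations.
best = max(combos, key=placeholder_score)
print(best)
```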
To implement the workflow, click on the M03a-DeepOmicsAE implementation with custom optimized parameters.ipynb notebook on the Jupyter Notebook homepage. In the second cell of the notebook, type the name of the output file generated during data pre-processing in place of M01_output_data.csv.
In the fifth cell, specify the column positions for different data types, such as proteomics data, metabolomics data, clinical data, and all molecular expression data. Replace col_start and col_end with the appropriate column indices for each data type. Specify the name of the column containing the target variable in place of y_column_name as y_label.
Select Cell followed by Run All from the menu bar. The PCA plots and distribution of important feature scores will be automatically saved in the local folder. Lists of important features for each identified signaling module will also be stored as text files in the local folder with the names module_n.txt.
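Once the run has finished, the module files can be collected for downstream analysis. The sketch below is one possible way to do that, assuming each module_n.txt file holds one feature name per line; that file layout and the load_modules helper are assumptions, not something the protocol specifies.

```python
from pathlib import Path

def load_modules(folder="."):
    """Read every module_*.txt file in folder into a dict of feature lists.

    ASSUMPTION: one feature name per line; blank lines are skipped.
    """
    modules = {}
    for path in sorted(Path(folder).glob("module_*.txt")):
        with path.open() as fh:
            features = [line.strip() for line in fh if line.strip()]
        modules[path.stem] = features
    return modules

# Summarize how many important features each signaling module contains.
for name, features in load_modules().items():
    print(name, len(features))
```

If the files turn out to use a different layout, for example comma-separated names on a single line, only the list comprehension needs to change.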
To implement the workflow with preset parameters, click on the M03b-DeepOmicsAE implementation with pre-set parameters.ipynb notebook on the Jupyter Notebook homepage. Then follow the same procedure.
Note that the parameters kprot, kmet, and latent in the seventh cell of the notebooks are computed automatically within the script based on the results of previous optimization rounds.
Proteome, metabolome, and clinical data from 142 postmortem human brain samples, derived from individuals who were either healthy or diagnosed with Alzheimer's disease, were analyzed using this workflow, which is based on a deep-learning autoencoder model that extracts a concise set of features from the high-dimensional multi-omics input data. The results from model parameter optimization show that selecting a small number of proteomic and metabolomic features as input for the model provides a higher degree of separation between healthy individuals and patients with Alzheimer's disease.
In contrast, the number of neurons in the latent layer did not have a major impact on the performance of the model. Using the optimal parameters, a small set of features summarizing the input data, called extracted features, was obtained from the latent layer of the autoencoder model. Principal component analysis (PCA) showed that the diagnostic groups were separated by the extracted features.
However, the groups were not distinguished well by the original features, indicating that the extracted features capture the information crucial for determining the disease state.