We present a flexible, extensible JupyterLab-based workflow for the unsupervised analysis of complex multi-omics datasets. It combines several pre-processing steps, estimation of the multi-omics factor analysis model, and several downstream analyses.
Disease mechanisms are usually complex and governed by the interaction of several distinct molecular processes. Complex, multidimensional datasets are a valuable resource to generate more insights into those processes, but the analysis of such datasets can be challenging due to the high dimensionality resulting, for example, from different disease conditions, timepoints, and omics capturing the process at different resolutions.
Here, we showcase an approach to analyze and explore such a complex multi-omics dataset in an unsupervised way by applying multi-omics factor analysis (MOFA) to a dataset generated from blood samples that capture the immune response in acute and chronic coronary syndromes. The dataset consists of several assays at differing resolutions, including sample-level cytokine data, plasma proteomics, neutrophil prime-seq, and single-cell RNA-seq (scRNA-seq) data. Further complexity is added by several time points measured per patient and by several patient subgroups.
The analysis workflow outlines how to integrate and analyze the data in several steps: (1) Data pre-processing and harmonization, (2) Estimation of the MOFA model, (3) Downstream analysis. Step 1 outlines how to process the features of the different data types, filter out low-quality features, and normalize them to harmonize their distributions for further analysis. Step 2 shows how to apply the MOFA model and explore the major sources of variance within the dataset across all omics and features. Step 3 presents several strategies for the downstream analysis of the captured patterns, linking them to the disease conditions and potential molecular processes governing those conditions.
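As an illustration of Step 1, the following is a minimal sketch of feature filtering and normalization for a single data type. It is not the workflow's own implementation: the function name `preprocess_view`, the `max_missing` threshold, and the choice of a log1p transform followed by z-scoring are assumptions made for this example, and the appropriate steps differ between assays.

```python
import math
import statistics


def preprocess_view(view, max_missing=0.5, log_transform=True):
    """Filter and normalize one omics view (hypothetical helper).

    view: dict mapping feature name -> list of per-sample values,
          with None marking a missing measurement.
    Returns a dict with the same layout that keeps only the retained
    features, each z-scored across its observed samples.
    """
    processed = {}
    for feature, values in view.items():
        observed = [v for v in values if v is not None]
        missing_fraction = 1 - len(observed) / len(values)
        # Drop features measured in too few samples (assumed threshold).
        if len(observed) < 2 or missing_fraction > max_missing:
            continue
        if log_transform:
            # Assumes non-negative values such as counts or intensities.
            observed = [math.log1p(v) for v in observed]
        mean = statistics.fmean(observed)
        sd = statistics.stdev(observed)
        # Drop constant features, which carry no variation to model.
        if sd == 0:
            continue
        scaled = iter((v - mean) / sd for v in observed)
        # Put scaled values back in sample order, keeping missing entries.
        processed[feature] = [None if v is None else next(scaled) for v in values]
    return processed


# Toy cytokine-like view: one usable feature, one constant, one mostly missing.
toy_view = {
    "IL6": [1.0, 3.0, None, 7.0],
    "flat": [2.0, 2.0, 2.0, 2.0],
    "sparse": [None, None, None, 1.0],
}
print(preprocess_view(toy_view))
```

Z-scoring each feature puts the different data types on a comparable scale, which is the harmonization goal described above. Missing values are kept as missing rather than imputed in this sketch.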
Overall, we present a workflow for the unsupervised exploration of complex multi-omics datasets. It enables the identification of major axes of variation composed of differing molecular features, and it can also be applied to other contexts and multi-omics datasets (including assays other than those presented in the exemplary use case).
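To make the notion of an axis of variation concrete, the sketch below computes the fraction of variance in one data type that a single factor explains, using the coefficient of determination R² = 1 - SS_residual / SS_total on centered data. The function name `variance_explained` and the toy values are hypothetical. In practice, the factor scores and feature weights would come from a fitted MOFA model rather than being written by hand.

```python
def variance_explained(Y, z, w):
    """Fraction of variance in a centered view explained by one factor.

    Y: list of rows (samples x features); each feature is assumed to be
       mean-centered already.
    z: factor score per sample.
    w: factor weight per feature.
    """
    ss_total = sum(y * y for row in Y for y in row)
    ss_residual = sum(
        (Y[n][d] - z[n] * w[d]) ** 2
        for n in range(len(Y))
        for d in range(len(w))
    )
    return 1.0 - ss_residual / ss_total


# Toy example: three samples, two features, one factor.
scores = [-1.0, 0.0, 1.0]            # factor values per sample
weights = [2.0, -1.0]                # feature weights for this factor
view = [[-2.1, 1.0], [0.2, -0.1], [1.9, -0.9]]
print(variance_explained(view, scores, weights))
```

Comparing this value across factors and data types shows which assays contribute to which axis of variation.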
Disease mechanisms are usually complex and governed by the interaction of several distinct molecular processes. Deciphering the complex molecular mechanisms that lead to specific diseases or govern the evolution of a disease is a task with high medical relevance as it might reveal new insights for understanding and treating diseases.
Recent technological advances make it possible to measure those processes at higher resolution (e.g., at the single-cell level) and across various biological layers (e.g., DNA, mRNA, chromatin accessibility, DNA methylation, proteomics) at the same time. This leads to the increasing generation of large multidimensional bio....
1. Preparations: Technical setup and installation
NOTE: To run this program, have wget, git, and Apptainer preinstalled on the device. A guide for installing Apptainer on different systems (Linux, Windows, Mac) is given here: https://apptainer.org/docs/admin/main/installation.html. Installation information on git can be found here: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git. Depending on the size of the different input datasets, running the workflow on a suita.......
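Before starting, it can help to confirm that the command-line prerequisites named in the note are available. The snippet below is a small sketch using only the Python standard library. It assumes the tools are exposed on the PATH under the executable names `wget`, `git`, and `apptainer`. The helper name `missing_tools` is made up for this example and is not part of the workflow.

```python
import shutil

# Executable names assumed for the prerequisites listed in the note.
REQUIRED_TOOLS = ("wget", "git", "apptainer")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```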
Following the successful execution of the workflow, several tables and figures are generated as indicated by Figure 2. Figures are placed in the /figures folder (Figure 6, Figure 7, Figure 8, Supplementary Figure 1, Supplementary Figure 2, Supplementary Figure 3, Supplementary Figure 4), and tables will be placed in the.......
The outlined protocol presents a modular and extensible Jupyter-notebook-based workflow that can be used to quickly explore a complex multi-omics dataset. The major parts of the workflow are the pre-processing and data harmonization part (offering different standard steps for filtering and normalization of the data), the estimation of the MOFA⁹ model, and some exemplary downstream analyses. One of the main critical steps is to pre-process, integrate, and harmonize the different omic.......
C.L. is supported by the Helmholtz Association under the joint research school "Munich School for Data Science - MUDS".
Name | Company | Catalog Number | Comments |
Apptainer | NA | NA | https://apptainer.org/docs/admin/main/installation.html |
Compute server, workstation, or cloud instance (Linux, Mac, or Windows environment). Depending on the size of the input datasets, we recommend running the workflow on a suitable machine (in our setting: 16 CPUs, 64 GB memory) | Any manufacturer | 16 CPUs, 64 GB memory | Large memory is only required for processing the raw single-cell data. After pre-processing, the later analysis steps can also be performed on regular desktop or laptop computers |
git | NA | NA | https://git-scm.com/book/en/v2/Getting-Started-Installing-Git |
GitHub | GitHub | NA | https://github.com/heiniglab/mofa_workflow |
This article has been published
Copyright © 2025 MyJoVE Corporation. All rights reserved