Our workflow makes it easy to analyze complex multi-omics datasets of different resolutions. The approach extracts major patterns of variation that are either unique to or shared across the specific data types and aggregates them. The resulting so-called factors can then be linked to molecular processes and to clinical or technical covariates.
Our analysis was among the first to apply the MOFA model to multi-omics and single cell data of multiple samples. Importantly, these samples were from a clinical cohort of heart attack patients. This allowed us to identify multicellular immune signatures that are associated with outcome and disease state.
As the availability of single-cell and multi-omic datasets increases, the features of these datasets are still often analyzed separately. This limits insight because biological processes usually result from interactions between multiple features and cell types. With our protocols, users can easily perform an integrated analysis of the complete dataset and identify such multicellular programs.
We think that applying this protocol to additional multi-omics datasets will generate new insights into other diseases or contexts. These insights will then inform future biomarker or therapeutic studies. Begin by adding all the multi-omics input datasets to the input_data folder.
Here, the input datasets contain data from patients with stable, chronic, and acute coronary syndromes. To pre-process the data, click on the folder symbol, then double-click on mofa_workflow, scripts, and configurations to access the configuration folder. Double-click on the Data_Configuration CSV file to open it.
In the value column, enter the paths to the input data and results folders. In the configuration name value column, specify a name to be added as a file extension for all saved files. To save the changes, select File and Save CSV File from the menu at the top.
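As a rough illustration, a parameter/value configuration file like the one described above could be read programmatically as follows. The column names (parameter, value) and the parameter names in this sketch are assumptions for illustration, not necessarily those used in the actual Data_Configuration file:

```python
import csv
import io

# Hypothetical Data_Configuration content; real parameter names may differ.
config_text = """parameter,value
data_path,input_data
results_path,results
configuration_name,my_run
"""

# Parse the two-column layout into a simple lookup dictionary.
config = {row["parameter"]: row["value"]
          for row in csv.DictReader(io.StringIO(config_text))}

print(config["configuration_name"])  # prints my_run
```

The configuration name is the suffix that, per the protocol, is appended to all files saved later in the workflow.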
Then, using the navigation menu on the left side, click on scripts to go to the scripts folder. Double-click on 00_Configuration_Update.ipynb to open the initialization notebook.
To run the script, click on the Restart Kernel and Run All Cells button at the top, and then click Restart in the pop-up. To navigate to the configurations folder, double-click on configurations. Then double-click on 1_Pre_Processing_SC_Data.csv to open the file. Verify the automatically filled-in values. Select File and Save CSV File from the menu at the top to save the changes.
Then use the navigation menu on the left side, and click on scripts to navigate to the scripts folder. Double-click on 01_Prepare_Pseudobulk.ipynb to open the notebook.
To run the script, click on the Restart Kernel and Run All Cells button at the top, and then click Restart in the pop-up. To navigate to the figures folder, double-click first on figures, and then on 01_figures. Open the newly generated plot, FIG01_Amount_of_Cells_Overview.
Then examine the plot to identify cell type clusters with a very low number of cells per sample. Note down the names of those cluster IDs to exclude them in the subsequent steps. To navigate back to the configuration folder, click on the dots in the file browser path, and double-click on configurations.
Then open the file 02_Pre_Processing_Config_SC.csv. Add all cluster IDs identified for exclusion in the previous step, separated by commas in the cell_type_exclusion column. To save the changes, select File and Save CSV File from the menu at the top.
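A minimal sketch of how such a comma-separated exclusion list could be parsed and applied, assuming the cell_type_exclusion column described above; the cluster names and the per-cell table here are invented for illustration:

```python
import csv
import io

# Hypothetical configuration row with two clusters flagged for exclusion.
row_text = """cell_type_exclusion
"Mast,Adipocytes"
"""
row = next(csv.DictReader(io.StringIO(row_text)))

# Split the comma-separated list into a set of cluster IDs.
excluded = {c.strip() for c in row["cell_type_exclusion"].split(",") if c.strip()}

# Toy (sample, cell type) table: keep only clusters not flagged above.
cells = [("s1", "T_cells"), ("s1", "Mast"),
         ("s2", "Adipocytes"), ("s2", "Fibroblasts")]
kept = [c for c in cells if c[1] not in excluded]

print(kept)  # prints [('s1', 'T_cells'), ('s2', 'Fibroblasts')]
```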
Now open the file 02_Pre_Processing_Config.csv, and adjust the pre-processing configuration for each dataset included and stored in the data input folder. Adjust the parameters in the columns as needed, depending on which pre-processing steps should be applied.
Save the changes by selecting File and Save CSV File. To navigate to the scripts folder, click on scripts. Open the notebook 02_Integrate_and_Normalize_Data_Sources.ipynb.
Click on the Restart Kernel and Run All Cells button at the top, and then click Restart in the pop-up. Next, navigate to the generated 02_results folder. Click on the folder symbol, then double-click on results and 02_results.
Verify that it includes the file 02_Combined_Data_Config_Name_INTEGRATED.csv, the combined pre-processed data input file. Begin with the processed and harmonized input data, which contains five columns: sample_id, type, dataset, variable, and value.
This data will be used in the MOFA model. Then go to Jupyter Lab, and click on the folder symbol. Double-click on mofa_workflow, followed by scripts and configurations.
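For illustration, records in the five-column long format described above can be grouped into one feature table per dataset, which is conceptually how MOFA treats each data type as a separate view. The sample IDs, feature names, and values below are invented:

```python
from collections import defaultdict

# Toy rows in the assumed long format:
# (sample_id, type, dataset, variable, value)
rows = [
    ("P01", "numeric", "RNA", "NPPA", 2.4),
    ("P01", "numeric", "ATAC", "chr1_peak_17", 0.8),
    ("P02", "numeric", "RNA", "NPPA", 1.1),
]

# Group values into one (sample, feature) -> value mapping per dataset,
# i.e. one "view" per data type in MOFA terminology.
views = defaultdict(dict)
for sample_id, _type, dataset, variable, value in rows:
    views[dataset][(sample_id, variable)] = value

print(sorted(views))  # prints ['ATAC', 'RNA']
```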
Open the file 03_MOFA_Configs.csv. Enter the number of factors to be estimated in the MOFA model, and adjust the values in the file to define whether weighting and scaling should be applied. Select File and Save CSV File from the menu at the top to save the changes.
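A hedged sketch of reading and sanity-checking such a configuration before starting a long model run; the column names number_of_factors, weighting, and scaling are assumptions for this example:

```python
import csv
import io

# Hypothetical 03_MOFA_Configs content; actual column names may differ.
cfg_text = """number_of_factors,weighting,scaling
10,True,False
"""
cfg = next(csv.DictReader(io.StringIO(cfg_text)))

# CSV cells are strings, so convert them to typed values explicitly.
n_factors = int(cfg["number_of_factors"])
weighting = cfg["weighting"] == "True"
scaling = cfg["scaling"] == "True"

# Cheap sanity check before committing to a potentially long MOFA run.
assert 1 <= n_factors <= 50

print(n_factors, weighting, scaling)  # prints 10 True False
```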
Using the navigation menu on the left side, navigate to the scripts folder by clicking on scripts. Then open the notebook 03_Run_MOFA.ipynb. Click on the Restart Kernel and Run All Cells button at the top to run the script, and then click Restart in the pop-up.
To navigate to the 03_figures folder, double-click on figures and then on 03_figures. Open the generated plot FIG03_Overview_Variance_Decomposition_MOFA_ResultName, and examine the model result. Go to the navigation menu on the left side.
Click on the folder symbol. Then double-click on input_data to navigate to the input_data folder. Drag and drop the Prepared.csv file, which contains all the metadata of the samples to be analyzed in association with the generated factors, into the input_data folder. Click on the folder symbol. Then double-click on mofa_workflow, followed by scripts and configurations to navigate back to the configurations folder.
Open the file 04_Factor_Analysis.csv. In the numerical covariates column, add the names of all numeric columns in the prepared sample metadata CSV file that will be investigated in relation to the MOFA factors, separated by commas. In the categorical covariates column, do the same for all categorical columns.
Save the changes by selecting File and Save CSV File from the menu at the top. Next, click on scripts to navigate to the scripts folder. Double-click on the notebook 04_Downstream_Factor_Analysis.ipynb to open it. To run the script, click on the Restart Kernel and Run All Cells button at the top, and then click Restart in the pop-up. Use the navigation menu on the left to navigate to the 04_figures folder by double-clicking on figures and then 04_figures.
To open the generated plots, double-click on them, and investigate the factors for interesting patterns and associations.