We present CorrelationCalculator and Filigree, two tools for data-driven network construction and analysis of metabolomics data. CorrelationCalculator supports building a single interaction network of metabolites based on expression data, while Filigree allows building a differential network, followed by network clustering and enrichment analysis.
A significant challenge in the analysis of omics data is extracting actionable biological knowledge. Metabolomics is no exception. The general problem of relating changes in levels of individual metabolites to specific biological processes is compounded by the large number of unknown metabolites present in untargeted liquid chromatography-mass spectrometry (LC-MS) studies. Further, secondary metabolism and lipid metabolism are poorly represented in existing pathway databases. To overcome these limitations, our group has developed several tools for data-driven network construction and analysis. These include CorrelationCalculator and Filigree. Both tools allow users to build partial correlation-based networks from experimental metabolomics data when the number of metabolites exceeds the number of samples. CorrelationCalculator supports the construction of a single network, while Filigree allows building a differential network utilizing data from two groups of samples, followed by network clustering and enrichment analysis. We will describe the utility and application of both tools for the analysis of real-life metabolomics data.
In the last decade, metabolomics has emerged as an omics science due to advances in analytical technologies such as Gas Chromatography-Mass Spectrometry (GC-MS) and Liquid Chromatography-Mass Spectrometry (LC-MS). These techniques allow simultaneous measurement of hundreds to thousands of small molecule metabolites, creating complex multidimensional datasets. Metabolomics experiments can be performed in targeted or untargeted modes. Targeted experiments measure specific classes of metabolites and are usually hypothesis-driven, whereas untargeted approaches attempt to measure as many metabolites as possible and are hypothesis-generating in nature. Targeted assays usually include internal standards and thus allow for absolute quantification of the metabolites of interest. In contrast, untargeted assays allow only relative quantification and include many unknown metabolites1.
Analysis of metabolomics data is a multi-step process that leverages many specialized software tools1. It can be divided into the following three major steps: (1) data processing and quality control, (2) statistical analysis, and (3) biological data interpretation. The tools described here are designed to enable the latter step of the analysis.
An intuitive and popular way to interpret metabolomics data is to map the experimental measurements onto metabolic pathways. Numerous tools have been designed to achieve this2,3,4,5, including Metscape, developed by our group6. Pathway mapping is often combined with enrichment analysis, which helps identify the most significant pathways7,8. These techniques first gained prominence in the analysis of gene expression data and have been successfully applied to the analysis of proteomics and epigenomics data9,10,11,12,13. However, the analysis of metabolomics data presents a number of challenges for knowledge-based approaches. First, in addition to endogenous metabolites, metabolomics assays measure exogenous compounds, including those that come from nutrition and other environmental sources. These compounds, as well as metabolites produced by bacteria, cannot be mapped onto the metabolic pathways of humans or other eukaryotic organisms. Further, the current pathway coverage of secondary metabolism and lipid metabolism does not allow mapping at a resolution high enough to readily support biological interpretation of the data14,15.
Data-driven network analysis techniques can help overcome these challenges. For example, correlation-based networks can help derive relationships among both known and unknown metabolites and facilitate the annotation of the unknowns16. While computing Pearson's correlation coefficients is the most straightforward approach to establishing the linear relationships between metabolites, the disadvantage is that it captures both direct and indirect associations17,18,19. An alternative is to compute partial correlation coefficients that can distinguish between direct and indirect associations. Gaussian graphical modeling (GGM) can be used to estimate partial correlation networks. However, GGM requires that the sample size and the number of features be comparable. This condition is rarely met in untargeted LC-MS data that contains measurements for thousands of metabolic features. Regularization techniques can be utilized to overcome this limitation. Graphical lasso (Glasso) and nodewise regression are popular methods for regularized estimation of the partial correlation network16,20.
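The difference between a marginal (Pearson) correlation and a partial correlation can be illustrated with a minimal sketch. It uses the classical three-variable formula on hypothetical toy data in which one variable, z, drives both x and y. This is only the textbook, unregularized case. It is not the regularized estimation used for high-dimensional data, and all names and values are illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical toy data (not from the article): z drives both x and y,
# and the two noise terms are constructed to be uncorrelated with z
# and with each other, so x and y share no direct association.
z = [1.0, 2.0, 3.0, 4.0]
noise_x = [0.5, -0.5, -0.5, 0.5]
noise_y = [0.1, -0.3, 0.3, -0.1]
x = [zi + e for zi, e in zip(z, noise_x)]
y = [2 * zi + e for zi, e in zip(z, noise_y)]

print("Pearson r(x, y):       ", round(pearson(x, y), 3))
print("Partial r(x, y | z):   ", round(partial_corr(x, y, z), 3))
```

In this toy case the Pearson coefficient between x and y is high (roughly 0.91) even though their only link is through z, while the partial correlation given z is zero up to floating-point error. This is the indirect association that partial correlation is meant to filter out.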
The first of the bioinformatics tools presented here, CorrelationCalculator16, is based on the debiased sparse partial correlation (DSPC) algorithm. DSPC relies on de-sparsified graphical lasso modeling. The underlying assumption of the algorithm is that the number of connections among the metabolites is considerably smaller than the number of samples, i.e., the partial correlation network of metabolites is sparse. This assumption allows DSPC to discover the connectivity among large numbers of metabolites using fewer samples, leveraging regularized regression techniques. Further, using a debiasing step for the regularized regression estimates, it obtains sampling distributions for the edge parameters that can be used to construct confidence intervals and test hypotheses of interest (e.g., presence/absence of a single or a group of edges). The presence or absence of an edge in the partial correlation network can thus be formally tested using the computed p-values.
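As a rough illustration of how per-edge p-values can be turned into a network, the sketch below converts hypothetical edge test statistics into two-sided p-values and keeps the edges that pass a threshold after Benjamini-Hochberg adjustment. The z-scores, the metabolite labels, the choice of adjustment method, and the 0.05 cutoff are all assumptions made for illustration. The sketch does not reproduce the DSPC estimator or the exact procedure implemented in CorrelationCalculator.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical edge statistics (metabolite pair -> z-score); not real data.
edge_z = {("A", "B"): 4.2, ("A", "C"): 0.3, ("B", "C"): 2.5, ("C", "D"): -3.1}

pairs = list(edge_z)
pvals = [two_sided_p(edge_z[p]) for p in pairs]
adj = bh_adjust(pvals)

# Keep an edge when its adjusted p-value is below the assumed 0.05 cutoff.
edges = [p for p, q in zip(pairs, adj) if q < 0.05]
print(edges)
```

With these made-up statistics, the weakly supported pair ("A", "C") is dropped and the other three pairs are retained as edges.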
CorrelationCalculator proved to be very useful for single-group analysis16; however, the objective of many metabolomics experiments is the differential analysis of two or more conditions. While CorrelationCalculator can be employed on each of the groups separately to generate partial correlation networks for each condition, this approach limits the number of samples that can be used for network generation. Since a sufficiently large sample size is one of the biggest considerations in data-driven analysis, methods that can leverage all available samples in the data to construct networks are highly desirable. This approach is implemented in the second tool presented here, called Filigree21. Filigree relies on the previously published Differential Network Enrichment Analysis (DNEA) algorithm22. Table 1 shows the applications and the workflow of both tools.
Number of experimental conditions (k) | k = 1 | k = 2 |
Software tool | CorrelationCalculator | Filigree |
Input data | Metabolites x Samples data matrix | Metabolites x Samples data matrix; experimental groups |
Workflow: pretreatment | Log transformation; autoscaling | Log transformation; autoscaling |
Workflow: network estimation | DSPC | Joint network estimation |
Workflow: network clustering | Via external apps | Consensus clustering |
Workflow: enrichment analysis | No | NetGSA |
Data visualization | Via external app, e.g., Cytoscape | Via external app, e.g., Cytoscape |
Testing metabolic modules for the association with outcome of interest (optional) | Via external apps | Via external apps |
Table 1: The scope of application and the workflow of CorrelationCalculator and Filigree.
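The pretreatment step listed in Table 1 (log transformation followed by autoscaling) can be sketched as follows. This is a minimal illustration, not the tools' own code. It assumes a Metabolites x Samples matrix stored as a list of rows, strictly positive intensities with no missing values, a base-2 logarithm, and the sample standard deviation. The log base and the handling of zeros or missing values in the actual tools are not stated here, so those choices are assumptions.

```python
import math
import statistics

def log_autoscale(matrix):
    """Log-transform, then autoscale each metabolite (row) to mean 0, SD 1.

    `matrix` is a list of rows, one row per metabolite and one column per
    sample. Values must be strictly positive, and each row must vary
    across samples (a constant row has SD 0 and cannot be autoscaled).
    """
    scaled = []
    for row in matrix:
        logged = [math.log2(v) for v in row]   # assumed base-2 log
        mean = statistics.mean(logged)
        sd = statistics.stdev(logged)          # sample standard deviation
        scaled.append([(v - mean) / sd for v in logged])
    return scaled

# Hypothetical intensities for two metabolites across three samples.
raw = [
    [1.0, 2.0, 4.0],
    [10.0, 100.0, 1000.0],
]
pretreated = log_autoscale(raw)
for row in pretreated:
    print([round(v, 3) for v in row])
```

After this step every metabolite is on a comparable scale, so metabolites with very different raw intensity ranges contribute equally to the correlation estimates.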
1. CorrelationCalculator
2. Filigree
3. Additional considerations
To illustrate the use of CorrelationCalculator, we constructed a partial correlation network using a subset of the metabolomics data from the KORA population study described in Krumsiek et al.24. The dataset contained 151 metabolites and 240 samples. Figure 1 shows the resulting partial correlation network that was visualized in Cytoscape. The network contains 148 nodes and 272 edges. The color of the nodes represents metabolites that belong to different chem...
Partial correlation-based network analysis methods implemented in CorrelationCalculator and Filigree help overcome some of the limitations of knowledge-based metabolic pathway analyses, especially for the datasets with a high prevalence of unknown metabolites and limited coverage of metabolic pathways (e.g., lipidomics data). These tools have been widely used by the research community to analyze a broad range of metabolomics and lipidomics data14,22,
The authors have no competing financial interests.
This work was supported by NIH grant 1U01CA235487.
Name | Company | Catalog Number | Comments |
CorrelationCalculator | JAVA | http://metscape.med.umich.edu/calculator.html | |
clusterNet | | https://github.com/Karnovsky-Lab/clusterNet | |
Cytoscape | Cytoscape | https://cytoscape.org/ | |
Filigree | JAVA | http://metscape.med.umich.edu/filigree.html | |
MetScape | Cytoscape | https://apps.cytoscape.org/apps/metscape | Cytoscape application that allows for the creation and exploration of correlation networks. |
Copyright © 2025 MyJoVE Corporation. All rights reserved