Large Scale Non-targeted Metabolomic Profiling of Serum by Ultra Performance Liquid Chromatography-Mass Spectrometry (UPLC-MS)

Corey D. Broeckling; Adam L. Heuberger; Jessica E. Prenni

doi:10.3791/50242


Summary

Non-targeted metabolite profiling by ultra performance liquid chromatography coupled with mass spectrometry (UPLC-MS) is a powerful technique to investigate metabolism. This article outlines a typical workflow utilized for non-targeted metabolite profiling of serum including sample organization and preparation, data acquisition, data analysis, quality control, and metabolite identification.

Abstract

Non-targeted metabolite profiling by ultra performance liquid chromatography coupled with mass spectrometry (UPLC-MS) is a powerful technique to investigate metabolism. The approach offers an unbiased and in-depth analysis that can enable the development of diagnostic tests and novel therapies and further our understanding of disease processes. The inherent chemical diversity of the metabolome creates significant analytical challenges and there is no single experimental approach that can detect all metabolites. Additionally, the biological variation in individual metabolism and the dependence of metabolism on environmental factors necessitate large sample numbers to achieve the statistical power required for meaningful biological interpretation. To address these challenges, this tutorial outlines an analytical workflow for large scale non-targeted metabolite profiling of serum by UPLC-MS. The procedure includes guidelines for sample organization and preparation, data acquisition, quality control, and metabolite identification. It will enable reliable acquisition of data for large experiments and provide a starting point for laboratories new to non-targeted metabolite profiling by UPLC-MS.

Introduction

The term "metabolomics" can encompass many things. For example, a metabolomics experiment can be performed using a variety of analytical platforms such as NMR and gas or liquid chromatography coupled with mass spectrometry. Furthermore, metabolomics experiments can be performed in a targeted or non-targeted manner, or a combination of both. A targeted metabolomics experiment involves directed analysis of a panel of molecules important to the biological question at hand (e.g. targeting the small molecules involved in the TCA cycle allows for accurate quantitation of that pathway). In this situation, the biological hypothesis dictates the choice of metabolites to be targeted in the analysis and the analytical steps are optimized for the detection of these molecules. Alternatively, a non-targeted metabolomics experiment is hypothesis generating. In this case, the experiment is performed in a broad and unbiased manner to enable detection of as many metabolites as possible. The results from a non-targeted experiment will drive the next step of the research (which in many cases may involve a targeted metabolomics workflow). It is also possible to combine the two approaches, in which case an experiment is performed in a non-targeted manner while concurrently a panel of known molecules is monitored within the data.

The tutorial presented here is focused specifically on non-targeted metabolite profiling of serum. As described above, the non-targeted approach provides an unbiased view of the detectable metabolites, can generate large amounts of information, and can ultimately allow for novel discoveries. The use of this approach, specifically employing ultra performance liquid chromatography coupled with mass spectrometry (UPLC-MS), is becoming widespread ^{1, 2, 3} and involves the following steps: (1) experimental design; (2) sample collection; (3) sample preparation; (4) data acquisition by UPLC-MS; (5) data pre-processing (peak detection, integration, alignment, and normalization); (6) statistical data analysis (both uni- and multivariate); (7) metabolite identification; and (8) biological interpretation.

Currently, there are no established standard methods for UPLC-MS based non-targeted metabolite profiling and subsequent data pre-processing steps. This lack of standardization is due in part to one of the primary analytical challenges of metabolite profiling: the chemical diversity of the metabolome. Because of this diversity, it is impossible for a single extraction method or mass spectrometry acquisition method to provide comprehensive coverage of all metabolites in a single analysis. In concept, metabolite coverage can be maximized by using multiple extractions (e.g. aqueous, methanol, chloroform:methanol, etc.) coupled with various chromatographic conditions (e.g. reverse phase, HILIC, etc.) and various ionization modes (e.g. positive ion, negative ion, chemical ionization, etc.). Often, however, researchers do not have a pre-determined bias for a specific chemical class and thus the expense of performing multiple extractions and instrument acquisitions is not warranted, especially for large-scale experiments. Thus, the video tutorial presented here was designed to provide a general procedure for large scale non-targeted metabolite profiling of serum by UPLC-MS. It will enable new and established laboratories to perform these types of experiments and provide the building blocks upon which they can expand the approach for various sample types, specific chemical classes, or targeted analysis. Specifically, this protocol includes the following steps: serum sample preparation, sample organization for large scale studies, UPLC-MS data acquisition, quality control (QC) procedures, and metabolite identification. Strategies for data pre-processing and statistical analysis are also presented.

The protocol will not focus on the steps of experimental design, sample collection, or biological data interpretation as these are outside the scope of this tutorial. However, many resources exist in the literature for these topics and the authors encourage researchers new to metabolomics to explore these thoroughly ^{4, 5, 6, 7, 8, 9}. In particular, experimental design is critical to the success of a non-targeted metabolomics experiment. Factors such as appropriate biological replication and consistency in sample collection procedure (e.g. time on bench, storage temperature, storage time, freeze-thaws, etc.) must be considered to ensure a viable study and to facilitate appropriate biological interpretation of the data.

Protocol

1. Sample Organization

To create a plate map for sample preparation on a spreadsheet, in the first column enter a sample list in order of loading. In a second column enter the 96 well plate locations using correct nomenclature for your autosampler software.
Save one well in each plate for QC samples.
If your LC autosampler can handle two plates at a time, separate the sample list into batches of 190 and save this information in a second worksheet.
Within each of these batches, use the randomize function (note: "=rand()" in Excel) to randomize the injection order. Then copy and paste the injection order into the LC-MS sample queue software. Randomization should occur among batches, within batches, and be unique across injection replicates.
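
The plate map and randomized queue described above can also be generated programmatically. The following is a minimal Python sketch, not the authors' spreadsheet procedure. It makes several assumptions that are not in the protocol: wells are named in the plain "A1" to "H12" style (your autosampler software may require a different nomenclature), well H12 is the one reserved for QC, and the loading order supplied to it has already been randomized across batches. The function names are hypothetical.

```python
import random

ROWS = "ABCDEFGH"
COLS = range(1, 13)
QC_WELL = "H12"  # assumption: the protocol does not say which well to reserve


def plate_wells():
    """Return the 95 sample wells of one 96-well plate, skipping the QC well."""
    return [f"{r}{c}" for r in ROWS for c in COLS if f"{r}{c}" != QC_WELL]


def build_plate_map(sample_ids):
    """Assign samples, in loading order, to (sample, plate, well) positions."""
    wells = plate_wells()
    plate_map = []
    for i, sample in enumerate(sample_ids):
        plate = i // len(wells) + 1
        well = wells[i % len(wells)]
        plate_map.append((sample, plate, well))
    return plate_map


def randomized_batches(plate_map, batch_size=190, n_replicates=1, seed=None):
    """Split the plate map into two-plate batches and shuffle within each batch.

    A fresh shuffle is drawn for every injection replicate, so the order
    differs between replicates. Each queue entry is
    (batch, replicate, sample, plate, well).
    """
    rng = random.Random(seed)
    batches = [plate_map[i:i + batch_size]
               for i in range(0, len(plate_map), batch_size)]
    queue = []
    for rep in range(1, n_replicates + 1):
        for batch_no, batch in enumerate(batches, start=1):
            order = batch[:]
            rng.shuffle(order)
            queue.extend((batch_no, rep, *entry) for entry in order)
    return queue


if __name__ == "__main__":
    samples = [f"S{i:03d}" for i in range(1, 201)]  # hypothetical sample IDs
    for row in randomized_batches(build_plate_map(samples), seed=42)[:5]:
        print(row)
```

Fixing the seed makes the queue reproducible, which is convenient when the same order must be re-entered into the acquisition software after an interruption.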

2. Serum and Quality Control (QC) Sample Preparation Procedures

Prepare a 10 ml stock of QC sample containing a minimum of four compounds of known mass and retention time that span the chromatographic elution of the experiment (note: we use caffeine, reserpine, sulfadimethoxine, and terfenadine at a concentration of 2 μg/ml each).
Thaw the serum samples on ice and gently vortex for ~5 sec.
Pre-wet the tips of a 12-channel pipettor with methanol to help prevent tip-dripping during sample dispensing. Then add 370 μl of cold methanol to each well of a 96 well plate.
Transfer 100 μl of each sample to the corresponding plate location from the plate maps. Cover the plates and incubate at -80 °C for 30 min to precipitate the proteins.
Spin the plates in a 4 °C centrifuge at 3,200 x g for 30 min. Transfer 250 μl of supernatant to a new 96-well plate and repeat the centrifugation.
Now transfer 60 μl aliquots of the supernatant to three 96-well plates suitable for the UPLC or LC autosampler.
Add 100 μl of QC sample mix to the reserved well position on the plate.
Seal each plate with adhesive film. Solvent may evaporate quickly after the well is pierced, and therefore injections should only be performed once from each plate. Plates in the sample queue can be stored in the short-term at -20 °C.
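
As a quick sanity check on the volumes in this section, the arithmetic can be written out as below. This is an illustrative Python sketch only. It assumes that volumes are additive on mixing and ignores the volume of the protein pellet, neither of which is stated in the protocol, so the numbers are approximate.

```python
# Volumes taken from the protocol steps above (microlitres unless noted).
SERUM_UL = 100.0
METHANOL_UL = 370.0
SUPERNATANT_TRANSFER_UL = 250.0
ALIQUOT_UL = 60.0
N_INJECTION_PLATES = 3
QC_STOCK_ML = 10.0
QC_CONC_UG_PER_ML = 2.0
QC_PER_WELL_UL = 100.0


def prep_summary():
    """Derive a few quantities implied by the preparation volumes."""
    total_ul = SERUM_UL + METHANOL_UL  # assumes additive volumes
    dilution_factor = total_ul / SERUM_UL
    return {
        "methanol_fraction": METHANOL_UL / total_ul,
        "dilution_factor": dilution_factor,
        # supernatant left over after filling the three injection plates
        "leftover_ul": SUPERNATANT_TRANSFER_UL - ALIQUOT_UL * N_INJECTION_PLATES,
        # volume of original serum represented by one 60 ul aliquot
        "serum_equiv_per_aliquot_ul": ALIQUOT_UL / dilution_factor,
        # mass of each QC compound needed for the 10 ml stock
        "qc_mass_each_ug": QC_STOCK_ML * QC_CONC_UG_PER_ML,
        # number of 100 ul QC wells one stock can fill
        "qc_wells_per_stock": QC_STOCK_ML * 1000.0 / QC_PER_WELL_UL,
    }


summary = prep_summary()
for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

Under these assumptions the extraction solvent is roughly 79% methanol, and the 250 μl of transferred supernatant comfortably covers three 60 μl aliquots.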

3. UPLC-MS Data Acquisition

Set the autosampler to 4 °C and place the first two plates into the autosampler.
Set up the acquisition method using one of the two alternative reverse phase gradients. Gradient A is biased towards non-polar molecules while gradient B will provide improved coverage of moderately polar molecules.
1. Gradient A:
  
  Time (min) %A %B curve
  0.0 100 0
  0.1 100 0 6
  1.0 60 40 6
  3.0 30 70 6
  11.0 0 100 6
  17.0 0 100 6
  17.1 100 0 6
  23.0 100 0 6
2. Gradient B:
  
  Time (min) %A %B curve
  0.0 100 0
  1 100 0 6
  13 5 95 6
  16 5 95 6
  16.05 100 0 6
  20 100 0 6
Set the mass spectrometer to collect data in positive mode scanning from 50-1,200 m/z. Additional specific instrument conditions will vary according to the system used. The following are the conditions used on our Waters Xevo G2 Q-TOF mass spectrometer: Collision Energy = 6 V; Scan Time = 0.2 sec per scan; Capillary Voltage = 2,200 V; Source Temperature = 150 °C; Desolvation Temperature = 350 °C; Desolvation Gas Flow (N2) = 800 L/hr.
Inject the QC samples five times at the beginning of each 2-plate batch of samples. After conditioning the UPLC-MS system with the first two injections, monitor for peak area (<25% RSD), retention time (+/- 0.05 min), and mass accuracy (+/- 3 ppm) of the third through fifth injections. These parameters may need to be adjusted to fit the specifications of the instrument being used for data acquisition.
If the QC samples pass, acquire data on the serum samples for those two plates. If the QC samples fail, do not collect sample data until the issue is resolved and the QC samples pass. In our laboratory we have found that these parameters are consistently stable over three days of injections (i.e. the injection of two 96-well plates). If your system is found to be unstable over this time period, more frequent QC injections may be appropriate. The QC sample is used not to correct poor quality data, but to prevent poor quality data from ever being collected.
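
The QC acceptance criteria above lend themselves to an automated check. The sketch below is one possible Python reading of them, not the authors' implementation. It assumes that the peak area RSD is computed across the third through fifth injections, and that retention time and mass accuracy are judged against expected values supplied by the user. The function names, the input layout, and the example expected retention time are hypothetical.

```python
from statistics import mean, stdev

MAX_AREA_RSD_PCT = 25.0   # peak area criterion from the protocol
MAX_RT_DEV_MIN = 0.05     # retention time criterion from the protocol
MAX_MASS_ERR_PPM = 3.0    # mass accuracy criterion from the protocol


def ppm_error(observed_mz, expected_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - expected_mz) / expected_mz * 1e6


def qc_passes(injections, expected_rt, expected_mz):
    """Evaluate one QC compound over the five QC injections of a batch.

    injections: list of dicts with keys 'area', 'rt' (min) and 'mz',
    in injection order. The first two injections only condition the
    system, so injections three to five are the ones monitored.
    """
    monitored = injections[2:5]
    if len(monitored) < 3:
        raise ValueError("need five QC injections to evaluate the batch")
    areas = [inj["area"] for inj in monitored]
    area_rsd = stdev(areas) / mean(areas) * 100.0
    rt_ok = all(abs(inj["rt"] - expected_rt) <= MAX_RT_DEV_MIN
                for inj in monitored)
    mz_ok = all(abs(ppm_error(inj["mz"], expected_mz)) <= MAX_MASS_ERR_PPM
                for inj in monitored)
    return area_rsd < MAX_AREA_RSD_PCT and rt_ok and mz_ok


if __name__ == "__main__":
    # Hypothetical values for protonated caffeine; the retention time is invented.
    runs = [
        {"area": 600.0, "rt": 1.60, "mz": 195.0890},
        {"area": 800.0, "rt": 1.55, "mz": 195.0885},
        {"area": 1000.0, "rt": 1.52, "mz": 195.0878},
        {"area": 1050.0, "rt": 1.53, "mz": 195.0878},
        {"area": 980.0, "rt": 1.51, "mz": 195.0876},
    ]
    print(qc_passes(runs, expected_rt=1.52, expected_mz=195.0877))
```

In practice the check would be repeated for each compound in the QC mix, and the batch accepted only if every compound passes.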

4. Peak Detection, Integration, Alignment, and Normalization

There are a variety of options to perform these steps including freeware and vendor specific tools. This step has been skipped in the video tutorial but our approach is described below as an example.

For each sample, the analytical data generated from this type of experiment is a profile of ions as described by a retention time, m/z value, and a spectral intensity. A non-targeted metabolite profiling experiment will be comprised of many samples and thus peak detection, retention time alignment, and normalization must be performed to enable subsequent statistical analysis of the dataset.

Export raw data into .cdf format using the Waters DataBridge software. Perform peak detection and retention time alignment using the XCMS software package¹⁰ (freeware available at http://metlin.scripps.edu/xcms/), which is executed in the R statistical environment¹¹. The website contains documentation on how to download and run the software.
Use the "matchedFilter" method within XCMS with the following parameters: full width at half maximum = 8 sec, S/N ratio = 3, maximum of 100 peaks for each extracted ion chromatogram. Perform feature grouping, retention time correction, regrouping, and fillPeaks steps. The output of the XCMS software is a data matrix with all detectable features listed as rows, each experimental sample as a column, and cells populated with the peak area of each feature in each sample. Transposition of this dataset may be necessary for incorporation into downstream statistical workflows.
Normalize the data matrix in R by dividing the peak area of each feature in a given sample by the sum of the extracted ion peak areas for that sample (referred to as the total ion current, or TIC), then multiplying by 100,000 to make the numbers more amenable to visual data interpretation.
Remove outliers. Outliers in the data can result from either a bad injection of a good sample or a good injection of a bad sample.
1. Perform outlier detection in R using the outliers package with the following parameters: (1) TIC values and (2) PC scores with 99.9% confidence.
2. If outliers are present, the corresponding data files are removed and peak detection, alignment, and normalization procedures are performed again.
Average the normalized peak areas by sample identifier (to average multiple injections). This must be done prior to performing statistical analysis (note: injection replicates account for analytical variation, but cannot be used as true biological replicates).
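
The TIC normalization and replicate averaging steps can be illustrated with a few lines of code. The authors perform these steps in R; the following is a plain Python sketch of the same arithmetic, with hypothetical function names and a hypothetical input layout (one list of peak areas per injection, with features in the same order throughout).

```python
from collections import defaultdict

SCALE = 100_000  # multiplier given in the protocol


def tic_normalize(areas_by_injection):
    """Divide each peak area by the injection's summed peak area (TIC).

    areas_by_injection: {injection_id: [peak area of each feature]}
    """
    normalized = {}
    for injection, areas in areas_by_injection.items():
        tic = sum(areas)
        if tic <= 0:
            raise ValueError(f"injection {injection!r} has no signal")
        normalized[injection] = [area / tic * SCALE for area in areas]
    return normalized


def average_replicates(normalized, sample_of):
    """Average normalized areas over the injections of each sample.

    sample_of: {injection_id: sample identifier}
    """
    grouped = defaultdict(list)
    for injection, values in normalized.items():
        grouped[sample_of[injection]].append(values)
    return {
        sample: [sum(column) / len(column) for column in zip(*replicates)]
        for sample, replicates in grouped.items()
    }


if __name__ == "__main__":
    raw = {"inj1": [10.0, 30.0, 60.0],
           "inj2": [20.0, 20.0, 60.0],
           "inj3": [1.0, 1.0, 2.0]}
    samples = {"inj1": "S1", "inj2": "S1", "inj3": "S2"}
    print(average_replicates(tic_normalize(raw), samples))
```

After normalization every injection sums to 100,000, so injections of different overall intensity become directly comparable before they are averaged.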

5. Statistical Analysis

For metabolomics experiments, both multivariate and univariate statistical techniques are necessary for data interpretation. Two techniques that are commonly used include Principal Component Analysis and Analysis of Variance (ANOVA). There are multiple ways to approach statistical analysis of the data and both open source and commercial software tools are available. This step has been skipped in the video tutorial but our approach is described below as an example.

Perform ANOVA by applying the aov function to the dataset in R.
Account for false positives using a Benjamini-Hochberg adjustment (p.adjust function in R).
Conduct PCA with the Pareto-scaled (unit variance can also be applied) and mean-centered dataset. Both PC scores and loadings are calculated using the pcaMethods package in R.
Use ANOVA p-values, loading scores, and fold-change parameters to determine the molecular features that best describe variation associated with the experimental treatment.
Manually interrogate each molecular feature of interest to determine if it corresponds to a molecular ion, adduct (e.g. sodium, potassium, or dimer), or neutral loss.
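
Two of the calculations named above, the Benjamini-Hochberg adjustment and Pareto scaling, are simple enough to write out in full. The sketch below is a Python illustration of the underlying arithmetic and does not replace the R functions used by the authors. The function names are hypothetical, and Pareto scaling is taken here in its usual sense of mean-centering followed by division by the square root of the standard deviation.

```python
from math import sqrt
from statistics import mean, stdev


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Walk from the largest p-value down, carrying a running minimum so
    # that the adjusted values stay monotone in the raw p-values.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def pareto_scale(values):
    """Mean-center one feature and divide by the square root of its SD."""
    center = mean(values)
    spread = stdev(values)
    if spread == 0:
        return [0.0 for _ in values]  # a constant feature carries no variation
    return [(v - center) / sqrt(spread) for v in values]


if __name__ == "__main__":
    print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
    print(pareto_scale([2.0, 4.0, 6.0]))
```

Compared with unit variance scaling, which divides by the standard deviation itself, Pareto scaling reduces the dominance of high-intensity features while keeping more of the original data structure.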

6. Metabolite Identification

The workflow presented here is very general and can be applied to results from any instrument platform. An alternative strategy is presented in the Discussion section.

For all statistically significant features (as determined from the analysis in Step 5), review the raw data and confirm that parent ions, sodium adducts, isotopes, and fragments follow similar trends across the dataset. For example, the ANOVA-derived p-value for a parent ion should be similar to that of its isotope. This indicates that both molecular features correspond to the same metabolite.
Search accurate mass measurements and/or neutral losses of statistically significant molecular features (choose the C12 isotope if multiple isotopes are significant) against in-house (if available) or publicly available metabolite databases such as:
1. Human Metabolome Database (http://www.hmdb.ca/)
2. Metlin (http://metlin.scripps.edu/metabo_search_alt2.php)
3. MassBank (http://www.massbank.jp/?lang=en)
4. Lipid Maps (http://www.lipidmaps.org/data/standards/index.html)
5. National Institute of Standards and Technology MS Search (http://chemdata.nist.gov/mass-spc/ms-search/)
Use software to predict a molecular formula from the accurate mass and isotopic distribution of your significant molecular features. In Waters software, this is performed using the Elemental Comp tool in the MassLynx package.
Filter candidate metabolite identifications by: mass error (as suitable for your instrument platform), predicted molecular formula, biological relevance in the trends observed with respect to experimental design, and retention time (e.g. lipids and other non-polar compounds will have longer retention times when employing reverse phase chromatography).
Perform subsequent experiments to acquire the MS/MS fragmentation for statistically significant molecular features. Whenever possible, metabolite identification should be based on matching experimental fragmentation peaks and retention time to that of an authentic standard acquired on the same instrument. Alternatively, experimental fragmentation can be matched to a spectral database.
Identification confidence should be assigned to all reported metabolite identifications based on the Metabolomics Standards Initiative recommendations¹².
Level 1 refers to a confident molecular identification based on orthogonal analytical parameters (accurate mass, retention time, and MS/MS fragmentation) relative to an authentic compound.
Level 2 refers to a putative identification based on physicochemical properties and/or spectral similarity with literature or spectral libraries.
Level 3 refers to the putative identification of a compound class based on physicochemical properties or spectral similarity.
Level 4 refers to an unknown compound.
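
The accurate mass search and the mass error filter described in this section come down to comparing an observed m/z with the theoretical m/z of each candidate as a given adduct. The following Python sketch illustrates that comparison for a few common positive ion adducts. It is not part of the authors' workflow: the function names and the 5 ppm default tolerance are hypothetical, and the tolerance should be set as suitable for your instrument platform.

```python
# Mass added to the neutral monoisotopic mass for singly charged adducts
# (atomic mass of the added species minus one electron).
ADDUCT_SHIFT = {
    "[M+H]+": 1.007276,
    "[M+Na]+": 22.989221,
    "[M+K]+": 38.963158,
}


def adduct_mz(neutral_mass, adduct):
    """Theoretical m/z of a singly charged adduct of a neutral molecule."""
    return neutral_mass + ADDUCT_SHIFT[adduct]


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_candidates(observed_mz, candidates, tol_ppm=5.0):
    """List (name, adduct, ppm error) for candidates within the tolerance.

    candidates: {name: neutral monoisotopic mass}
    tol_ppm: hypothetical default, adjust to the instrument in use
    """
    hits = []
    for name, mass in candidates.items():
        for adduct in ADDUCT_SHIFT:
            error = ppm_error(observed_mz, adduct_mz(mass, adduct))
            if abs(error) <= tol_ppm:
                hits.append((name, adduct, round(error, 2)))
    return hits


if __name__ == "__main__":
    library = {"caffeine": 194.08038}  # C8H10N4O2, monoisotopic mass
    print(match_candidates(195.0877, library))
    print(match_candidates(217.0697, library))
```

A mass match of this kind supports at most a putative annotation. As stated above, a level 1 identification still requires retention time and MS/MS agreement with an authentic standard.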

Results

The basic analytical steps of a non-targeted metabolite profiling experiment by UPLC-MS are outlined in Figure 1. The raw data for each sample can be visualized as a base peak chromatogram. Figure 2 shows an example base peak chromatogram of a serum sample analyzed by gradient option (a) in the tutorial. Following statistical analysis as described above, metabolite identification is attempted for all statistically significant molecular features. Confident identification (level 1) req...

Discussion

This tutorial is meant to serve as a starting point for conducting large scale non-targeted metabolite profiling by UPLC-MS. The workflow is focused on metabolites that can be extracted with an aqueous methanol solvent, retained on a C8 or C18 UPLC column, and detected as positive ions. In the situation where there is not a pre-determined bias towards a specific metabolite class and a hypothesis generating global profile is desired, this protocol is valuable as it will result in the detection of a large percentage of ser...

Disclosures

The authors declare that they have no competing financial interests.

Acknowledgements

The presented tutorial was performed and developed within the Proteomics and Metabolomics Facility at Colorado State University which is partially funded by the CSU Research Administration Resources for Scholarly Projects.

Materials

Name	Company	Catalog Number	Comments
96 well plates - 500 μl wells	VWR	40002-020	These are used for sample preparation
96 well plate mats	VWR	89026-514	These are used for sample preparation
96 well plates - 350 μl wells	Waters Corporation	WAT058943	These are used for sample injection
96 well plate mats	Waters Corporation	186000857	These are used for sample injection
96 well plate heat seals	Waters Corporation	186002789	These can be used for sample injection or long term storage
96 well plate heat sealer	Waters Corporation	186002786
LC-MS grade methanol	Fluka	34966
LC-MS grade acetonitrile	Fluka	34967
LC-MS grade water	Fluka	39253
LC-MS grade formic acid	Fluka	56302
Multichannel electronic pipettor	VWR	89000-674
Pipette tips	Eclipse (purchased through Light Labs)	B-5061/B-4061
Chilled centrifuge - Allegra X-12R	Beckman Coulter	N/A - contact Beckman Coulter
Acquity Ultra Performance Liquid Chromatography (UPLC) System	Waters Corporation	N/A - contact Waters Corporation
UPLC C8 column (gradient option a)	Waters Corporation	186002876
UPLC T3 column (gradient option b)	Waters Corporation	186003536
Xevo G2 Q-TOF Mass spectrometer	Waters Corporation	N/A - contact Waters Corporation

References

1. Theodoridis, G., Gika, H. G., Wilson, I. D. Mass Spectrometry-Based Holistic Analytical Approaches for Metabolite Profiling in Systems Biology Studies. Mass Spectrom. Rev. 30, 884-906 (2011).
2. Zelena, E., et al. Development of a robust and repeatable UPLC-MS method for the long-term metabolomic study of human serum. Analytical Chemistry. 81, 1357-1364 (2009).
3. Michopoulos, F., Lai, L., Gika, H., Theodoridis, G., Wilson, I. UPLC-MS-based analysis of human plasma for metabonomics using solvent precipitation or solid phase extraction. Journal of Proteome Research. 8, 2114-2121 (2009).
4. Kamburov, A., Cavill, R., Ebbels, T. M., Herwig, R., Keun, H. C. Integrated pathway-level analysis of transcriptomics and metabolomics data with IMPaLA. Bioinformatics. 27, 2917-2918 (2011).
5. Xia, J., Wishart, D. S. MetPA: a web-based metabolomics tool for pathway analysis and visualization. Bioinformatics. 26, 2342-2344 (2010).
6. Johnson, C. H., Gonzalez, F. J. Challenges and opportunities of metabolomics. J. Cell. Physiol. 227, 2975-2981 (2012).
7. Issaq, H. J., Waybright, T. J., Veenstra, T. D. Cancer biomarker discovery: Opportunities and pitfalls in analytical methods. Electrophoresis. 32, 967-975 (2011).
8. Zhou, B., Xiao, J. F., Tuli, L., Ressom, H. W. LC-MS-based metabolomics. Mol. Biosyst. 8, 470-481 (2012).
9. Weckwerth, W., Morgenthal, K. Metabolomics: from pattern recognition to biological interpretation. Drug Discov. Today. 10, 1551-1558 (2005).
10. Smith, C. A., et al. METLIN - A metabolite mass spectral database. Ther. Drug Monit. 27, 747-751 (2005).
11. Culpepper, S. A., Aguinis, H. R is for Revolution: A Cutting-Edge, Free, Open Source Statistical Package. Organ Res. Methods. 14, 735-740 (2011).
12. Sumner, L. W., et al. Proposed minimum reporting standards for chemical analysis. Metabolomics. 3, 211-221 (2007).
13. Vuckovic, D. Current trends and challenges in sample preparation for global metabolomics using liquid chromatography-mass spectrometry. Anal. Bioanal. Chem. 403, 1523-1548 (2012).
14. Garcia-Canaveras, J. C., Donato, M. T., Castell, J. V., Lahoz, A. A comprehensive untargeted metabonomic analysis of human steatotic liver tissue by RP and HILIC chromatography coupled to mass spectrometry reveals important metabolic alterations. Journal of Proteome Research. 10, 4825-4834 (2011).
15. Nordstrom, A., Want, E., Northen, T., Lehtio, J., Siuzdak, G. Multiple ionization mass spectrometry strategy used to reveal the complexity of metabolomics. Analytical Chemistry. 80, 421-429 (2008).
16. Horai, H., et al. MassBank: a public repository for sharing mass spectral data for life sciences. J. Mass Spectrom. 45, 703-714 (2010).
17. Wishart, D. S. Advances in metabolite identification. Bioanalysis. 3, 1769-1782 (2011).
18. Broeckling, C., Heuberger, A., Prince, J., Ingelsson, E., Prenni, J. Assigning precursor-product ion relationships in indiscriminant MS/MS data from non-targeted metabolite profiling studies. Metabolomics. , 1-11 (2012).
19. Kangas, L. J., et al. In silico identification software (ISIS): a machine learning approach to tandem mass spectral identification of lipids. Bioinformatics. 28, 1705-1713 (2012).
