Summary

We constructed an untargeted metabolomics workflow that integrates XY-Meta and metaX. This protocol shows how to use XY-Meta to generate a decoy spectral library from an open-access reference spectral library and to perform FDR control during spectrum identification, and then how to use metaX to quantify the identified metabolites.

Abstract

Untargeted metabolomics techniques have been widely used in recent years. However, rapidly increasing throughput and sample numbers produce an enormous number of spectra, which makes quality control of mass spectrometry spectra challenging. To reduce false positives, false discovery rate (FDR) quality control is necessary. We recently developed XY-Meta, a software tool for FDR control of untargeted metabolome identification based on the Target-Decoy strategy. Here, we demonstrate a complete analysis pipeline that integrates XY-Meta and metaX. This protocol shows how to use XY-Meta to generate a decoy database from an existing reference database and to perform FDR control using the Target-Decoy strategy for large-scale metabolome identification on an open-access dataset. Differential analysis and metabolite annotation are performed after running metaX for metabolite peak detection and quantitation. To help more researchers, we also developed a user-friendly cloud-based analysis platform for these analyses that requires no bioinformatics skills or programming knowledge.

Introduction

Metabolites play important roles in biological processes. They often regulate processes such as energy transfer, hormone regulation, neurotransmitter regulation, cellular communication, and protein post-translational modification1,2,3,4. Untargeted metabolomics provides a global view of numerous metabolites5,6. With advances in mass spectrometry and chromatography technologies, the throughput of metabolome MS/MS spectra has increased rapidly in recent years7,8,9,10,11. To identify metabolites from these huge datasets, various annotation software tools have been developed11, such as MZmine12, MS-FINDER13, CFM-ID14, MetFrag15, and SLAW16. However, these identifications often contain many false positives. The reasons include: (1) MS/MS spectra contain random noise, which may mislead peak matching. (2) Isomers and differences in fragmentation energy produce multiple spectral fingerprints and thus increase the size of the reference library. (3) The quality of reference libraries varies, and a proper standard for building a good reference spectral library is needed. Therefore, systematic false discovery rate (FDR) control for untargeted metabolomics is essential for functional metabolome research7,8,9,17.

Both the Empirical Bayes approach and the Target-Decoy strategy have been used to address the FDR control problem. Kerstin Scheubert et al. showed that the Target-Decoy strategy with a decoy database generated by a fragmentation tree-based method is the best method for FDR control9. Xusheng Wang et al. designed a decoy generation method based on the octet rule in chemistry and improved the precision of FDR estimation17. Generating the decoy database from a spectral library was shown to give better performance18. Here, we improved the spectral library-based method and developed a software tool called XY-Meta18 that further improves the precision of FDR estimation. It uses an existing reference spectral library to generate a decoy library for FDR control under the Target-Decoy scheme. XY-Meta implements its own spectral matching and cosine similarity algorithms and offers conventional and iterative search modes. For FDR assessment, it supports Target-Decoy concatenated and separated modes. For greater flexibility, XY-Meta also accepts external decoy libraries.
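
The Target-Decoy idea can be illustrated with a small sketch. The function below is not XY-Meta's implementation. It assumes a concatenated search in which every match carries a similarity score and a decoy flag, and it estimates the FDR at a score cutoff as the number of decoy matches divided by the number of target matches at or above that cutoff, which is one common convention. The function name, inputs, and example scores are hypothetical.

```python
from typing import List, Tuple


def score_cutoff_at_fdr(matches: List[Tuple[float, bool]], max_fdr: float = 0.01) -> float:
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    matches holds (similarity score, is_decoy) pairs from a concatenated
    target + decoy search. The FDR at a cutoff is estimated as
    decoys / targets among matches scoring at or above the cutoff.
    Returns float("inf") when no cutoff satisfies the limit.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    cutoff = float("inf")
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only after all matches sharing this score are counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Hypothetical scores: True marks a hit against the decoy library.
example = [(0.95, False), (0.90, False), (0.85, False),
           (0.80, True), (0.75, False), (0.60, True)]
print(score_cutoff_at_fdr(example, max_fdr=0.25))
```

Target matches scoring at or above the returned cutoff would be reported as identifications. A separated search would score targets and decoys in two independent runs before applying the same kind of ratio.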

Peak detection and quantification of metabolites are also important steps in untargeted metabolome analysis. Peak detection is the main method for metabolome identification. In general, the accuracy of metabolite peak detection is affected by multiple factors, such as mass spectrometry noise signals, low metabolite abundance, contaminants, and metabolite degradation products20. When the number of samples is too large or the liquid chromatography column is replaced during an untargeted metabolome experiment, remarkable batch effects may appear, which is a major challenge for metabolome quantitation21,22,23. Currently, software such as XCMS24, Workflow4Metabolomics25, iMet-Q26, and metaX19 can perform peak detection and quantitation for untargeted metabolomics, but we suggest that the metaX pipeline is more complete and easier to use. Here, we demonstrate identification and FDR control for the publicly available dataset msv000084112 using XY-Meta, and peak detection and quantification of metabolites using metaX. This workflow requires only two groups, each with at least two samples. MS/MS spectral data are needed. The workflow is independent of mass spectrometer platform, ionization mode, charge mode, and sample type, and it supports both sample-based and peak-based normalization. Following this example, researchers can perform metabolomics identification and quantification in an easy-to-handle way. Using this pipeline requires R programming ability. To help researchers without programming knowledge, we also developed a cloud analysis platform for metabolomics analysis, which is demonstrated in Supplementary Material 5.


Protocol

1. Prepare metabolomics datasets for analysis

NOTE: In this demonstration, we use metabolomics datasets without QC samples. Data for both case and control groups are needed. For demonstration, we use a public dataset from the GNPS database27.

  1. Go to the webpage https://gnps.ucsd.edu/ProteoSAFe/static/gnps-splash.jsp. Click Browse Datasets.
  2. Search for the keyword "msv000084112" in the Title column. Click the dataset ID number for details and download the dataset using FTP.
  3. Put the raw data in the folder /msv000084112.
    NOTE: This dataset was acquired using C18 RP-UHPLC on a Q Exactive platform in positive mode. It contains urine sample data from a cohort with an uncharacterized metabolic disease, including 33 samples from healthy people, 12 blank samples, two mixed samples, and 82 samples from patients28 (Supplementary Material 8). To demonstrate the workflow, we randomly chose six samples from healthy people (NH) as the control group and six samples from patients with the disease (NT) as the case group.
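
The random selection of six samples per group can be made reproducible with a fixed seed. The following is a minimal sketch; the sample IDs, seeds, and function name are hypothetical placeholders, and the real file names come from the downloaded dataset.

```python
import random


def pick_samples(sample_ids, n=6, seed=0):
    """Draw n sample IDs without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(sample_ids), n))


# Hypothetical IDs standing in for the real file names of the dataset.
healthy = [f"NH_{i:02d}" for i in range(1, 34)]    # 33 healthy samples
patients = [f"NT_{i:02d}" for i in range(1, 83)]   # 82 patient samples

control_group = pick_samples(healthy, seed=1)
case_group = pick_samples(patients, seed=2)
print(control_group)
print(case_group)
```

Recording the seed alongside the results lets others regenerate the same subset.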

2. Data format conversion

NOTE: If the dataset is raw data generated directly by the mass spectrometer, it is usually in .raw, .wiff, or .cdf format and should be converted to mzXML and mgf formats. Here, we use the msconvert tool in the ProteoWizard29 package for format conversion.

  1. Download ProteoWizard from https://proteowizard.sourceforge.io/download.html and install it.
  2. Convert the data format using msconvert.exe in the ProteoWizard installation path.
    1. Convert the raw data to mzXML format and store the output in the /msv000084112/mzXML folder:
       msconvert.exe /msv000084112/*.raw -o /msv000084112/mzXML/ --filter "peakPicking true [1,2]" --filter "zeroSamples removeExtra" --mzXML --zlib --mz64 --filter "msLevel 1-2" --filter "titleMaker <RunId>.<ScanNumber>.<ScanNumber>.<ChargeState>"
    2. Convert the raw data to mgf format and store the output in the /msv000084112/mgf folder:
       msconvert.exe /msv000084112/*.raw -o /msv000084112/mgf/ --filter "peakPicking true [1,2]" --filter "zeroSamples removeExtra" --mgf --mz64 --filter "msLevel 1-2" --filter "titleMaker <RunId>.<ScanNumber>.<ScanNumber>.<ChargeState>"
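
Because the two conversions differ only in the output format switch and the compression flag, the argument lists can be generated from one helper. This is a minimal sketch, not part of the published protocol: the helper name and paths are hypothetical, and it assumes the --mzXML switch for the step that targets mzXML output. It only builds the argument list; running it requires a machine with ProteoWizard installed.

```python
def msconvert_args(raw_pattern, out_dir, fmt, exe="msconvert.exe"):
    """Build the msconvert argument list for one output format."""
    if fmt not in ("mzXML", "mgf"):
        raise ValueError(f"unsupported format: {fmt}")
    args = [
        exe, raw_pattern, "-o", out_dir,
        "--filter", "peakPicking true [1,2]",
        "--filter", "zeroSamples removeExtra",
        f"--{fmt}",
    ]
    if fmt == "mzXML":
        # zlib compression is requested only for the mzXML conversion.
        args.append("--zlib")
    args += [
        "--mz64",
        "--filter", "msLevel 1-2",
        "--filter", "titleMaker <RunId>.<ScanNumber>.<ScanNumber>.<ChargeState>",
    ]
    return args


for fmt in ("mzXML", "mgf"):
    print(msconvert_args("/msv000084112/*.raw", f"/msv000084112/{fmt}/", fmt))
```

Each list could then be passed to `subprocess.run(args, check=True)`. Passing a list avoids shell quoting problems with the filter strings, which contain spaces and angle brackets.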

3. Prepare the reference spectral library for the metabolites

NOTE: XY-Meta supports reference spectral libraries in mgf format only.

  1. Go to the webpage https://gnps.ucsd.edu/ProteoSAFe/libraries.jsp. Search for the keyword “NIST” to find the library. Click View for details and download the library.
    NOTE: GNPS Public Spectral Libraries collects many metabolite libraries, organized by type, origin, species, and collection mode. Although only a small fraction of these libraries were generated using standard materials, they are usually sufficient for most fundamental research.
  2. Put the downloaded library GNPS-NIST14-MATCHES.mgf into the /database folder.

4. Metabolites identification and FDR control

  1. Download XY-Meta (Windows version). Locate the parameter configuration file parameter.default in the /XY-Meta-Win/config/ folder and edit it according to Supplementary Material 1.
    NOTE: In solution, metabolites often form adducts with anions or cations, which shifts the mass of the parent ions. Therefore, the adduct types must be specified. We provide adduct lists for ion exchange columns and reversed-phase analytical columns in positive and negative charge modes in the /adduct folder. Users may also edit their own adduct list for their research project, in the same format as the provided lists.
  2. Perform metabolite identification and FDR control using XY-Meta:
     XY-Meta.exe -S /XY-Meta-Win/config/parameter.default -D /msv000084112/pos_wt-1_a.mgf -R /database/GNPS-NIST14-MATCHES.mgf
    NOTE: XY-Meta does not support wildcards in parameters, so each mgf file must be processed with its own command. For a large number of files, a batch file is recommended.
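
As an alternative to a hand-written batch file, the per-file commands can be generated in a loop. The sketch below reuses only the -S, -D, and -R options shown in the step above. The function names, the dry-run switch, and the directory layout are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def xy_meta_commands(mgf_files, parameter_file, reference_library, exe="XY-Meta.exe"):
    """Return one XY-Meta argument list per mgf file, in sorted order."""
    return [
        [exe, "-S", str(parameter_file), "-D", mgf, "-R", str(reference_library)]
        for mgf in sorted(str(p) for p in mgf_files)
    ]


def run_all(mgf_dir, parameter_file, reference_library, dry_run=True):
    """Process every *.mgf file in mgf_dir; only print the commands when dry_run is True."""
    commands = xy_meta_commands(
        Path(mgf_dir).glob("*.mgf"), parameter_file, reference_library
    )
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing file
    return commands
```

Calling `run_all("/msv000084112/mgf", "/XY-Meta-Win/config/parameter.default", "/database/GNPS-NIST14-MATCHES.mgf")` would list the commands first; setting `dry_run=False` executes them one by one.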

5. Differential analysis

NOTE: metaX is an open-source R package. Install it following the guide at https://github.com/wenbostar/metaX. 8 GB of RAM is required for this analysis.

  1. Create a sampleList.txt file that specifies each sample and its corresponding MS data. Refer to Supplementary Material 2.
    NOTE: metaX supports quantitative analysis of datasets with QC samples. When using QC samples, set the class property to NA for the QC samples.
  2. Create an /output folder to store the results of the quantitative analysis. In R, run the script in Supplementary Material 3, which uses metaX to quantify the MOCK and WT groups.
    NOTE: Before running the script in Supplementary Material 3, change the paths in the script to the actual local paths.
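
A sample list can also be generated programmatically instead of edited by hand. The sketch below assumes a tab-delimited layout with the columns sample, batch, class, and order; this layout is an assumption and should be checked against Supplementary Material 2 before use. It sets the class to NA for QC samples and enforces the minimum of two groups with two samples each. All sample names are hypothetical, and the order column simply numbers samples in the sequence given, so it should be replaced by the real injection order.

```python
def build_sample_list(groups, qc_samples=(), batch=1):
    """Return tab-delimited sample list text (assumed columns: sample, batch, class, order).

    groups maps a class label (for example "NH") to its sample names.
    QC samples receive the class NA.
    """
    if len(groups) < 2 or any(len(names) < 2 for names in groups.values()):
        raise ValueError("need at least two groups with at least two samples each")
    lines = ["sample\tbatch\tclass\torder"]
    order = 0
    for name in qc_samples:
        order += 1
        lines.append(f"{name}\t{batch}\tNA\t{order}")
    for group, names in groups.items():
        for name in names:
            order += 1
            lines.append(f"{name}\t{batch}\t{group}\t{order}")
    return "\n".join(lines) + "\n"


# Hypothetical sample names for the two demonstration groups.
text = build_sample_list(
    {"NH": ["NH_01", "NH_02"], "NT": ["NT_01", "NT_02"]},
    qc_samples=["QC_01"],
)
print(text)
```

The returned text can be written to sampleList.txt with an ordinary file write.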

6. Integration of qualitative and quantitative results

  1. Run the R script in Supplementary Material 4 to annotate the peaks from the qualitative and quantitative analyses using the metabolite identifications.
    NOTE: Before running the script in Supplementary Material 4, please modify the paths in the script to your actual local paths.
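
The script in Supplementary Material 4 is not reproduced here, so the matching rule below is an assumption rather than the authors' method. As a minimal sketch, quantified peaks can be linked to identifications when their m/z values agree within a ppm tolerance and their retention times agree within a time window. The field names, tolerances, and example values are hypothetical.

```python
def annotate_peaks(peaks, identifications, ppm=10.0, rt_tol=30.0):
    """Attach the closest identification (by m/z) to each quantified peak.

    peaks: dicts with "mz" and "rt" (seconds).
    identifications: dicts with "mz", "rt", and "name".
    A peak with no identification inside both tolerances gets name None.
    """
    annotated = []
    for peak in peaks:
        hits = [
            ident for ident in identifications
            if abs(ident["mz"] - peak["mz"]) / peak["mz"] * 1e6 <= ppm
            and abs(ident["rt"] - peak["rt"]) <= rt_tol
        ]
        best = min(hits, key=lambda h: abs(h["mz"] - peak["mz"]), default=None)
        annotated.append({**peak, "name": best["name"] if best else None})
    return annotated


# Hypothetical values for illustration only.
peaks = [{"mz": 180.0634, "rt": 120.0}, {"mz": 300.0000, "rt": 50.0}]
idents = [{"mz": 180.0640, "rt": 125.0, "name": "hypothetical metabolite A"}]
for row in annotate_peaks(peaks, idents):
    print(row)
```

Unmatched peaks are kept with an empty annotation so that the quantitative table stays complete.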


Results

The raw data of msv000084112 were converted with msconvert.exe to generate mgf files (Supplementary Material S6).

XY-Meta generated the GNPS-NIST14-MATCHES_Decoy.mgf file in the /database folder. This is the decoy library generated from the original reference spectral library GNPS-NIST14-MATCHES.mgf, and it can be reused. When reusing this decoy library, the user should set decoy_pattern to 1 in the parameter.default file and set decoyinput to the absolute path of t...


Discussion

FDR control for untargeted metabolite identification has been a great challenge. Here, we demonstrated a complete pipeline for large-scale untargeted metabolomics analysis (qualitative and quantitative) with FDR control. This pipeline effectively reduces false positives, which are very common in MS analysis.

Preparing an appropriate reference spectral library for your study is a key point. A successful and sensitive MS/MS identification requires not only proper matching algorithms, but also proper reference sp...


Disclosures

No conflicts of interest.

Acknowledgements

This work is supported by the National Key Research and Development Program (2018YFC0910200/2017YFA0505001) and the Guangdong Key R&D Program (2019B020226001).


Materials

Name | Company | Catalog Number | Comments
GNPS | open source | n/a | https://gnps.ucsd.edu/ProteoSAFe/static/gnps-splash.jsp
XY-Meta | open source | n/a | https://github.com/DLI-ShenZhen/XY-Meta
metaX | open source | n/a | https://github.com/wenbostar/metaX
ProteoWizard | Free Download | 3.0.22116.18c918b-x86_64 | https://proteowizard.sourceforge.io/download.html
CHI.Client | Free Download | ndp48-x86-x64-allos-enu | http://www.chi-biotech.com/technology.html?ty=ypt

References

  1. Misra, B. B., Fahrmann, J. F., Grapov, D. Review of emerging metabolomic tools and resources: 2015-2016. Electrophoresis. 38 (18), 2257-2274 (2017).
  2. Idle, J. R., Gonzalez, F. J. Metabolomics. Cell Metabolism. 6 (5), 348-351 (2007).
  3. Fiehn, O. Metabolomics – the link between genotypes and phenotypes. Functional Genomics. Town, C., Springer Netherlands, Dordrecht. 155-171 (2002).
  4. Functional Genomics. Town, C., Springer Netherlands, Dordrecht. (2002).
  5. Dettmer, K., Aronov, P. A., Hammock, B. D. Mass spectrometry-based metabolomics. Mass Spectrometry Reviews. 26 (1), 51-78 (2007).
  6. Vinayavekhin, N., Saghatelian, A. Untargeted metabolomics. Current Protocols in Molecular Biology. Chapter 30, Unit 30.1, 1-24 (2010).
  7. Chaleckis, R., Meister, I., Zhang, P., Wheelock, C. E. Challenges, progress and promises of metabolite annotation for LC-MS-based metabolomics. Current Opinion in Biotechnology. 55, 44-50 (2019).
  8. Palmer, A., et al. FDR-controlled metabolite annotation for high-resolution imaging mass spectrometry. Nature Methods. 14 (1), 57-60 (2017).
  9. Scheubert, K., et al. Significance estimation for large scale metabolomics annotations by spectral matching. Nature Communications. 8 (1), 1494 (2017).
  10. Schrimpe-Rutledge, A. C., Codreanu, S. G., Sherrod, S. D., McLean, J. A. Untargeted metabolomics strategies-challenges and emerging directions. Journal of the American Society for Mass Spectrometry. 27 (12), 1897-1905 (2016).
  11. Blaženović, I., Kind, T., Ji, J., Fiehn, O. Software tools and approaches for compound identification of LC-MS/MS data in metabolomics. Metabolites. 8 (2), (2018).
  12. Katajamaa, M., Miettinen, J., Oresic, M. MZmine: toolbox for processing and visualization of mass spectrometry based molecular profile data. Bioinformatics (Oxford, England). 22 (5), 634-636 (2006).
  13. Tsugawa, H., et al. Hydrogen rearrangement rules: computational MS/MS fragmentation and structure elucidation using MS-FINDER software. Analytical Chemistry. 88 (16), 7946-7958 (2016).
  14. Wang, F., et al. CFM-ID 4.0: More accurate ESI-MS/MS spectral prediction and compound identification. Analytical Chemistry. 93 (34), 11692-11700 (2021).
  15. Ruttkies, C., Schymanski, E. L., Wolf, S., Hollender, J., Neumann, S. MetFrag relaunched: incorporating strategies beyond in silico fragmentation. Journal of Cheminformatics. 8, 3 (2016).
  16. Delabriere, A., Warmer, P., Brennsteiner, V., Zamboni, N. SLAW: A scalable and self-optimizing processing workflow for untargeted LC-MS. Analytical Chemistry. 93 (45), 15024-15032 (2021).
  17. Wang, X., et al. Target-decoy-based false discovery rate estimation for large-scale metabolite identification. Journal of Proteome Research. 17 (7), 2328-2334 (2018).
  18. Li, D., et al. XY-Meta: a high-efficiency search engine for large-scale metabolome annotation with accurate FDR estimation. Analytical Chemistry. 92 (8), 5701-5707 (2020).
  19. Wen, B., Mei, Z., Zeng, C., Liu, S. metaX: a flexible and comprehensive software for processing metabolomics data. BMC Bioinformatics. 18 (1), 183 (2017).
  20. Aberg, K. M., Torgrip, R. J. O., Kolmert, J., Schuppe-Koistinen, I., Lindberg, J. Feature detection and alignment of hyphenated chromatographic-mass spectrometric data. Extraction of pure ion chromatograms using Kalman tracking. Journal of Chromatography. A. 1192 (1), 139-146 (2008).
  21. Liu, Q., et al. Addressing the batch effect issue for LC/MS metabolomics data in data preprocessing. Scientific Reports. 10 (1), 13856 (2020).
  22. Han, W., Li, L. Evaluating and minimizing batch effects in metabolomics. Mass Spectrometry Reviews. 41 (3), 421-442 (2022).
  23. Fei, F., Bowdish, D. M. E., McCarry, B. E. Comprehensive and simultaneous coverage of lipid and polar metabolites for endogenous cellular metabolomics using HILIC-TOF-MS. Analytical and Bioanalytical Chemistry. 406 (15), 3723-3733 (2014).
  24. Smith, C. A., Want, E. J., O'Maille, G., Abagyan, R., Siuzdak, G. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Analytical Chemistry. 78 (3), 779-787 (2006).
  25. Giacomoni, F., et al. Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics. Bioinformatics (Oxford, England). 31 (9), 1493-1495 (2015).
  26. Chang, H.-Y., et al. iMet-Q: A user-friendly tool for label-free metabolomics quantitation using dynamic peak-width determination. PLoS One. 11 (1), e0146112 (2016).
  27. Wang, M., et al. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nature Biotechnology. 34 (8), 828-837 (2016).
  28. Schmid, R., et al. Ion identity molecular networking for mass spectrometry-based metabolomics in the GNPS environment. Nature Communications. 12 (1), 3832 (2021).
  29. Kessner, D., Chambers, M., Burke, R., Agus, D., Mallick, P. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics (Oxford, England). 24 (21), 2534-2536 (2008).
  30. Johnson, S. R., Lange, B. M. Open-access metabolomics databases for natural product research: present capabilities and future potential. Frontiers in Bioengineering and Biotechnology. 3, 22 (2015).
  31. Horai, H., et al. MassBank: a public repository for sharing mass spectral data for life sciences. Journal of Mass Spectrometry: JMS. 45 (7), 703-714 (2010).
  32. Rawlinson, C., et al. Hierarchical clustering of MS/MS spectra from the firefly metabolome identifies new lucibufagin compounds. Scientific Reports. 10 (1), 6043 (2020).
