Clinical metaproteomics offers insights into the human microbiome and its contributions to disease. We harnessed the computational power of the Galaxy platform to develop a modular bioinformatics workflow that facilitates complex mass spectrometry-based metaproteomic analysis and characterization of diverse clinical sample types relevant to studies of disease.
Clinical metaproteomics reveals host-microbiome interactions underlying diseases. However, challenges to this approach exist. In particular, the characterization of microbial proteins present in low abundance relative to host proteins is difficult. Other significant challenges stem from the use of very large protein sequence databases, which reduces sensitivity and accuracy during peptide and protein identification from mass spectrometry data, and from the difficulty of retrieving taxonomic and functional annotations and performing statistical analysis. To address these problems, we present an integrated bioinformatics workflow for mass spectrometry-based metaproteomics that combines custom protein sequence database generation, peptide-spectrum match generation and verification, quantification, taxonomic and functional annotation, and statistical analysis. This workflow also characterizes human proteins (while prioritizing microbial proteins), thus offering insights into host-microbe dynamics in disease. The tools and workflow are deployed in the Galaxy ecosystem, enabling the development, optimization, and dissemination of these computational resources. We have applied this workflow to the metaproteomic analysis of numerous clinical sample types, such as nasopharyngeal swabs and bronchoalveolar lavage fluid. Here, we demonstrate its utility via the analysis of residual fluid from cervical swabs. The complete workflow and accompanying training resources are accessible on the Galaxy Training Network to equip non-experts and experienced researchers with the necessary knowledge and tools to analyze their data.
Mass spectrometry (MS)-based metaproteomics identifies and quantifies microbial and human proteins from clinical samples. This approach provides a new understanding of microbiome responses to disease and uncovers potential mediators of host-microbiome interactions1,2. Although metaproteomic analysis of clinical samples can uncover the microbiome's interactions with its host environment, the field still faces many challenges. One main challenge is the relatively high abundance of host (human) proteins, which hampers the identification of lower-abundance microbial proteins. Moreover, MS-based metaproteomics d....
MS/MS spectral data were obtained from de-identified residual PTF samples that were collected using procedures that followed institutional board-approved guidelines and regulations, as previously described21,29,30.
NOTE: Figure 1 provides an overview of the complete workflow, which consists of five modules. All inputs, outputs, and software tools are summarized in
The general protocol described here was demonstrated on MS/MS files obtained from a subset of PTF samples21. Do et al.21 analyzed four MS/MS files from PTF samples that were collected following procedures described by Boylan et al.29 and Afiuni-Zadel et al.30. This workflow prioritizes microbial proteins but offers the flexibility to characterize human proteins in parallel with microbial proteins21.......
Clinical metaproteomics research offers potential breakthroughs for clinical studies, but challenges in its implementation persist. The lower abundance of microbial proteins relative to the host proteins in most samples hinders the detection and characterization of non-host proteins6,10. Dependence on large protein sequence databases for accurate peptide and protein identification and quantification, along with complexities of taxonomically and functionally annot.......
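The effect of database size on identification can be illustrated with the widely used target-decoy estimate of the false discovery rate (FDR). The sketch below is only an illustration under stated assumptions: the function name `score_threshold_at_fdr`, the `(score, is_decoy)` input format, and the simple decoys/targets estimate are hypothetical choices made here, not the scoring or validation procedure used by SearchGUI/PeptideShaker, MaxQuant, or PepQuery2.

```python
from typing import List, Optional, Tuple


def score_threshold_at_fdr(
    psms: List[Tuple[float, bool]], max_fdr: float = 0.01
) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: (score, is_decoy) pairs, where a higher score is a better match.
    The FDR at a cutoff is estimated as decoys / targets among all
    peptide-spectrum matches (PSMs) scoring at or above that cutoff.
    Returns None if no cutoff satisfies the requested FDR.
    """
    targets = 0
    decoys = 0
    best_cutoff = None
    # Walk from the best-scoring PSM downward, updating the running counts.
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Toy (made-up) PSM list: five target hits and one decoy hit.
example_psms = [
    (10.0, False), (9.0, False), (8.0, False),
    (7.0, True), (6.0, False), (5.0, False),
]
strict_cutoff = score_threshold_at_fdr(example_psms, max_fdr=0.10)
lenient_cutoff = score_threshold_at_fdr(example_psms, max_fdr=0.25)
```

In this toy list, the stricter FDR limit forces a higher score cutoff, so fewer target PSMs are retained. A larger search database tends to yield more high-scoring random (decoy-like) matches, which pushes the cutoff up in the same way and is one reason reduced, sample-specific databases can improve sensitivity.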
We thank Dr. Amy Skubitz and Dr. Kristin Boylan (University of Minnesota) for the pilot data sets and Dr. Paul Piehowski, Dr. Tao Liu, and Dr. Karin Rodland (Pacific Northwest National Laboratories (PNNL)) for their expertise in the collection and processing of the PTF samples and the generation of the TMT-labeled MS data used in this study. This project was funded in part by the Minnesota Ovarian Cancer Alliance (MOCA), the National Institutes of Health/National Cancer Institute Grant Numbers 5R01CA262153 (A.P.N.S.) and 1R21CA267707 (P.D.J. and T.J.G.), and the National Institutes of Health/National Cancer Institute Grant Number P30CA077598 (P.D.J. and T.J.G.).
Name | Company | Catalog Number | Comments |
Collapse Collection | GalaxyP | Galaxy Version 5.1.1 | Combines a dataset list collection into a single file (in the order of the list) |
Concatenate datasets | GalaxyP | Galaxy Version 0.1.1 | Concatenate files tail-to-head |
Cut | GalaxyP | Galaxy Version 1.0.2 | Cut (select) specified columns from a file |
FASTA Merge Files and Filter Unique Sequences | GalaxyP | Galaxy Version 1.2.0 | Concatenate FASTA database files together |
FastaCLI | GalaxyP | Galaxy Version 4.0.41+galaxy1 | Appends decoy sequences to FASTA files |
FASTA-to-Tabular | GalaxyP | Galaxy Version 1.1.0 | Convert FASTA-formatted sequences to TAB-delimited format |
Filter | GalaxyP | Galaxy Version 1.1.1 | Filter columns using simple expressions |
Filter Tabular | GalaxyP | Galaxy Version 3.3.0 | Filter a tabular file via line filters |
Galaxy Europe (EU) server | GalaxyP | https://usegalaxy.eu/ | |
Group | GalaxyP | Galaxy Version 2.1.4 | Group a file by a particular column and perform aggregate functions |
Identification Parameters | GalaxyP | Galaxy Version 4.0.41+galaxy1 | Set identification parameters for SearchGUI/PeptideShaker |
Learning Pathway: Clinical metaproteomics workflows within Galaxy | GalaxyP | https://training.galaxyproject.org/training-material/learning-pathways/clinical-metaproteomics.html | |
MaxQuant | GalaxyP | Galaxy Version 2.0.3.0+galaxy0 (Discovery module); Galaxy Version 1.6.17.0+galaxy4 (Quantification module) | Quantitative proteomics software package for analysis of large mass spectrometric data files |
MetaNovo | GalaxyP | Galaxy Version 1.9.4+galaxy4 | Search MS/MS data against a FASTA database (of known proteins) to produce a targeted database (of matched proteins) for mass spectrometry analysis |
msconvert | GalaxyP | Galaxy Version 3.0.20287.2 | Convert and/or filter mass spectrometry files |
MSstatsTMT | GalaxyP | Galaxy Version 2.0.0+galaxy1 | R-based package for detection of differentially abundant proteins in shotgun mass spectrometry-based proteomic experiments using tandem mass tag (TMT) labeling |
PepQuery2 | GalaxyP | Galaxy Version 2.0.2+galaxy0 | Peptide-centric search engine for identification and/or validating known and novel peptides of interest |
PeptideShaker | GalaxyP | Galaxy Version 2.0.33+galaxy1 | Interpret results from SearchGUI for protein identification |
Protein Database Downloader | GalaxyP | Galaxy Version 0.3.4 | Download specified protein sequences as a FASTA file |
Query Tabular | GalaxyP | Galaxy Version 3.3.0 | Load tabular files into a SQLite database |
Remove beginning | GalaxyP | Galaxy Version 1.0.0 | Remove the specified number of (header) lines from a file |
SearchGUI | GalaxyP | Galaxy Version 4.0.41+galaxy1 | Run search engines on MGF peak lists and prepare results for input to PeptideShaker |
Select | GalaxyP | Galaxy Version 1.0.4 | Select lines that match an expression |
Unipept | GalaxyP | Galaxy Version 4.5.1 | Retrieve UniProt entries and taxonomic information for tryptic peptides |
UniProt | GalaxyP | Galaxy Version 2.3.0 | Download proteome as an XML (UniProtXML) or FASTA file from UniProtKB |
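Several tools in the table above operate on FASTA protein databases; for example, one entry concatenates FASTA files and keeps only unique sequences. The sketch below shows one way such a merge could work, assuming that uniqueness is judged on the (case-insensitive) sequence and that the first header seen for a sequence is kept. The function names `parse_fasta`, `merge_unique`, and `to_fasta` are hypothetical, and this is not the implementation of the Galaxy tool.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


def merge_unique(*fasta_texts: str) -> Dict[str, str]:
    """Merge FASTA inputs into a sequence -> header map.

    Assumption: when the same sequence appears more than once,
    the header from its first occurrence is kept.
    """
    unique: Dict[str, str] = {}
    for text in fasta_texts:
        for header, seq in parse_fasta(text.splitlines()):
            unique.setdefault(seq.upper(), header)
    return unique


def to_fasta(unique: Dict[str, str]) -> str:
    """Serialize the merged map back to FASTA text, one line per sequence."""
    return "".join(f">{header}\n{seq}\n" for seq, header in unique.items())


# Made-up example records (not real database entries).
host_db = ">sp|H1 host protein\nMKTAYIAK\nQR\n"
microbial_db = (
    ">tr|M1 microbial protein\nMSLLTEVE\n"
    ">tr|M2 duplicate of host\nMKTAYIAKQR\n"
)
merged = merge_unique(host_db, microbial_db)
merged_fasta = to_fasta(merged)
```

Here the third record is dropped because its sequence matches the first one, so the merged database holds two entries. Decoy sequences for FDR estimation would be appended in a separate step.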
This article has been published
Copyright © 2025 MyJoVE Corporation. All rights reserved