The overall goal of this software protocol is to demonstrate the use of PoGo to map peptides with associated post-translational modifications and quantitative information from mass spectrometry experiments to reference genomes and enable integration and visualization with orthogonal genomics data. This tool can help add genomic information to the peptide lists generated by mass spectrometry proteomics searches so that integration with RNA sequencing data may be facilitated. The main advantage of this tool is that it is simple to use, fast and supports additional information such as post-translational modifications, quantitation and amino acid variants.
To begin, download and install the software demonstrated in this video as described in the accompanying text protocol. Once installed, navigate to the PoGo GUI executables folder and start the program by double clicking the PoGo GUI icon. Use the select button next to the PoGo executable.
Then navigate in the executables folder to the relevant operating system subfolder and select the executable PoGo file. Confirm its selection by clicking the open button. Next, select the reference input file for protein sequences by clicking select.
Navigate to the data folder and select the translation FASTA file. Confirm its selection by clicking open. Select the transcript annotation file using the select button.
Then navigate to the data folder, select the annotation GTF file and click on open. Add the peptide identification file using the add button next to peptide files. Here, select the file in one of the supported formats, mzTab, mzIdentML, or mzid, or in the tab-separated four-column format.
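As a rough illustration of the tab-separated four-column input, the sketch below writes such a file from a list of peptide records. The column names (`Sample`, `Peptide`, `PSMs`, `Quant`), the sample names and the values are assumptions made for illustration and are not taken from this protocol; check the PoGo documentation for the exact header and the notation expected for modifications.

```python
import csv
import io

# Assumed column names for the four-column format (hypothetical, verify
# against the PoGo documentation before use).
COLUMNS = ["Sample", "Peptide", "PSMs", "Quant"]


def to_four_column_tsv(rows):
    """Serialize (sample, peptide, psm_count, quant) tuples as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for sample, peptide, psms, quant in rows:
        writer.writerow([sample, peptide, int(psms), float(quant)])
    return buffer.getvalue()


# Invented example records, one line per peptide and sample.
example_rows = [
    ("sample_A", "PEPTIDEK", 3, 1250.5),
    ("sample_B", "PEPTIDEK", 1, 980.0),
]
tsv_text = to_four_column_tsv(example_rows)
print(tsv_text)
```

Writing the text to a file with a plain `open(path, "w")` call would then give a candidate input for the add button next to peptide files.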
Additionally, untick the checkboxes next to BED and GTF in the output format selection. Only leave PTM BED and GCT checked. Then select the appropriate species for the data from the dropdown selection and start mapping by clicking start.
First, load the PoGo output file ending in _ptm.bed into the Integrative Genomics Viewer. Repeat to also load the file ending in _noptm.bed.
This file contains all peptides found without any modification. Reorder tracks by dragging and dropping them to the desired position in the list. Each file will be shown as a separate track with the file's name identifying the track.
Each track is initially shown in a collapsed manner. To expand a track, right click on the track name and select expanded for a full view of the peptides including the sequences. Now load the file ending in .gct.
This file contains the peptide quantitation information. Unlike for the files previously loaded, each annotated sample will be loaded as a separate track. Reorganize the samples as desired using drag and drop operations.
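To see why each sample becomes its own track, it can help to look at how a GCT table is laid out. The sketch below reads a GCT version 1.2 table (a version line, a dimensions line, a header with Name and Description followed by one column per sample, then one row per feature) and regroups the values per sample. The example rows, sample names and values are invented, and the exact content PoGo writes into the Name and Description columns is not assumed here.

```python
def parse_gct(text):
    """Parse GCT v1.2 text into {sample: {row_name: value}}.

    Assumes every data cell holds a plain number.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 3 or not lines[0].startswith("#1.2"):
        raise ValueError("not a GCT v1.2 file")
    # Second line: number of data rows, then number of sample columns.
    n_rows, n_cols = (int(value) for value in lines[1].split("\t")[:2])
    # Third line: Name, Description, then one column per sample.
    samples = lines[2].split("\t")[2:]
    data_lines = lines[3:]
    if len(samples) != n_cols or len(data_lines) != n_rows:
        raise ValueError("dimensions do not match the header")
    tracks = {sample: {} for sample in samples}
    for line in data_lines:
        fields = line.split("\t")
        for sample, value in zip(samples, fields[2:]):
            tracks[sample][fields[0]] = float(value)
    return tracks


# Invented example table with two features and two samples.
example_gct = "\n".join([
    "#1.2",
    "2\t2",
    "\t".join(["Name", "Description", "sample_A", "sample_B"]),
    "\t".join(["pep1", "na", "10.0", "12.5"]),
    "\t".join(["pep2", "na", "3.0", "0.0"]),
])
tracks = parse_gct(example_gct)
print(sorted(tracks))
```

Each key of the returned dictionary corresponds to one sample column, which matches the one-track-per-sample behavior seen in the viewer.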
Navigate within the genome by selecting a chromosome in the dropdown menu, by clicking and holding a section of a chromosome to zoom in on it, or by typing in the genomic coordinates. PoGo mapping can be carried out using the graphical user interface or through the command line interface. In this part of the protocol, the command line interface is used to highlight interchangeability.
In order to map the variant peptides obtained through the previous steps in the text protocol to the reference genome, open a command prompt and navigate to the executables folder of PoGo. Then type the command shown here. Confirm the execution by pressing the enter key and wait until the execution is finished before progressing any further.
Visualize the peptides mapped without mismatch and with mismatch in the Integrative Genomics Viewer. In the PoGo graphical user interface software, add the peptide identification files by using the add button next to peptide files and selecting single or multiple files, or by dragging and dropping them into the blank field underneath peptide files. Then uncheck the checkboxes next to PTM BED, GTF, and GCT in the output format section and only leave BED checked.
Additionally, select the option merge multiple input files into single output. This will result in a single output file combining all peptides of the input files. Leaving this option unselected will result in a sequential execution of the program for each input file separately.
Next, select the appropriate species for the data from the dropdown that is consistent with the FASTA and GTF files. Once selected, start the mapping by clicking the start button. In order to generate a track hub from mapped peptides, open a terminal window and type the following command into the command prompt.
Confirm the execution by pressing the enter key. The execution will only take a short time to finish. Once finished, transfer the generated track hub with all its contents to a web accessible FTP server or GitHub.
In a web browser, navigate to the UCSC Genome Browser and select My Data, then Track Hubs. Then click on the tab My Hubs. Copy the URL to the track hub into the text field and then load the track hub by clicking add hub.
Finally, select the genome browser to enter the browser view. The custom track hub will be shown at the top of the list. If multiple BED files built the basis for the track hub, each of the files will be represented as a separate track within the hub.
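For orientation, a UCSC track hub is a small set of text files: hub.txt, genomes.txt and one trackDb.txt per genome, which point at the data files. The sketch below assembles a minimal version of these files with one track per input file. It is not the output of the command used in this protocol; the hub name, genome, e-mail address and file names are hypothetical, and it assumes the BED files have already been converted to bigBed, which is the format track hubs serve.

```python
import re


def track_hub_files(hub_name, genome, bigbed_files, email):
    """Return {relative_path: text} for a minimal UCSC track hub."""
    hub_txt = "\n".join([
        f"hub {hub_name}",
        f"shortLabel {hub_name[:17]}",
        f"longLabel Peptides mapped to {genome}",
        "genomesFile genomes.txt",
        f"email {email}",
    ]) + "\n"
    genomes_txt = f"genome {genome}\ntrackDb {genome}/trackDb.txt\n"
    stanzas = []
    for filename in bigbed_files:
        stem = filename.rsplit(".", 1)[0]
        # Track names are restricted to letters, digits and underscores.
        track = re.sub(r"\W", "_", stem)
        stanzas.append("\n".join([
            f"track {track}",
            f"bigDataUrl {filename}",
            f"shortLabel {stem[:17]}",
            f"longLabel Peptides from {filename}",
            "type bigBed 12",
            "itemRgb on",  # keep the colors stored in the BED records
            "visibility pack",
        ]))
    track_db = "\n\n".join(stanzas) + "\n"
    return {
        "hub.txt": hub_txt,
        "genomes.txt": genomes_txt,
        f"{genome}/trackDb.txt": track_db,
    }


# Hypothetical inputs for illustration only.
files = track_hub_files(
    "peptide_hub", "hg38", ["run1_ptm.bb", "run2_ptm.bb"], "user@example.org"
)
print(files["hg38/trackDb.txt"])
```

The URL pasted into the My Hubs tab is the address of hub.txt once the whole folder is hosted, and each stanza in trackDb.txt appears as one track inside the hub.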
PoGo and two other proteogenomics mapping tools are compared here for their overall memory use and analytical runtime. PoGo significantly outperforms the other tools in both speed and memory, and it is capable of mapping post-translational modifications and quantitative information associated with peptides onto the genome. PoGo utilizes the coloring option to provide an easy visual aid with respect to the uniqueness of peptide mapping within the genome.
Mappings in red indicate uniqueness to a single transcript while black highlights mapping to a single gene. Gray mappings show peptides shared between multiple genes. The PTM BED option of PoGo redefines the color code to accommodate different types of post-translational modifications, indicating the location of the modification with a thick block at the modified amino acid residue.
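In a BED file the color of each feature is stored in the ninth column (itemRgb), so the color code described above can also be summarized outside the viewer. The sketch below tallies how many mapped peptides carry each color. The RGB triplets, coordinates and peptide names in the example are invented placeholders, not the values PoGo is known to write.

```python
from collections import Counter


def count_item_rgb(bed_text):
    """Count BED records per itemRgb value (column 9 of a BED line)."""
    counts = Counter()
    for line in bed_text.splitlines():
        # Skip blank lines, comments and browser/track header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # no color column on this record
        counts[fields[8]] += 1
    return counts


def bed_line(chrom, start, end, name, rgb):
    """Build a nine-column BED record (score 1000, plus strand)."""
    return "\t".join(
        [chrom, str(start), str(end), name, "1000", "+", str(start), str(end), rgb]
    )


# Invented example records with placeholder colors.
example_bed = "\n".join([
    bed_line("chr1", 100, 130, "PEPTIDEK", "255,0,0"),
    bed_line("chr1", 200, 230, "SAMPLER", "255,0,0"),
    bed_line("chr2", 500, 545, "ANOTHERPEPTIDE", "0,0,0"),
    bed_line("chr3", 900, 927, "SHAREDPEP", "128,128,128"),
])
color_counts = count_item_rgb(example_bed)
print(color_counts.most_common())
```

Running the same tally on a file from the PTM BED option would instead summarize the modification types, since that option redefines the color code.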
In the proteome data shown here, representing that of colorectal cancer, only two peptides were identified, each showing a single phosphorylation. The information in the output GCT file provides quantitative values for genomic regions covered by peptides. Peptides spanning across splice junctions are divided into their parts without connecting.
For more comprehensive information about splicing, the output GTF file is additionally required. Once the reference protein sequences and annotation are downloaded and set up, thousands of peptides can be mapped onto genomic coordinates in under five minutes when not allowing sequence mismatches. While attempting to map peptides onto reference genomes using PoGo, it is important to remember to use translated sequences of protein coding transcripts and associated transcript annotation.
After watching this video, you should have a good understanding of how to use PoGo and PoGo GUI to map peptides from proteomics mass spectrometry experiments with associated information onto reference genomes and visualize them.