NOTE: In this protocol, the use of JUMPn is illustrated with a published dataset of whole-proteome profiling during B cell differentiation, quantified with TMT isobaric labeling reagents27.
1. Setup of JUMPn software
NOTE: Two options are provided for setting up the JUMPn software: (i) installation on a local computer for personal use; and (ii) deployment of JUMPn on a remote Shiny Server for multiple users. For local installation, a personal computer with Internet access and ≥4 GB of RAM is sufficient to run JUMPn analysis for a dataset with a small sample size (n < 30); more RAM (e.g., 16 GB) is needed for large-cohort analysis (e.g., n = 200 samples).
- Install the software on a local computer. After installation, allow the web browser to launch JUMPn and let the analysis run on the local computer.
- Install anaconda42 or miniconda43 following the online instructions.
- Download the JUMPn source code41. Double click to unzip the downloaded file JUMPn_v_1.0.0.zip; a new folder named JUMPn_v_1.0.0 will be created.
- Open command line terminal. On Windows, use the Anaconda Prompt. On MacOS, use the built-in Terminal application.
- Create the JUMPn Conda environment: Get the absolute path of the JUMPn_v_1.0.0 folder (e.g., /path/to/JUMPn_v_1.0.0). To create and activate an empty Conda environment, type the following commands on the terminal:
conda create -p /path/to/JUMPn_v_1.0.0/JUMPn -y
conda activate /path/to/JUMPn_v_1.0.0/JUMPn
- Install JUMPn dependencies: Install R (on the terminal, type conda install -c conda-forge r=4.0.0 -y), change the current directory to the JUMPn_v_1.0.0 folder (on the terminal, type cd /path/to/JUMPn_v_1.0.0), and install the dependency packages (on the terminal, type Rscript bootstrap.R).
- Launch JUMPn on the web browser: Change the current directory to the execution folder (on the terminal, type cd execution) and launch JUMPn (on the terminal, type R -e "shiny::runApp()").
- Once the above is executed, the terminal will display Listening on http://127.0.0.1:XXXX (here XXXX indicates a randomly assigned 4-digit port number). Copy and paste http://127.0.0.1:XXXX into the web browser, where the JUMPn welcome page will appear (Figure 2).
- Deployment on Shiny Server. Examples of Shiny Server include the commercial shinyapps.io server or any institutionally supported Shiny Servers.
- Download and install RStudio following the instructions44.
- Obtain the deployment permission for the Shiny Server. For the shinyapps.io server, set up the user account by following the instructions45. For an institutional Shiny Server, contact the server administrator to request permission.
- Download the JUMPn source code41 to the local machine; installation is not necessary. Open either the server.R or ui.R files in RStudio and click the Publish to Server drop-down menu in the top right of the RStudio IDE.
- In the Publish to Account panel, type the server address. Press the Publish button. Successful deployment is validated upon automatic redirect from RStudio to the RShiny server where the application was deployed.
2. Demo run using an example dataset
NOTE: JUMPn offers a demo run using the published B cell proteomics dataset. The demo run illustrates a streamlined workflow that takes the quantification matrix of differentially expressed proteins as input and performs co-expression clustering, pathway enrichment, and PPI network analysis sequentially.
- On the JUMPn home page (Figure 2), click on the Commence Analysis button to start JUMPn analysis.
- In the bottom left corner of the Commence Analysis page (Figure 3), click on the Upload Demo B Cell Proteomic Data button; a dialog box will appear notifying the success of the data upload.
- In the bottom right corner of the page, click on the Submit JUMPn Analysis button to initiate the demo run using default parameters; a progress bar will appear that denotes the course of the analysis. Wait until the progress bar is complete (3 min expected).
- Once the demo run is finished, a dialog box will appear with a success message and the absolute path to the result folder. Click on Continue to Results to continue.
- The webpage will first guide the user to the co-expression cluster results by WGCNA. Click on View Results on the dialog window to continue.
- Find the protein co-expression patterns on the left of the Result Page 1: WGCNA Output page. Click on the Select the Expression Format drop-down box to navigate between two figure formats:
- Select Trends to display the trends plot, with each line representing individual protein abundance across samples. The color of each line represents how close the expression pattern is to the co-expression cluster consensus (i.e., "eigengene" as defined by the WGCNA algorithm).
- Select Boxplot to display co-expression patterns in boxplot format for each sample.
- View the pathway/ontology enrichment heatmap on the right of the WGCNA output page. The most highly enriched pathways for each cluster are displayed together in a heatmap, with the color intensity reflecting the Benjamini-Hochberg adjusted p-value.
- Scroll down the webpage to view the expression pattern for individual proteins.
- Use the drop-down box Select the Co-Expression Cluster to view proteins from each cluster (default is Cluster 1). Select a specific protein in the table, upon which the bar plot below the table will be automatically updated to reflect its protein abundance.
- To find a specific protein, type its name in the Search box on the right side of the table.
- To view PPI results, click on the Results Page 2: PPI Output on the top.
- Click on Select the Co-Expression Cluster to view the results for a specific co-expression cluster (default is cluster 1). The displays of all figure panels on this page will be updated for the newly selected cluster.
- View the PPI networks for the selected co-expression cluster on the left figure panel:
- Click the Select by Group drop-down box to highlight individual PPI modules within the network. Click on the Select a Network Layout Format drop-down box to change the network layout (default is by Fruchterman Reingold).
- Use the mouse or the trackpad to perform steps 2.11.3-2.11.5.
- Zoom in or zoom out the PPI network as needed. The gene names of each node in the network will be shown when zoomed in sufficiently.
- When zoomed in, select and click a certain protein to highlight that protein and its network neighbors.
- Drag a certain node (protein) in the network to change its position in the layout; thereby the network layout can be re-organized by the user.
- On the right panel of the PPI result page, view the co-expression cluster-level information that assists interpretation of PPI results:
- View the co-expression pattern of the selected cluster as boxplot by default.
- Click on the Select the Expression Format drop-down box for more information or displays as mentioned in steps 2.12.3-2.12.5.
- Select Trends to show trends plot for the co-expression pattern.
- Select Pathway Barplot to show significantly enriched pathways for the co-expression cluster.
- Select Pathway Circle Plot to show significantly enriched pathways for the co-expression cluster in the circle plot format.
- Scroll down the Result Page 2: PPI Output webpage to view results on the individual PPI module level. Click on the Select the Module drop-down box to select a specific PPI module for display (Cluster1: Module 1 is shown by default).
- View the PPI module on the left panel. To manipulate the network display, follow steps 2.11.2-2.11.5.
- View the pathway/ontology enrichment results on the right panel. Click on the Select the Pathway Annotation Style drop-down box for more information and displays:
- Select Barplot to show significantly enriched pathways for the selected PPI module.
- Select Circle Plot to show significantly enriched pathways for the selected PPI module in the format of a circle plot.
- Select Heatmap to show significantly enriched pathways and the associated gene names from the selected PPI module.
- Select Table to show the detailed pathway enrichment results, including the name of pathways/ontology terms, gene names, and the P-value by Fisher's exact test.
- View the publication table in a spreadsheet format: follow the absolute path (printed on the top of both results pages) and find the publication spreadsheet table named ComprehensiveSummaryTables.xlsx.
3. Preparation of the input file and upload to JUMPn
NOTE: JUMPn takes as input the quantification matrix of either the differentially expressed proteins (supervised method) or the most variable proteins (unsupervised method). If the goal of the project is to understand proteins changed across multiple conditions (e.g., different disease groups, or time-series analysis of biological process), the supervised method of performing DE analysis is preferred; otherwise, an unsupervised approach of selecting the most variable proteins may be used for the exploratory purpose.
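As an illustration of the unsupervised option, the following Python sketch ranks proteins by the standard deviation of their log2 intensities and keeps the top N. It is a minimal example under stated assumptions, not the example R code referenced in this section: the function name, the use of standard deviation as the variability measure, and the toy protein IDs are all hypothetical. The filter for proteins quantified in more than 50% of samples mirrors the recommendation given in this section.

```python
import math
import statistics

def most_variable_proteins(matrix, top_n=2, min_fraction=0.5):
    """Rank proteins by the standard deviation of their log2 intensities.

    matrix: dict mapping protein ID -> list of raw intensities (None = missing).
    Returns the IDs of the top_n most variable proteins, most variable first.
    """
    scored = []
    for protein, values in matrix.items():
        observed = [math.log2(v) for v in values if v is not None and v > 0]
        # Skip proteins quantified in 50% of the samples or fewer.
        if len(observed) < 2 or len(observed) / len(values) <= min_fraction:
            continue
        scored.append((statistics.stdev(observed), protein))
    scored.sort(reverse=True)
    return [protein for _, protein in scored[:top_n]]

# Toy data (hypothetical IDs and intensities).
example = {
    "P1": [100.0, 110.0, 105.0, 95.0],   # nearly constant
    "P2": [100.0, 400.0, 1600.0, 50.0],  # highly variable
    "P3": [200.0, 100.0, 400.0, 200.0],  # moderately variable
    "P4": [500.0, None, None, None],     # too many missing values
}
selected = most_variable_proteins(example, top_n=2)
print(selected)
```

Other variability measures (e.g., median absolute deviation) could be substituted in the same place without changing the overall structure.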
- Generate the protein quantification table, with proteins as rows and samples as columns. Achieve this with a modern mass spectrometry-based proteomics software suite (e.g., JUMP suite13,14,39, Proteome Discoverer, MaxQuant15,46).
- Define the variable proteome.
- Use the statistical analysis results provided by the proteomics software suite to define differentially expressed (DE) proteins (for example, with adjusted p-value < 0.05).
- Alternatively, users may follow the example R code47 to define either DE or most variable proteins.
- Format the input file using the defined variable proteome.
NOTE: The required input file format (Figure 4) includes a header row; the columns include protein accession (or any unique IDs), GN (official gene symbols), protein description (or any user-provided information), followed by protein quantification of individual samples.
- Follow the order of the columns specified in step 3.1; the column names in the header may be chosen freely by the user.
- For TMT (or similar) quantified proteome, use the summarized TMT reporter intensity as input quantification values. For label-free data, use either normalized spectral counts (e.g., NSAF48) or an intensity-based method (e.g., LFQ intensity or iBAQ protein intensity reported by MaxQuant46).
- Missing values are allowed for JUMPn analysis. Make sure to label these as NA in the quantification matrix. However, it is recommended to use only proteins with quantification in more than 50% of the samples.
- Save the resulting input file as .txt, .xlsx, or .csv format (all three are supported by JUMPn).
- Upload input file:
- Click the Browse button and select the input file (Figure 3, left panel); the file format (xlsx, csv, and txt are supported) will be automatically detected.
- If the input file contains intensity-like quantification values (e.g., those generated by JUMP suite39) or ratio-like (e.g., from Proteome Discoverer), select Yes for the Execute Log2-Transformation of Data Option; otherwise, the data may have already been log-transformed, so select No for this option.
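The formatting rules in this section can be sketched in Python as follows. This is a minimal sketch, assuming a tab-delimited .txt output; the header names, record keys, and protein entries are hypothetical placeholders (the protocol states that header names are flexible). It writes the columns in the required order, labels missing values as NA, and drops proteins quantified in 50% of the samples or fewer.

```python
import csv
import io

def format_jumpn_input(records, sample_names, min_fraction=0.5):
    """Return tab-delimited text: accession, GN, description, then samples.

    records: list of dicts with keys 'accession', 'gn', 'description', 'quant';
             'quant' is a list aligned with sample_names (None = missing).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Accession", "GN", "Description"] + list(sample_names))
    for rec in records:
        quant = rec["quant"]
        n_observed = sum(v is not None for v in quant)
        # Keep only proteins quantified in more than min_fraction of samples.
        if n_observed / len(quant) <= min_fraction:
            continue
        cells = ["NA" if v is None else str(v) for v in quant]
        writer.writerow([rec["accession"], rec["gn"], rec["description"]] + cells)
    return buffer.getvalue()

# Hypothetical records for illustration only.
samples = ["S1", "S2", "S3", "S4"]
records = [
    {"accession": "ACC0001", "gn": "GENE1", "description": "Example protein 1",
     "quant": [1200.5, None, 980.0, 1500.0]},
    {"accession": "ACC0002", "gn": "GENE2", "description": "Example protein 2",
     "quant": [None, None, 300.0, None]},
]
text = format_jumpn_input(records, samples)
print(text)
```

To save the result, write `text` to a file with a .txt extension; the values here are raw intensities, so the log2-transformation option would be set to Yes at upload.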
4. Co-expression clustering analysis
NOTE: Our group25,26,27 and others28,29,31 have shown WGCNA49 to be an effective method for co-expression clustering analysis of quantitative proteomics data. JUMPn follows a 3-step procedure for WGCNA analysis25,50: (i) initial definition of co-expression gene/protein clusters by dynamic tree cutting51 based on the topological overlap matrix (TOM; determined by quantification similarities among genes/proteins); (ii) merging of similar clusters to reduce redundancy (based on the dendrogram of eigengene similarities); and (iii) final assignment of genes/proteins to each cluster, retaining only those that exceed the minimal Pearson correlation cutoff.
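To make step (iii) concrete, the Python sketch below assigns each protein to the cluster whose eigengene it correlates with most strongly and labels it NA if that correlation falls below the cutoff. It is an illustration only, not the JUMPn or WGCNA implementation: the eigengenes are taken as given inputs (in WGCNA, the eigengene is the first principal component of a cluster's expression profiles), the function names and toy profiles are hypothetical, and treating a correlation equal to the cutoff as passing is an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / (sd_x * sd_y)

def assign_by_kme(profiles, eigengenes, min_kme=0.7):
    """Assign each protein to its most correlated cluster, or 'NA'."""
    assignment = {}
    for protein, profile in profiles.items():
        best_cluster, best_r = max(
            ((name, pearson(profile, eig)) for name, eig in eigengenes.items()),
            key=lambda pair: pair[1],
        )
        assignment[protein] = best_cluster if best_r >= min_kme else "NA"
    return assignment

# Toy eigengenes and protein profiles across four samples (hypothetical).
eigengenes = {"Cluster1": [1, 2, 3, 4], "Cluster2": [4, 3, 2, 1]}
profiles = {
    "P1": [10, 20, 30, 41],  # rises like Cluster1
    "P2": [9, 7, 4, 2],      # falls like Cluster2
    "P3": [1, 3, 1, 3],      # follows neither pattern closely
}
assignment = assign_by_kme(profiles, eigengenes)
print(assignment)
```

Lowering `min_kme` in this sketch moves proteins such as P3 from NA into an existing cluster without creating any new cluster.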
- Configure the WGCNA parameters (Figure 3, middle panel). The following three parameters control the three steps, respectively:
- Set minimum cluster size as 30. This parameter defines the minimal number of proteins required for each co-expression cluster in the initial step (i) of TOM-based hybrid dynamic tree cutting. The larger the value, the smaller the number of clusters returned by the algorithm.
- Set minimum cluster distance as 0.2. Increasing this value (e.g., from 0.2 to 0.3) may cause more cluster merging during step (ii), thus resulting in fewer clusters.
- Set minimum kME as 0.7. Proteins will be assigned to the most correlated cluster defined in step (ii), but only proteins with Pearson correlation passing this threshold will be retained. Proteins that fail in this step will not be assigned to any cluster ('NA' cluster for the failed proteins in the final report).
- Initiate the analysis. There are two ways to submit the co-expression clustering analysis:
- Click on the Submit JUMPn Analysis button in the bottom right corner to initiate the comprehensive analysis of WGCNA automatically followed by PPI network analysis.
- Alternatively, select to execute the WGCNA step only (especially for the purpose of parameter tuning; see steps 4.2.3-4.2.4):
- Click on the Advanced Parameters button at the bottom of the Commence Analysis page; a new parameter window will pop up. In the bottom widget, Select Mode of Analysis, select WGCNA Only, then click on Dismiss to continue.
- On the Commence Analysis page, click on the Submit JUMPn Analysis button.
- In either case above, a progress bar will appear upon analysis submission.
NOTE: Once the analysis is finished (typically < 1 min for WGCNA Only analysis and <3 min for comprehensive analysis), a dialog box will appear with a success run message and the absolute path to the result folder.
- Examine the WGCNA results as illustrated in steps 2.4-2.8 (Figure 5). Note that the absolute path to the file co_exp_clusters_3colums.txt, which records the cluster membership of each protein, is highlighted at the top of the Results Page: WGCNA Output; use this file as input for the PPI Only analysis.
- Troubleshooting. The following three common cases are discussed. Once the parameters are updated as discussed below, follow steps 4.2.2-4.2.4 to generate new WGCNA results.
- If an important co-expression pattern is expected from the data but missed by the algorithm, follow steps 4.4.2-4.4.4.
- A missing cluster is especially likely for small co-expression clusters, i.e., only a limited number (e.g., <30) of proteins exhibiting this pattern. Before the re-analysis, re-examine the input file of protein quantification matrix and locate several positive control proteins that adhere to that important co-expression pattern.
- To rescue the small clusters, decrease the Minimal Cluster Size (e.g., to 10; a cluster size of less than 10 may not be robust and is thus not recommended), and decrease the Minimal Cluster Distance (e.g., to 0.1; setting it to 0 is also allowed, which means automatic cluster merging will be skipped).
- After executing the co-expression clustering step with the updated parameters, first, check if the cluster is rescued from the Co-Expression Pattern Plots, then check the positive controls by searching their protein accessions from Detailed Protein Quantification (make sure to select the appropriate co-expression cluster from the left side drop-down widget before the search).
NOTE: Multiple iterations of parameter tuning and rerun may be needed for the rescue.
- If there are too many proteins that cannot be assigned to any cluster, follow steps 4.4.6-4.4.7.
NOTE: Usually, a small percentage (typically <10%) of proteins may not be assigned to any cluster, as those may be outlier proteins that did not follow any of the common expression patterns of the dataset. However, if this percentage is large (e.g., >30%), it suggests that there exist additional co-expression patterns that cannot be ignored.
- Decrease both the Minimal Cluster Size and Minimal Cluster Distance parameters to alleviate this situation by detecting 'new' co-expression clusters.
- In addition, decrease the Minimal Pearson Correlation (kME) parameter to reduce the number of these 'NA cluster' proteins.
NOTE: Tuning this parameter will not generate new clusters but instead will increase the size of 'existing' clusters by accepting more previously failed proteins with the lower threshold; however, this will also increase the heterogeneity of each cluster, as more noisy proteins are now allowed.
- If two clusters show only a very minor difference in their patterns, merge them into one cluster following steps 4.4.9-4.4.11.
- Increase the Minimal Cluster Distance parameter to solve the issue.
- However, in some situations, the algorithm may never return the desired pattern; in such an instance, manually edit the cluster membership in the file co_exp_clusters_3colums.txt (file from step 4.3) to merge the clusters.
- Take the post-edited file as input for the downstream PPI network analysis. In case of manual editing, justify the criteria of cluster assignment, and record the procedure of manual editing.
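The manual merge and its record keeping can be scripted rather than done by hand. The Python sketch below is a minimal example under loud assumptions: it assumes that the cluster file is tab-delimited with a header row and that the third column holds the cluster label, with unassigned proteins labeled NA. The actual layout of co_exp_clusters_3colums.txt should be checked before use, and the column names, cluster labels, and accessions shown here are hypothetical.

```python
import csv
import io

def merge_clusters(text, source, target):
    """Relabel every protein in cluster `source` as `target`.

    Returns the edited text plus a log of changes for the analysis record.
    Assumes the cluster label is in the third column (an assumption).
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    log = []
    for row in body:
        if row[2] == source:
            log.append(f"{row[0]}: {source} -> {target}")
            row[2] = target
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows([header] + body)
    return out.getvalue(), log

def unassigned_fraction(text, na_label="NA"):
    """Fraction of proteins not assigned to any cluster."""
    body = list(csv.reader(io.StringIO(text), delimiter="\t"))[1:]
    return sum(row[2] == na_label for row in body) / len(body)

# Hypothetical file content for illustration.
EXAMPLE = (
    "Accession\tGN\tCluster\n"
    "ACC1\tGENE1\tCluster1\n"
    "ACC2\tGENE2\tCluster2\n"
    "ACC3\tGENE3\tCluster3\n"
    "ACC4\tGENE4\tNA\n"
)
merged, log = merge_clusters(EXAMPLE, "Cluster3", "Cluster2")
print(merged)
print(log)
print(unassigned_fraction(EXAMPLE))
```

Saving the returned log alongside the edited file gives a simple record of the manual editing procedure, and the unassigned fraction can be compared against the percentages discussed in the troubleshooting notes of this section.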
5. Protein-protein interaction network analysis
NOTE: By superimposing co-expression clusters onto the PPI network, each co-expression cluster is further stratified into smaller PPI modules. The analysis is performed for each co-expression cluster and includes two stages: in the first stage, JUMPn superimposes proteins from the co-expression cluster onto the PPI network and finds all connected components (i.e., multiple clusters of connected nodes/proteins; as an example, see Figure 6A); in the second stage, communities or modules (of densely connected nodes) are detected for each connected component iteratively using the topological overlap matrix (TOM) method52.
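The first stage can be illustrated with a short Python sketch that restricts a PPI edge list to the proteins of one co-expression cluster and finds connected components by breadth-first search. It is a sketch, not the JUMPn implementation: function names and the toy network are hypothetical, the second-stage TOM-based splitting is not shown, and treating "larger than the maximal size" as strictly greater is an assumption.

```python
from collections import deque

def connected_components(cluster_proteins, ppi_edges):
    """Connected components of the PPI network restricted to one cluster."""
    members = set(cluster_proteins)
    neighbors = {p: set() for p in members}
    for a, b in ppi_edges:
        # Keep only interactions where both partners are in the cluster.
        if a in members and b in members and a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, components = set(), []
    for start in sorted(members):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in sorted(neighbors[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    return components

def triage(components, min_size=2, max_size=40):
    """Drop components below min_size; flag those above max_size for stage 2."""
    kept = [c for c in components if len(c) >= min_size]
    needs_split = [c for c in kept if len(c) > max_size]
    return kept, needs_split

# Toy cluster and network (hypothetical); "X" is not in the cluster.
cluster = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("D", "X")]
components = connected_components(cluster, edges)
kept, needs_split = triage(components, min_size=2, max_size=2)
print(components, kept, needs_split)
```

In this toy example, D and E become singleton components and are removed by the minimal size of 2, while the three-protein component exceeds the (artificially small) maximal size and would be passed to the second stage.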
- Configure parameters for PPI network analysis (Figure 3, right panel).
- Set Minimal PPI Module Size as 2. This parameter defines the minimal size of the disconnected components from the first stage analysis. Any component smaller than the specified parameter will be removed from the final results.
- Set Maximal PPI Module Size as 40. Large, disconnected components that pass this threshold will undergo second stage TOM-based analysis. The second stage analysis will further split each large component into smaller modules: each module presumably contains proteins more densely connected than the original component as a whole.
- Initiate the analysis. There are two ways to submit the PPI network analysis:
- Hit the Submit JUMPn Analysis button to automatically perform the PPI analysis following WGCNA analysis by default.
- Alternatively, upload customized co-expression cluster results and perform PPI Only analysis following steps 5.2.3-5.2.5.
- Prepare input file by following the format of the file co_exp_clusters_3colums.txt (see subsection 4.4).
- Click on the Advanced Parameters button at the bottom of the Commence Analysis page; a new parameter window will pop up. In the upper section, Upload Co-Expression Cluster Result for 'PPI Only' Analysis, click on Browse to upload the input file prepared in step 5.2.3.
- In the bottom widget, Select Mode of Analysis, select PPI only, then click on Dismiss to continue. On the Commence Analysis page, click on the Submit JUMPn Analysis button.
- Once the analysis is finished (typically <3 min), examine the PPI results as illustrated in steps 2.10-2.15 (Figure 6).
- (Optional advanced step) Adjust PPI modularization by tuning parameters:
- Increase the Maximal Module Size parameter to allow more proteins to be included in the PPI results. Upload a customized PPI network to cover undocumented interactions, following steps 5.4.2-5.4.3.
- Click on the Advanced Parameters button at the bottom of the Commence Analysis page; a new parameter window will pop up. Prepare the customized PPI file, which contains three columns in the format of <Protein_A>, Connection, and <Protein_B>; here <Protein_X> is represented by the official gene name of each protein.
- In Upload a PPI Database, click on the Browse button to upload the customized PPI file.
6. Pathway enrichment analysis
NOTE: The JUMPn-derived hierarchical structures of both co-expression clusters and the PPI modules within them are automatically annotated with over-represented pathways using Fisher's exact test. The pathway/ontology databases used include Gene Ontology (GO), KEGG, Hallmark, and Reactome. Users may use advanced options to upload customized databases for the analysis (e.g., in the case of analyzing data from non-human species).
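For readers unfamiliar with the statistics, the Python sketch below shows one common way to compute an over-representation p-value with a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution), followed by Benjamini-Hochberg adjustment as used for the enrichment heatmap. It illustrates the general technique and is not taken from the JUMPn source code; the function names and the choice of a one-sided test are assumptions.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for over-representation.

    k: genes in the cluster/module that belong to the pathway
    n: size of the cluster/module
    K: genes in the background that belong to the pathway
    N: size of the gene background
    """
    upper = min(n, K)
    total = comb(N, n)
    # Probability of observing k or more pathway genes by chance.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

p_full_overlap = fisher_enrichment_p(k=5, n=5, K=5, N=10)
p_no_overlap = fisher_enrichment_p(k=0, n=5, K=5, N=10)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
print(p_full_overlap, p_no_overlap, adjusted)
```

The gene background N corresponds to the background file described in this section, which is why a species-appropriate background matters when customized databases are used.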
- By default, the pathway enrichment analysis is initiated automatically with co-expression clustering and PPI network analysis.
- View the pathway enrichment results:
- Follow steps 2.7, 2.12, and 2.15 to visualize different formats on the result pages. View detailed results in the publication table spreadsheet, ComprehensiveSummaryTables.xlsx (step 2.16).
- (Optional advanced step) Upload customized database for pathway enrichment analysis:
- Prepare the gene background file, which typically contains the official gene names of all genes of a species.
- Prepare the ontology library file following steps 6.3.3-6.3.4.
- Download the ontology library files from public websites, including EnrichR53 and MSigDB54. For example, download the ontology for Drosophila from the EnrichR website55.
- Edit the downloaded file for the required format with two columns: the pathway name as the first column, and then the official gene symbols (separated by "/") as the second column. The detailed file format is described in the Help page of the JUMPn R shiny software.
NOTE: Find example files of gene background and ontology library (using Drosophila as an instance) in the JUMPn GitHub site56.
- Click on the Advanced Parameters button at the bottom of the Commence Analysis page; a new parameter window will pop up.
- Find the Upload a Background File for Pathway Enrichment Analysis item and click on Browse to upload the background file prepared in step 6.3.1. Then, in the section Select The Background to be Used for Pathway Enrichment Analysis, click on User-Supplied Background.
- Find the Upload an Ontology Library File for Pathway Enrichment Analysis item and click on Browse to upload the ontology library file prepared in steps 6.3.2-6.3.4. Then, in the section Select Databases for Pathway Enrichment Analysis, click on User-Supplied Database in .xlsx Format.
- Click on the Submit JUMPn Analysis button in the bottom right corner to initiate the analysis using the customized database.
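The reformatting of a downloaded ontology library into the two-column layout can be sketched in Python as follows. This sketch assumes that the downloaded file is in the tab-delimited GMT-style layout (term name, a description field that may be empty, then one gene per field); the function name and the toy pathway entries are hypothetical. It returns rows of pathway name and "/"-separated gene symbols, which can then be saved in the .xlsx format expected by the upload widget.

```python
def gmt_to_two_columns(gmt_text):
    """Convert GMT-style text to (pathway name, 'gene1/gene2/...') rows."""
    rows = []
    for line in gmt_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a valid GMT-style line
        name = fields[0].strip()
        # Fields from the third onward are gene symbols; drop empty ones.
        genes = [g.strip() for g in fields[2:] if g.strip()]
        if name and genes:
            rows.append((name, "/".join(genes)))
    return rows

# Toy input (hypothetical pathway and gene names).
EXAMPLE_GMT = (
    "Pathway A\t\tgeneA\tgeneB\tgeneC\n"
    "Pathway B\tsome description\tgeneD\t\n"
)
rows = gmt_to_two_columns(EXAMPLE_GMT)
for name, genes in rows:
    print(name, genes, sep="\t")
```

The resulting gene symbols should match the naming used in the gene background file, since both are compared during the enrichment analysis.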
7. Analysis of dataset with large sample size
NOTE: JUMPn supports analysis of datasets with a large sample size (up to 200 samples tested). To facilitate the display of co-expression clustering results for a large sample size, an additional file (named "meta file") that specifies the sample groups is needed.
- Prepare and upload meta file.
- Prepare the meta file that specifies group information (e.g., control and disease groups) for each sample following steps 7.1.2-7.1.3.
- Ensure that the meta file contains at least two columns: column 1 must contain the sample names, identical in name and order to the sample columns of the protein quantification matrix file (as prepared in step 3.3); column 2 onwards will be used for group assignment for any number of features defined by the user. The number of columns is flexible.
- Ensure that the first row of the meta file contains the column names for each column; from the second row onwards, individual sample information of groups or other features (e.g., sex, age, treatment, etc.) should be listed.
- Upload the meta file by clicking on the Advanced Parameters button at the bottom of the Commence Analysis page; a new parameter window will pop up. Proceed to step 7.1.5.
- Find the Upload a Meta File item and click on Browse to upload the meta file. If an unexpected format or unmatched sample names are detected by JUMPn, an error message will pop up; in that case, reformat the meta file (steps 7.1.1-7.1.3).
- Adjust the parameters for co-expression clustering analysis: set Minimal Pearson Correlation as 0.2. This parameter needs to be relaxed because of the larger sample size.
- Click on Submit JUMPn Analysis button in the bottom right corner to submit the analysis.
- View analysis results: all the data output is the same except for displaying the co-expression cluster patterns.
- In the Results Page 1: WGCNA Output page, visualize the co-expression clusters as boxplots with samples stratified by the user-defined sample groups or features. Each dot in the plot represents the eigengene (i.e., the consensus pattern of the cluster) calculated by the WGCNA algorithm.
- If the user provided multiple features (e.g., age, sex, treatment, etc.) to group the samples, click on the Select the Expression Format drop-down box to select another feature for grouping the samples.
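Before uploading, the meta file can be checked against the quantification matrix with a short script. The Python sketch below is a minimal example, assuming that the first three columns of the quantification file are the annotation columns described in section 3 and that all remaining columns are samples; the function name, messages, and toy sample names are hypothetical and do not reproduce the checks or error messages of JUMPn itself.

```python
def check_meta(quant_header, meta_rows, n_annotation_columns=3):
    """Return a list of problems found in the meta file (empty if none).

    quant_header: header row of the quantification matrix file.
    meta_rows: meta file as a list of rows, the first row being the header.
    """
    sample_columns = list(quant_header[n_annotation_columns:])
    meta_header, meta_body = meta_rows[0], meta_rows[1:]
    problems = []
    if len(meta_header) < 2:
        problems.append("meta file needs at least two columns")
    meta_samples = [row[0] for row in meta_body]
    if meta_samples != sample_columns:
        if sorted(meta_samples) == sorted(sample_columns):
            problems.append("sample names match but their order differs")
        else:
            missing = sorted(set(sample_columns) - set(meta_samples))
            extra = sorted(set(meta_samples) - set(sample_columns))
            problems.append(f"unmatched sample names: missing={missing}, extra={extra}")
    return problems

# Toy headers and meta rows (hypothetical sample and group names).
quant_header = ["Accession", "GN", "Description", "S1", "S2", "S3"]
good_meta = [["Sample", "Group"], ["S1", "control"], ["S2", "control"], ["S3", "disease"]]
swapped_meta = [["Sample", "Group"], ["S2", "control"], ["S1", "control"], ["S3", "disease"]]
print(check_meta(quant_header, good_meta))
print(check_meta(quant_header, swapped_meta))
```

An empty list means that the sample names and their order agree with the quantification matrix; additional feature columns (e.g., sex or age) can be appended to each meta row without affecting the check.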