The samples used in this protocol were approved by the ethics committees from both the Department of Microbiology of the Institute of Biomedical Sciences at the University of São Paulo and the Federal University of Sergipe (Protocols: 54937216.5.0000.5467 and 54835916.2.0000.5546, respectively).
1. Docker desktop installation
NOTE: The steps to prepare the Docker environment differ among operating systems (OSs). Therefore, Mac users must follow the steps listed in 1.1, Linux users must follow the steps listed in 1.2, and Windows users must follow the steps listed in 1.3.
- Install Docker Desktop on macOS.
- Access the Get Docker website (Table of Materials), click on Docker Desktop for Mac and then click on the Download from Docker Hub link.
- Download the installation file by clicking on the Get Docker button.
- Execute the Docker.dmg file to open the installer, and then drag the icon to the Applications folder. Locate and execute Docker.app in the Applications folder to start the program.
NOTE: The software-specific menu in the top status bar indicates that the software is running and that it is accessible from a terminal.
- Install the container program on the Linux OS.
- Access the Get Docker Linux website (Table of Materials) and follow the instructions in the Install using the repository section, available through the Docker Linux Repository link.
- Update all Linux packages using the command line:
sudo apt-get update
- Install the packages required by Docker:
sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release
- Create a software archive keyring file:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
- Add the Docker deb repository information to the sources.list.d directory:
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
- Update all the packages again, including the ones recently added:
sudo apt-get update
- Install the Docker Engine (the docker-ce packages):
sudo apt-get install docker-ce docker-ce-cli containerd.io
- Select the geographic area and time zone to finish the installation process.
- Install the container program on the Windows OS.
- Access the Get Docker website (Table of Materials) and click on Get Started. Find the installer for Docker Desktop for Windows. Download the files and install them locally on the computer.
- After the download, start the installation file (.exe) and keep the default parameters. Make sure that the two options Install Required Windows Components for WSL 2 and Add Shortcut to Desktop are checked.
NOTE: In some cases, when the software tries to start the service, it shows the error: WSL installation is incomplete. To resolve this error, access the WSL2-Kernel website (Table of Materials).
- Download and install the latest WSL2 Linux kernel.
- Access PowerShell terminal as Administrator and execute the command:
dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart
- Ensure that the software Docker Desktop is installed successfully.
- Download the image from the CSBL repository on Docker Hub (Table of Materials).
- Open the Docker Desktop and verify that the status is "running" at the bottom left of the toolbar.
- Go to the Windows PowerShell terminal command line. Download the Linux container image for this protocol from the CSBL repository on Docker Hub. Execute the following command to download the image:
docker pull csblusp/transcriptome
NOTE: After downloading, the image can be seen in Docker Desktop. To create the container, Windows users must follow step 1.5, while Linux users must follow step 1.6.
- Initialize the server container on the Windows OS.
- In the Docker Desktop app manager, access the Images page from the toolbar to view the downloaded image.
NOTE: If the pipeline image was downloaded successfully, there will be a csblusp/transcriptome image available.
- Initiate the container from the csblusp/transcriptome image by clicking on the Run button. Expand the Optional Settings to configure the container.
- Define the Container Name (e.g., server).
- Associate a folder on the local computer with a folder inside the Docker container. To do this, set the Host Path to a folder on the local machine that will store the processed data to be downloaded at the end. Then set the Container Path, which links the local machine path to a folder inside the csblusp/transcriptome container (use "/opt/transferdata" as the Container Path).
- After this, click on Run to create the csblusp/transcriptome container.
- To access the Linux terminal from the csblusp/transcriptome container, click on the CLI button.
- Switch to the bash shell for a better experience. For this, execute the command:
bash
- After executing the bash command, ensure that the terminal shows (root@<containerID>:/#):
root@ac12c583b731:/#
- Initialize the server container on the Linux OS.
- Execute this command to create the Docker container based on the image:
docker run -d -it --rm --name server -v <Host Path>:/opt/transferdata csblusp/transcriptome
NOTE: Replace <Host Path> with the path of a folder on the local machine (e.g., /home/user/transferdata).
- Execute this command to access the command terminal of the Docker container:
docker exec -it server bash
- Ensure that a Linux terminal is available to execute programs/scripts from the command line.
- After executing the bash command, ensure that the terminal shows (root@<containerID>:/#):
root@ac12c583b731:/#
NOTE: The root password is "transcriptome" by default. If desired, the root password can be changed by executing the command:
passwd
- First, source the addpath.sh script to ensure that all tools are available. Execute the command:
source /opt/addpath.sh
- Check the structure of the RNA sequencing folder.
- Access the transcriptome pipeline scripts folder and ensure all data from RNA sequencing are stored inside the folder: /home/transcriptome-pipeline/data.
- Ensure all the results obtained from the analysis are stored inside the folder of the path /home/transcriptome-pipeline/results.
- Ensure the genome and annotation reference files are stored inside the folder of the path /home/transcriptome-pipeline/datasets. These files support all downstream analyses.
- Ensure all scripts are stored in the folder of the path /home/transcriptome-pipeline/scripts and separated by each step as described below.
- Download the annotation and the human genome.
- Access the scripts folder:
cd /home/transcriptome-pipeline/scripts
- Execute this command to download the reference human genome:
bash downloadGenome.sh
- To download the annotation, execute the command:
bash downloadAnnotation.sh
- Change the annotation or the version of the reference genome.
- Open downloadAnnotation.sh and downloadGenome.sh to change the URL of each file.
- Copy the downloadAnnotation.sh and downloadGenome.sh files to the transfer area to edit them in the local OS:
cd /home/transcriptome-pipeline/scripts
cp downloadAnnotation.sh downloadGenome.sh /opt/transferdata
- Open the Host Path folder selected in step 1.5.4 to link the host and the Docker container.
- Edit the files using the preferred editor software and save them. Finally, put the modified files back into the scripts folder. Execute the commands:
cd /opt/transferdata
cp downloadAnnotation.sh downloadGenome.sh /home/transcriptome-pipeline/scripts
NOTE: These files can also be edited directly using the vim or nano Linux editors.
- Next, configure the fastq-dump tool with the command line:
vdb-config --interactive
NOTE: This allows the sequencing files of the example data to be downloaded.
- Navigate to the Tools page using the Tab key and select the current folder option. Navigate to the Save option and click on OK. Then, exit the configuration tool.
- Initiate the download of the reads from the previously published paper7. The SRA accession number of each sample is required. Obtain the SRA numbers from the SRA NCBI website (Table of Materials).
NOTE: To analyze RNA-Seq data available on public databases, follow step 1.12. To analyze private RNA-seq data, follow step 1.13.
- Analyze specific public data.
- Access the National Center for Biotechnology Information (NCBI) website and seek keywords for a specific subject.
- Click on the Result link for BioProject in the Genomes section.
- Choose and click on a specific study. Click on the SRA Experiments. A new page opens, which shows all the samples available for this study.
- Click on the "Send to:" above accession number. In the "Choose Destination" option select File and Format option, select RunInfo. Click on "Create File" to export all library information.
- Save the SraRunInfo.csv file in the Host Path defined in step 1.5.4 and execute the download script:
cp /opt/transferdata/SraRunInfo.csv /home/transcriptome-pipeline/data
cd /home/transcriptome-pipeline/scripts
bash downloadAllLibraries.sh
- Analyze private and unpublished sequencing data.
- Organize the sequencing data in a folder named Reads.
NOTE: Inside the Reads folder, create one folder for each sample; each folder must have the same name as its sample. Add the data of each sample inside its directory. For paired-end RNA-Seq, each sample directory should contain two FASTQ files, named according to the patterns {sample}_1.fastq.gz and {sample}_2.fastq.gz for the forward and reverse sequences, respectively. For example, a sample named "Healthy_control" must have a directory with the same name containing FASTQ files named Healthy_control_1.fastq.gz and Healthy_control_2.fastq.gz. If the library was sequenced with a single-end strategy, only one reads file is saved for downstream analysis; the same sample, "Healthy_control", would then have a unique FASTQ file named Healthy_control.fastq.gz.
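For illustration, a minimal paired-end layout with two samples (the "Infected_patient" sample name is hypothetical) would be:
reads/
├── Healthy_control/
│   ├── Healthy_control_1.fastq.gz
│   └── Healthy_control_2.fastq.gz
└── Infected_patient/
    ├── Infected_patient_1.fastq.gz
    └── Infected_patient_2.fastq.gz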
- Create a phenotypic file containing all the sample names: name the first column 'Sample' and the second column 'Class'. Fill the Sample column with the sample names, which must match the names of the sample directories, and fill the Class column with the phenotypic group of each sample (e.g., control or infected). Finally, save the file with the name "metadata.tsv" and send it to the /home/transcriptome-pipeline/data/ directory. Check the existing metadata.tsv to understand the format of the phenotypic file (a minimal example is shown below).
cp /opt/transferdata/metadata.tsv /home/transcriptome-pipeline/data/metadata.tsv
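For illustration, a minimal metadata.tsv matching the hypothetical samples above would contain (columns separated by tabs):
Sample	Class
Healthy_control	control
Infected_patient	infected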
- Access the Host Path directory defined in step 1.5.4 and copy the newly structured sample directories into it. Finally, move the samples from /opt/transferdata to the pipeline data directory:
cp -rf /opt/transferdata/reads/* /home/transcriptome-pipeline/data/reads/
- Observe that all reads are stored in the folder /home/transcriptome-pipeline/data/reads.
2. Quality control of the data
NOTE: Evaluate, graphically, the probability of errors in the sequencing reads. Remove all the technical sequences, e.g., adapters.
- Assess the sequencing quality of the libraries with the FastQC tool.
- To generate the quality graphs, run the fastqc program. Execute the command:
bash FastQC.sh
NOTE: The results will be saved in the /home/transcriptome-pipeline/results/FastQC folder. Since sequence adapters are used for library preparation and sequencing, in some cases fragments of adapter sequences can interfere with the mapping process.
- Remove the adapter sequences and low-quality reads. Access the scripts folder and execute the command for the Trimmomatic tool:
cd /home/transcriptome-pipeline/scripts
bash trimmomatic.sh
NOTE: The parameters used for the sequencing filter are: remove leading low-quality or N bases (below quality 3) (LEADING:3); remove trailing low-quality or N bases (below quality 3) (TRAILING:3); scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 20 (SLIDINGWINDOW:4:20); and drop reads below 36 bases long (MINLEN:36). These parameters can be altered by editing the Trimmomatic script file.
- Ensure that the results are saved in the following folder: /home/transcriptome-pipeline/results/trimreads. Execute the command:
ls /home/transcriptome-pipeline/results/trimreads
3. Mapping and annotation of samples
NOTE: After obtaining good-quality reads, these need to be mapped to the reference genome. For this step, the STAR mapper was used to map the example samples. The STAR mapper requires 32 GB of RAM to load the genome and map the reads. Users who do not have 32 GB of RAM can use already mapped reads; in such cases, jump to step 3.3 or use the Bowtie2 mapper. This section has scripts for STAR (results shown in all figures) and Bowtie2 (a mapper with lower memory requirements).
- First index the reference genome for the mapping process:
- Access the Scripts folder using the command line:
cd /home/transcriptome-pipeline/scripts
- For STAR mapper, execute:
bash indexGenome.sh
- For Bowtie mapper, execute:
bash indexGenomeBowtie2.sh
- Map the filtered reads (obtained from step 2) to the reference genome (GRCh38 version) by executing the following commands. Both the STAR and Bowtie2 mappers are run using default parameters.
- For STAR mapper, execute:
bash mapSTAR.sh
- For Bowtie2 mapper, execute:
bash mapBowtie2.sh
NOTE: The final results are Binary Alignment Map (BAM) files for each sample, stored in /home/transcriptome-pipeline/results/mapreads.
- Annotate mapped reads using the FeatureCounts tool to obtain raw counts for each gene. Run the scripts that annotate the reads.
NOTE: The FeatureCounts tool assigns the mapped sequencing reads to genomic features. The most important aspects of the annotation that can be changed according to the biological question are the detection of isoforms, multiple mapped reads, and exon-exon junctions; these correspond to the parameters GTF.attrType="gene_name" for gene-level counting (or leaving the parameter unspecified for meta-feature-level counting), allowMultiOverlap=TRUE, and juncCounts=TRUE, respectively (a minimal example call is sketched after this step).
- Access the scripts folder using command line:
cd /home/transcriptome-pipeline/scripts
- To annotate the mapped reads to obtain raw counts per gene, execute the command line:
Rscript annotation.R
NOTE: The parameters used for the annotation process were: return the gene short name (GTF.attrType="gene_name"); allow multiple overlaps (allowMultiOverlap=TRUE); and indicate that the library is paired-end (isPairedEnd=TRUE). For a single-end strategy, use the parameter isPairedEnd=FALSE. The results will be saved in the /home/transcriptome-pipeline/countreads folder.
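The annotation.R script provided in the container performs this step; purely as a reference, a minimal sketch of an Rsubread::featureCounts call with the parameters above (the GTF file name here is an assumption for illustration) is:

library(Rsubread)

# BAM files produced by the mapping step (path as used in this pipeline)
bam_files <- list.files("/home/transcriptome-pipeline/results/mapreads",
                        pattern = "\\.bam$", full.names = TRUE)

counts <- featureCounts(
    files               = bam_files,
    annot.ext           = "annotation.gtf",  # hypothetical name for the GTF downloaded in step 1.9
    isGTFAnnotationFile = TRUE,
    GTF.attrType        = "gene_name",       # return gene short names
    allowMultiOverlap   = TRUE,              # count reads overlapping multiple features
    isPairedEnd         = TRUE               # use FALSE for single-end libraries
)

# counts$counts is a gene x sample matrix of raw counts
write.table(counts$counts, "raw_counts.tsv", sep = "\t", quote = FALSE)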
- Normalize gene expression.
NOTE: Normalizing gene expression is essential to compare results between conditions (e.g., healthy and infected samples). Normalization is also required to perform the co-expression and molecular degree of perturbation analyses.
- Access the Scripts folder using the command line:
cd /home/transcriptome-pipeline/scripts
- Normalize the gene expression. For this, execute the command line:
Rscript normalizesamples.R
NOTE: The raw count expression in this experiment was normalized using the Trimmed Mean of M-values (TMM) and Counts Per Million (CPM) methods. This step aims to remove differences in gene expression due to technical influences by performing library-size normalization. The results will be saved in the /home/transcriptome-pipeline/countreads folder.
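The normalizesamples.R script provided in the container performs this step; as a reference, a minimal sketch of the equivalent edgeR calls (the counts file name is an assumption for illustration) is:

library(edgeR)

# Gene x sample matrix of raw counts from the annotation step
raw_counts <- as.matrix(read.delim("raw_counts.tsv", row.names = 1))

dge <- DGEList(counts = raw_counts)
dge <- calcNormFactors(dge, method = "TMM")  # Trimmed Mean of M-values

# Counts per million, scaled by the TMM normalization factors
tmm_cpm <- cpm(dge, normalized.lib.sizes = TRUE)
write.table(tmm_cpm, "tmm_expression.tsv", sep = "\t", quote = FALSE)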
4. Differentially expressed genes and co-expressed genes
- Identify differentially expressed genes using the open-source edgeR package. This involves finding genes whose expression is higher or lower compared with the control.
- Access the Scripts folder using the command line:
cd /home/transcriptome-pipeline/scripts
- To identify the differentially expressed genes, execute the DEG_edgeR R script using the command line:
Rscript DEG_edgeR.R
NOTE: The results containing the differentially expressed genes will be saved in the /home/transcriptome-pipeline/results/degs folder. Data can be transferred to a personal computer.
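The DEG_edgeR.R script provided in the container performs this comparison; as a reference, a minimal sketch of a standard edgeR workflow (the group labels, file names, and quasi-likelihood test here are assumptions for illustration) is:

library(edgeR)

# Raw counts and phenotypic groups (Class column of metadata.tsv)
raw_counts <- as.matrix(read.delim("raw_counts.tsv", row.names = 1))
group <- factor(c("control", "control", "infected", "infected"))  # illustrative labels, one per sample

dge <- DGEList(counts = raw_counts, group = group)
dge <- dge[filterByExpr(dge), , keep.lib.sizes = FALSE]  # remove weakly expressed genes
dge <- calcNormFactors(dge, method = "TMM")

design <- model.matrix(~ group)
dge <- estimateDisp(dge, design)
fit <- glmQLFit(dge, design)
qlf <- glmQLFTest(fit, coef = 2)      # infected vs. control

degs <- topTags(qlf, n = Inf)$table   # logFC, p-value, and FDR per gene
write.table(degs, "degs.tsv", sep = "\t", quote = FALSE)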
- Download data from the csblusp/transcriptome container.
- Transfer the processed data from /home/transcriptome-pipeline to the /opt/transferdata folder (shared with the local computer).
- Copy all files to the local computer by executing the command line:
cp -rf /home/transcriptome-pipeline/results /opt/transferdata/pipeline
cp -rf /home/transcriptome-pipeline/data /opt/transferdata/pipeline
NOTE: Now, go to the local computer to ensure all the results, datasets, and data are available to download in the Host Path.
- Identify co-expression modules.
- Access the Co-Expression Modules Identification Tool (CEMiTool) website (Table of Materials). This tool identifies co-expression modules from expression datasets provided by the users. On the main page, click on Run at the top right. This will open a new page to upload the expression file.
- Click on Choose File below the Expression File section and upload the normalized gene expression matrix 'tmm_expression.tsv' from the Host Path.
NOTE: Step 4.4 is optional.
- Explore the biological meaning of co-expression modules.
- Click on Choose File in the Sample Phenotypes section and upload the file with sample phenotypes, metadata_cemitool.tsv, obtained in the download data step 4.2.2, to perform a gene set enrichment analysis (GSEA).
- Press Choose File in the Gene Interactions section to upload a file with gene interactions (cemitool-interactions.tsv). It is possible to use the file of gene interactions provided as an example by webCEMiTool. The interactions can be protein-protein interactions, transcription factors and their transcribed genes, or metabolic pathways. This step produces an interaction network for each co-expression module.
- Click on Choose File in the Gene Sets section to upload a list of functionally related genes in a Gene Matrix Transposed (GMT) format file. The Gene Sets file enables the tool to perform an enrichment analysis for each co-expression module, i.e., an over-representation analysis (ORA).
NOTE: This list of genes can encompass pathways, GO terms, or miRNA-target genes. The researcher can use the Blood Transcription Modules (BTM) file (BTM_for_GSEA.gmt) as the gene sets for this analysis.
- Set parameters for performing co-expression analyses and obtain its results.
- Next, expand the Parameters section by clicking on the plus sign to display the default parameters. If necessary, change them. Check the Apply VST box.
- Optionally, write an e-mail address in the Email section to receive the results by e-mail.
- Press the Run CEMiTool button.
- Download the full analysis report by clicking on the Download Full Report at the top right. It will download a compressed file cemitool_results.zip.
- Extract the contents of the cemitool_results.zip with WinRAR.
NOTE: The folder with the extracted contents encompasses several files with all results of the analysis and their established parameters.
5. Determination of the molecular degree of perturbation of samples
- Molecular Degree of Perturbation (MDP) web version.
- To run MDP, access the MDP website (Table of Materials). MDP calculates the molecular distance of each sample from the reference. Click on the Run button.
- On the Choose File link, upload the expression file tmm_expression.tsv. Then, upload the phenotypic data file metadata.tsv obtained in the download data step 4.2.2. It is also possible to submit a pathway annotation file in GMT format to calculate the perturbation score of the pathways associated with the disease.
- Once the data are uploaded, define the Class column that contains the phenotypic information used by the MDP. Then, define the control class by selecting the label that corresponds to the control class.
NOTE: There are some optional parameters that affect how the sample scores are calculated. If necessary, the user can change the averaging statistic, the standard deviation, and the percentage of top perturbed genes.
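In brief, and as a simplified description of the published MDP method, each gene is standardized against the control class and the absolute z-scores are averaged: for gene g in sample s, z(g,s) = |expression(g,s) − mean_control(g)| / sd_control(g), and the MDP score of sample s is the average of z(g,s) over the genes considered (all genes, or only the top percentage of most perturbed genes).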
- After that, press the Run MDP button and the MDP results will be shown. Download the figures by clicking on Download Plot in each plot, and the MDP scores via the Download MDP Score File button.
NOTE: In case of questions about how to submit the files or how MDP works, just go through the Tutorial and About webpages.
6. Functional enrichment analysis
- Create one list of down-regulated DEGs and another of up-regulated DEGs. Gene names must follow the Entrez gene symbols. Each gene of the list must be placed on its own line.
- Save the gene lists in the txt or tsv format.
- Access the Enrichr website (Table of Materials) to perform the functional analysis.
- Select the list of genes by clicking on the Choose File. Select one of the DEGs list and press the Submit button.
- Click on Pathways at the top of the webpage to perform functional enrichment analysis with the ORA approach.
- Choose a pathway database. The "Reactome 2016" pathway database is broadly used to obtain the biological meaning of human data.
- Click on the name of the pathway database again. Select Bar Graph and check whether it is sorted by p-value ranking. If not, click on the bar graph until it is sorted by p-value. This bar graph includes the top 10 pathways according to p-values.
- Press the Configuration button and select the red color for the up-regulated genes analysis or blue color for the down-regulated genes analysis. Save the bar graph in several formats by clicking on svg, png, and jpg.
- Select Table and click on Export Entries to the Table at the bottom left of the bar graph to obtain the functional enrichment analysis results in a txt file.
NOTE: This functional enrichment results file encompasses in each line the name of one pathway, the number of overlapped genes between the submitted DEG list and the pathway, the p-value, adjusted p-value, odds ratio, combined score, and the gene symbol of genes present in the DEG list that participate in the pathway.
- Repeat the same steps with the other DEGs list.
NOTE: The analysis with down-regulated DEGs provides pathways enriched for down-regulated genes and the analysis with up-regulated genes provides pathways enriched for up-regulated genes.