JoVE Logo

Immunology and Infection

High-Throughput Transcriptome Analysis for Investigating Host-Pathogen Interactions

Published: March 5th, 2022



1Laboratory of Pathology of Infectious Diseases, Department of Pathology, Medical School, University of São Paulo, 2Scientific Platform Pasteur USP, 3Department of Clinical and Toxicological Analyses, School of Pharmaceutical Sciences, University of São Paulo, 4Hospital Israelita Albert Einstein

The protocol presented here describes a complete pipeline to analyze RNA-sequencing transcriptome data from raw reads to functional analysis, covering quality control and preprocessing steps as well as advanced statistical approaches.

Pathogens can cause a wide variety of infectious diseases. The biological processes induced by the host in response to infection determine the severity of the disease. To study such processes, researchers can use high-throughput sequencing techniques (RNA-seq) that measure the dynamic changes of the host transcriptome at different stages of infection, clinical outcomes, or disease severity. This investigation can lead to a better understanding of the diseases and can uncover potential drug targets and treatments. The protocol presented here describes a complete pipeline to analyze RNA-sequencing data from raw reads to functional analysis. The pipeline is divided into five steps: (1) quality control of the data; (2) mapping and annotation of genes; (3) statistical analysis to identify differentially expressed genes and co-expressed genes; (4) determination of the molecular degree of perturbation of the samples; and (5) functional analysis. Step 1 removes technical artifacts that may impact the quality of downstream analyses. In step 2, genes are mapped and annotated according to standard library protocols. The statistical analysis in step 3 identifies genes that are differentially expressed or co-expressed in infected samples in comparison with non-infected ones. In step 4, sample variability and the presence of potential biological outliers are assessed using the molecular degree of perturbation approach. Finally, the functional analysis in step 5 reveals the pathways associated with the disease phenotype. The presented pipeline aims to support researchers through RNA-seq data analysis in host-pathogen interaction studies and to drive future in vitro or in vivo experiments, which are essential to understand the molecular mechanisms of infections.
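
The idea behind step 4, scoring each sample by how far its expression departs from the non-infected controls, can be illustrated with a minimal Python sketch. This is a simplified illustration and not the implementation of the MDP package used in the protocol: the function name `perturbation_scores`, the z-score threshold of 2, the use of the control median and standard deviation, and the toy expression values are all assumptions made for this example.

```python
from statistics import median, stdev


def perturbation_scores(expression, control_samples, z_threshold=2.0):
    """Score each sample by its average thresholded deviation from controls.

    expression: dict mapping gene -> dict mapping sample -> expression value.
    control_samples: names of the non-infected (reference) samples.
    Returns a dict mapping sample -> perturbation score.
    """
    samples = list(next(iter(expression.values())))
    totals = {sample: 0.0 for sample in samples}
    genes_used = 0

    for values in expression.values():
        controls = [values[s] for s in control_samples]
        spread = stdev(controls)
        if spread == 0:
            # A gene with no variation among controls gives no usable z-score.
            continue
        center = median(controls)
        genes_used += 1
        for sample in samples:
            z = abs(values[sample] - center) / spread
            # Deviations below the threshold are treated as noise (score 0).
            if z >= z_threshold:
                totals[sample] += z

    if genes_used == 0:
        return totals
    return {sample: total / genes_used for sample, total in totals.items()}


# Toy data (hypothetical values): three controls and two infected samples.
toy_expression = {
    "GENE_A": {"ctrl1": 10, "ctrl2": 12, "ctrl3": 11, "inf1": 30, "inf2": 11},
    "GENE_B": {"ctrl1": 5, "ctrl2": 5, "ctrl3": 5, "inf1": 50, "inf2": 5},
    "GENE_C": {"ctrl1": 100, "ctrl2": 104, "ctrl3": 102, "inf1": 90, "inf2": 103},
}
scores = perturbation_scores(toy_expression, ["ctrl1", "ctrl2", "ctrl3"])
for sample, score in scores.items():
    print(sample, round(score, 2))
```

In this toy data set, only `inf1` departs strongly from the controls, so it receives a high score while the remaining samples score zero. A sample whose score is far from the rest of its group would be a candidate biological outlier worth inspecting before the downstream analyses.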

Arboviruses, such as dengue, yellow fever, chikungunya, and Zika, have been widely associated with several endemic outbreaks and have emerged as some of the main pathogens infecting humans in recent decades1,2. Individuals infected with the chikungunya virus (CHIKV) often have fever, headache, rash, polyarthralgia, and arthritis3,4,5. Viruses can subvert the gene expression of the cell and influence various host signaling pathways. Recently, blood transcriptome studies used RNA-seq to identify the....

The samples used in this protocol were approved by the ethics committees from both the Department of Microbiology of the Institute of Biomedical Sciences at the University of São Paulo and the Federal University of Sergipe (Protocols: 54937216.5.0000.5467 and 54835916.2.0000.5546, respectively).

1. Docker desktop installation

NOTE: The steps to prepare the Docker environment differ among operating systems (OSs). Therefore, Mac users must f.......

The computing environment for transcriptome analyses was created and configured on the Docker platform. This approach allows beginner Linux users to use Linux terminal systems without prior system management knowledge. The Docker platform uses the resources of the host OS to create a service container that includes the tools a specific user needs (Figure 1B). A container based on the Ubuntu 20.04 Linux distribution was created and fully configured for transcriptomic analyses, which is access.......
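
As a rough illustration of how such a container is typically started with a host folder shared into it, the Python sketch below only assembles a `docker run` command line and prints it. The image name `example/transcriptome:latest`, the host path, and the `/data` mount point are hypothetical placeholders, since the image actually used by the protocol is not shown in this excerpt.

```python
import shlex


def docker_run_command(image, host_dir, container_dir="/data", interactive=True):
    """Build the argument list for starting a container with a shared folder."""
    command = ["docker", "run", "--rm"]  # --rm removes the container on exit
    if interactive:
        command.append("-it")  # interactive terminal session
    # -v maps a host directory to a path inside the container.
    command += ["-v", f"{host_dir}:{container_dir}", image]
    return command


# Hypothetical image name and host directory, for illustration only.
cmd = docker_run_command("example/transcriptome:latest", "/home/user/project")
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so the same list could be passed to `subprocess.run` on a machine where Docker is installed without any shell-quoting concerns.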

The preparation of the sequencing libraries is a crucial step toward answering biological questions in the best possible way. The type of transcript of interest in the study determines which type of sequencing library is chosen and, in turn, drives the bioinformatic analyses. For example, when sequencing a pathogen-host interaction, the type of sequencing determines whether sequences can be identified from both organisms or only from the host transcripts.

Next-generation sequencing equipment, e.g........

HN is funded by FAPESP (grant numbers: 2017/50137-3, 2012/19278-6, 2018/14933-2, 2018/21934-5, and 2013/08216-2) and CNPq (313662/2017-7).

We are particularly thankful for the following fellowship grants: ANAG (FAPESP Process 2019/13880-5), VEM (FAPESP Process 2019/16418-0), IMSC (FAPESP Process 2020/05284-0), APV (FAPESP Process 2019/27146-1), and RLTO (CNPq Process 134204/2019-0).

Name | Company | Catalog Number | Comments
CEMiTool | Computational Systems Biology Laboratory | 1.12.2 | Discovery and the analysis of co-expression gene modules in a fully automatic manner, while providing a user-friendly HTML report with high-quality graphs.
EdgeR | Bioconductor (Maintainer: Yunshun Chen [yuchen at]) | 3.30.3 | Differential expression analysis of RNA-seq expression profiles with biological replication
EnhancedVolcano | Bioconductor (Maintainer: Kevin Blighe [kevin at]) | 1.6.0 | Publication-ready volcano plots with enhanced colouring and labeling
FastQC | Babraham Bioinformatics | 0.11.9 | Aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing
FeatureCounts | Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research | 2.0.0 | Assign mapped sequencing reads to specified genomic features
MDP | Computational Systems Biology Laboratory | 1.8.0 | Molecular Degree of Perturbation calculates scores for transcriptome data samples based on their perturbation from controls
R | R Core Group | 4.0.3 | Programming language and free software environment for statistical computing and graphics
STAR | Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research | 2.7.6a | Aligner designed to specifically address many of the challenges of RNA-seq data mapping using a strategy to account for spliced alignments
Bowtie2 | Johns Hopkins University | 2.4.2 | Ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences
Trimmomatic | THE USADEL LAB | 0.39 | Trimming adapter sequence tasks for Illumina paired-end and single-ended data
Get Docker | Docker | 20.10.2 | Create a bioinformatic environment reproducible and predictable (
WSL2-Kernel | Windows | NA |
Get Docker Linux | Docker | NA |
Docker Linux Repository | Docker | NA |
MDP Website | Computational Systems Biology Laboratory | NA |
Enrichr Website | MaayanLab | NA |
webCEMiTool | Computational Systems Biology Laboratory | NA |
gProfiler | Bioinformatics, Algorithmics and Data Mining Group | NA |
goseq | Bioconductor (Maintainer: Matthew Young [my4 at]) | NA |

  1. Weaver, S. C., Charlier, C., Vasilakis, N., Lecuit, M. Zika, Chikungunya, and Other Emerging Vector-Borne Viral Diseases. Annual Review of Medicine. 69, 395-408 (2018).
  2. Burt, F. J., et al. Chikungunya virus: an update on the biology and pathogenesis of this emerging pathogen. The Lancet Infectious Diseases. 17 (4), 107-117 (2017).
  3. Hua, C., Combe, B. Chikungunya virus-associated disease. Current Rheumatology Reports. 19 (11), 69 (2017).
  4. Suhrbier, A., Jaffar-Bandjee, M.-C., Gasque, P. Arthritogenic alphaviruses: an overview. Nature Reviews Rheumatology. 8 (7), 420-429 (2012).
  5. Nakaya, H. I., et al. Gene profiling of chikungunya virus arthritis in a mouse model reveals significant overlap with rheumatoid arthritis. Arthritis and Rheumatism. 64 (11), 3553-3563 (2012).
  6. Michlmayr, D., et al. Comprehensive innate immune profiling of chikungunya virus infection in pediatric cases. Molecular Systems Biology. 14 (8), 7862 (2018).
  7. Soares-Schanoski, A., et al. Systems analysis of subjects acutely infected with the Chikungunya virus. PLOS Pathogens. 15 (6), 1007880 (2019).
  8. Alexandersen, S., Chamings, A., Bhatta, T. R. SARS-CoV-2 genomic and subgenomic RNAs in diagnostic samples are not an indicator of active replication. Nature Communications. 11 (1), 6059 (2020).
  9. Wang, D., et al. The SARS-CoV-2 subgenome landscape and its novel regulatory features. Molecular Cell. 81 (10), 2135-2147 (2021).
  10. Wilson, J. A. C., et al. RNA-Seq analysis of chikungunya virus infection and identification of granzyme A as a major promoter of arthritic inflammation. PLOS Pathogens. 13 (2), 1006155 (2017).
  11. Gonçalves, A. N. A., et al. Assessing the impact of sample heterogeneity on transcriptome analysis of human diseases using MDP webtool. Frontiers in Genetics. 10, 971 (2019).
  12. Russo, P. S. T., et al. CEMiTool: a Bioconductor package for performing comprehensive modular co-expression analyses. BMC Bioinformatics. 19 (1), 56 (2018).
  13. Costa-Silva, J., Domingues, D., Lopes, F. M. RNA-Seq differential expression analysis: An extended review and a software tool. PloS One. 12 (12), 0190152 (2017).
  14. Seyednasrollah, F., Laiho, A., Elo, L. L. Comparison of software packages for detecting differential expression in RNA-seq studies. Briefings in Bioinformatics. 16 (1), 59-70 (2015).
  15. Zhang, B., Horvath, S. A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology. 4, (2005).
  16. Cheng, C. W., Beech, D. J., Wheatcroft, S. B. Advantages of CEMiTool for gene co-expression analysis of RNA-seq data. Computers in Biology and Medicine. 125, 103975 (2020).
  17. Cardozo, L. E., et al. webCEMiTool: Co-expression modular analysis made easy. Frontiers in Genetics. 10, 146 (2019).
  18. de Lima, D. S., et al. Long noncoding RNAs are involved in multiple immunological pathways in response to vaccination. Proceedings of the National Academy of Sciences of the United States of America. 116 (34), 17121-17126 (2019).
  19. Prada-Medina, C. A., et al. Systems immunology of diabetes-tuberculosis comorbidity reveals signatures of disease complications. Scientific Reports. 7 (1), 1999 (2017).
  20. Chen, E. Y., et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics. 14, 128 (2013).
  21. Kuleshov, M. V., et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Research. 44, 90-97 (2016).
  22. Raudvere, U., et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Research. 47, 191-198 (2019).
  23. Young, M. D., Wakefield, M. J., Smyth, G. K., Oshlack, A. Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biology. 11 (2), 14 (2010).
