Analyzing Multifactorial RNA-Seq Experiments with DiCoExpress

Published: July 29th, 2022

DOI: 10.3791/62566

1 Université Paris-Saclay, CNRS, INRAE, Univ Evry, Institute of Plant Sciences Paris-Saclay (IPS2), Orsay, France; 2 Université de Paris, CNRS, INRAE, Institute of Plant Sciences Paris-Saclay (IPS2), Orsay, France; 3 Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, Gif-sur-Yvette, France; 4 LSTM, Univ Montpellier, INRAE, IRD, CIRAD, Institut Agro, Montpellier, France; 5 Université Paris-Saclay, AgroParisTech, INRAE, UMR MIA-Paris, Paris, France

DiCoExpress is a script-based tool implemented in R to perform an RNA-Seq analysis from quality control to co-expression. DiCoExpress handles complete and unbalanced designs with up to two biological factors. This video tutorial guides the user through the different features of DiCoExpress.

The proper use of statistical modeling in NGS data analysis requires an advanced level of expertise. There has recently been a growing consensus on using generalized linear models for differential analysis of RNA-Seq data and on the advantage of mixture models for co-expression analysis. To offer a managed setting for these modeling approaches, we developed DiCoExpress, which provides a standardized R pipeline to perform an RNA-Seq analysis. Without any particular knowledge of statistics or R programming, beginners can perform a complete RNA-Seq analysis, from quality controls to co-expression, through differential analysis based on contrasts inside a generalized linear model. An enrichment analysis is proposed both on the lists of differentially expressed genes and on the co-expressed gene clusters. This video tutorial is conceived as a step-by-step protocol to help users take full advantage of DiCoExpress and its potential in empowering the biological interpretation of an RNA-Seq experiment.

Next-generation RNA sequencing (RNA-Seq) technology is now the gold standard of transcriptome analysis [1]. Since the early days of the technology, the combined efforts of bioinformaticians and biostatisticians have resulted in the development of numerous methods tackling all the essential steps of transcriptomic analyses, from mapping to transcript quantification [2]. Most of the tools available today to the biologist are developed within the R software environment for statistical computing and graphics [3], and many packages for biological data analysis are available in the Bioconductor repository

1. DiCoExpress

  1. Open an RStudio session and set the working directory to Template_scripts.
  2. Open the DiCoExpress_Tutorial.R script in RStudio.
  3. Load DiCoExpress functions in the R session with the following commands:
    > source("../Sources/Load_Functions.R")
    > Load_Functions()
    > Data_Directory = "../Data"
    > Results_Directory = "../Results/"
  4. Load data files in the R session with the following commands:
    > Project_Name =.......
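
Because the commands in step 3 rely on relative paths, it can help to confirm that they resolve from the working directory before sourcing anything. The following is a minimal sketch in base R only. The helper name check_dicoexpress_layout is hypothetical and not part of DiCoExpress, and the assumed layout (Sources/Load_Functions.R and Data/ as siblings of Template_scripts/) is inferred from the relative paths shown above. The demonstration builds a throwaway directory tree rather than touching a real installation.

```r
# Hypothetical helper (not part of DiCoExpress): checks that the relative
# paths used in the protocol resolve from a given working directory.
check_dicoexpress_layout <- function(wd = getwd()) {
  paths <- c(
    loader = file.path(wd, "..", "Sources", "Load_Functions.R"),
    data   = file.path(wd, "..", "Data")
  )
  found <- c(
    loader = file.exists(paths[["loader"]]),
    data   = dir.exists(paths[["data"]])
  )
  if (!all(found)) {
    warning("Missing: ", paste(paths[!found], collapse = ", "))
  }
  found
}

# Demonstration on a temporary directory tree that mimics the assumed layout.
root <- file.path(tempdir(), "DiCoExpress_demo")
dir.create(file.path(root, "Template_scripts"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "Sources"), showWarnings = FALSE)
dir.create(file.path(root, "Data"), showWarnings = FALSE)
file.create(file.path(root, "Sources", "Load_Functions.R"))

layout_ok <- check_dicoexpress_layout(file.path(root, "Template_scripts"))
print(layout_ok)
```

In a real session, calling check_dicoexpress_layout() with no argument after setting the working directory to Template_scripts would report which of the two paths, if any, cannot be found.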

All the DiCoExpress outputs are saved in the Tutorial/ directory, itself placed within the Results/ directory. We provide here some guidance for assessing the overall quality of the analysis.

Quality Control
The quality control output, located in the Quality_Control/ directory, is essential to verify that the RNA-Seq analysis results are reliable. The Data_Quality_Control.pdf file contains several plots obtained with raw and normalized data that can be used to identify a.......
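
To give a feel for what comparing raw and normalized data can reveal, here is a small sketch in base R on simulated counts. It is not DiCoExpress code and does not reproduce the content of Data_Quality_Control.pdf. The count matrix, the sample names, the simple counts-per-million scaling, and the "half the median library size" threshold are all assumptions chosen for illustration.

```r
# Illustrative only: simulated counts, not DiCoExpress code or output.
set.seed(1)
n_genes <- 500
n_samples <- 6
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = 100, size = 2),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)
# Make one sample artificially shallow so that it stands out.
counts[, "sample6"] <- rbinom(n_genes, size = counts[, "sample6"], prob = 0.2)

# Library size = total number of counts per sample.
lib_size <- colSums(counts)

# Counts per million: a simple library-size scaling (an assumption here,
# not necessarily the normalization applied by DiCoExpress).
cpm <- sweep(counts, 2, lib_size, "/") * 1e6
log_raw <- log2(counts + 1)
log_cpm <- log2(cpm + 1)

# Flag samples whose library size is below half the median (arbitrary threshold).
shallow <- names(lib_size)[lib_size < 0.5 * median(lib_size)]
print(shallow)

# Per-sample medians before and after scaling.
print(round(rbind(raw = apply(log_raw, 2, median),
                  cpm = apply(log_cpm, 2, median)), 2))

# In an interactive session, boxplots make the same comparison visually.
if (interactive()) {
  op <- par(mfrow = c(1, 2))
  boxplot(log_raw, las = 2, main = "Raw (log2 + 1)")
  boxplot(log_cpm, las = 2, main = "CPM (log2 + 1)")
  par(op)
}
```

The shallow sample shows a clearly lower median before scaling, and the gap narrows after scaling, which is the kind of pattern such plots are meant to expose.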

Because RNA-Seq has become a ubiquitous method in biological studies, there is a constant need to develop versatile and user-friendly analytical tools. A critical step in most analytical workflows is to identify with confidence the genes differentially expressed between biological conditions and/or treatments [15]. The production of reliable results requires proper statistical modeling, which has been the motivation for the development of DiCoExpress.
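
As an illustration of what a contrast inside a generalized linear model means for a two-factor design, the sketch below fits a Poisson GLM to one simulated gene using base R only. It is a simplified stand-in and not the DiCoExpress implementation: the source cites edgeR in its reference list, and this example deliberately avoids it to stay self-contained. The factor names and levels (genotype: WT, mutant; treatment: control, stress), the effect sizes, and the replicate count are hypothetical.

```r
# Illustrative only: one simulated gene, base R Poisson GLM.
# This is not DiCoExpress code; all names and effect sizes are assumptions.
set.seed(42)
design <- expand.grid(
  replicate = 1:3,
  treatment = factor(c("control", "stress"), levels = c("control", "stress")),
  genotype  = factor(c("WT", "mutant"), levels = c("WT", "mutant"))
)

# Simulated truth: stress doubles expression in WT and quadruples it in mutant.
mu <- with(design,
           50 * ifelse(treatment == "stress", 2, 1) *
             ifelse(genotype == "mutant" & treatment == "stress", 2, 1))
design$count <- rpois(nrow(design), mu)

# Two-factor model with interaction.
fit <- glm(count ~ genotype * treatment, family = poisson(), data = design)

# Contrast: stress vs control within the mutant genotype.
# Coefficient order: (Intercept), genotypemutant, treatmentstress,
# genotypemutant:treatmentstress
contrast <- c(0, 0, 1, 1)
names(contrast) <- names(coef(fit))

# Wald test of the contrast on the log scale.
estimate <- sum(contrast * coef(fit))
std_err  <- sqrt(drop(t(contrast) %*% vcov(fit) %*% contrast))
z_value  <- estimate / std_err
p_value  <- 2 * pnorm(-abs(z_value))

print(round(c(log_fold_change = estimate, std_error = std_err), 3))
print(p_value)
```

Writing the question as a contrast vector means the same fitted model can answer several questions (treatment effect in each genotype, genotype effect in each treatment, interaction) without refitting, which is why contrasts are convenient for multifactorial designs.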

DiCo.......

This work was mainly supported by the ANR PSYCHE (ANR-16-CE20-0009). The authors thank F. Desprez for the construction of the container of DiCoExpress. KB's work is supported by the Investment for the Future ANR-10-BTBR-01-01 Amaizing program. The GQE and IPS2 laboratories benefit from the support of Saclay Plant Sciences-SPS (ANR-17-EUR-0007).

  1. Wang, Z., Gerstein, M., Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics. 10 (1), 57-63 (2009).
  2. Yang, I. S., Kim, S. Analysis of Whole Transcriptome Sequencing Data: Workflow and Software. Genomics & Informatics. 13 (4), 119-125 (2015).
  3. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing. (2020).
  4. Huber, W., et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nature Methods. 12 (2), 115-121 (2015).
  5. Smith, D. R. The battle for user-friendly bioinformatics. Frontiers in Genetics. 4, 187 (2013).
  6. Pavelin, K., Cham, J. A., de Matos, P., Brooksbank, C., Cameron, G., Steinbeck, C. Bioinformatics Meets User-Centred Design: A Perspective. PLoS Computational Biology. 8 (7), e1002554 (2012).
  7. Shiny: web application framework. Available from: https://rdrr.io/cran/shiny/ (2021).
  8. Lambert, I., Paysant-Le Roux, C., Colella, S., Martin-Magniette, M.-L. DiCoExpress: a tool to process multifactorial RNAseq experiments from quality controls to co-expression analysis through differential analysis based on contrasts inside GLM models. Plant Methods. 16 (1), 68 (2020).
  9. Dillies, M.-A., et al. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Briefings in Bioinformatics. 14 (6), 671-683 (2012).
  10. Rigaill, G. Synthetic data sets for the identification of key ingredients for RNA-seq differential analysis. Briefings in Bioinformatics. 19 (1) (2016).
  11. Rau, A., Maugis-Rabusseau, C. Transformation and model choice for RNA-seq co-expression analysis. Briefings in Bioinformatics. 19 (3) (2017).
  12. Robinson, M. D., McCarthy, D. J., Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 26 (1), 139-140 (2009).
  13. Wilkinson, M. D., et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. 3 (1), 160018 (2016).
  14. Stark, R., Grzelak, M., Hadfield, J. RNA sequencing: the teenage years. Nature Reviews Genetics. 20 (11), 631-656 (2019).
