Method Article
This method describes the steps to improve the quality and quantity of sequence data that can be obtained from formalin-fixed paraffin-embedded (FFPE) RNA samples. We describe the methodology to more accurately assess the quality of FFPE-RNA samples, prepare sequencing libraries, and analyze the data from FFPE-RNA samples.
Gene expression analysis by RNA sequencing (RNA-seq) enables unique insights into clinical samples that can potentially lead to mechanistic understanding of the basis of various diseases as well as resistance and/or susceptibility mechanisms. However, FFPE tissues, which represent the most common method for preserving tissue morphology in clinical specimens, are not the best sources for gene expression profiling analysis. The RNA obtained from such samples is often degraded, fragmented, and chemically modified, which leads to suboptimal sequencing libraries. In turn, these generate poor quality sequence data that may not be reliable for gene expression analysis and mutation discovery. In order to make the most of FFPE samples and obtain the best possible data from low quality samples, it is important to take certain precautions while planning experimental design, preparing sequencing libraries, and during data analysis. This includes the use of appropriate metrics for precise sample quality control (QC), identifying the best methods for various steps during the sequencing library generation, and careful library QC. In addition, applying correct software tools and parameters for sequence data analysis is critical in order to identify artifacts in RNA-seq data, filter out contamination and low quality reads, assess uniformity of gene coverage, and measure the reproducibility of gene expression profiles among biological replicates. These steps can ensure high accuracy and reproducibility for profiling of very heterogeneous RNA samples. Here we describe the various steps for sample QC, library preparation and QC, sequencing, and data analysis that can help to increase the amount of useful data obtained from low quality RNA, such as that obtained from FFPE-RNA tissues.
Use of next-generation sequencing approaches has enabled us to glean a wealth of information from various types of samples. However, old and poorly preserved samples remain unworkable for the commonly used methods of generating sequence data and often require modifications to well-established protocols. FFPE tissues represent such a sample type that has been widely utilized for clinical specimens1,2,3. While FFPE preservation maintains tissue morphology, the nucleic acids in FFPE tissues usually exhibit a wide range of damage and degradation, making it difficult to retrieve the genomic information that may lead to important insights about molecular mechanisms underlying various disorders.
Gene expression data generated by RNA sequencing is often instrumental in studying disease and resistance mechanisms and complements DNA mutation analysis. However, RNA is more susceptible to degradation, making it more challenging to generate accurate gene expression data from FFPE tissues. Furthermore, because the wide availability and affordability of sequencing is relatively recent, older specimens were often not stored in conditions required to preserve RNA integrity. Some of the issues for FFPE samples include degradation of RNA due to embedding in paraffin, chemical modification of RNA leading to fragmentation or refractoriness to enzymatic processes required for sequencing, and loss of the poly-A tails, limiting the applicability of oligo-dT as a primer for reverse transcriptase4. Another challenge is the handling/storage of FFPE samples under suboptimal conditions, which may lead to further degradation of labile molecules such as RNA in the tissues5. This is especially relevant for older samples that may have been collected at a time when gene expression analysis by RNA sequencing was not anticipated for the samples. All these lead to decreased quality and quantity of the extracted RNA available for generating useful sequence data. The low probability of success, combined with the high cost of sequencing, has dissuaded many researchers from trying to generate and analyze gene expression data from potentially useful FFPE samples. Some studies in recent years have demonstrated the usability of FFPE tissues for gene expression analysis2,6,7,8,9, albeit for fewer and/or more recent samples.
As a feasibility study, we used RNA extracted from FFPE tumor tissue specimens from three Residual Tissue Repositories from Surveillance, Epidemiology, and End Results (SEER) cancer registries for RNA sequencing and gene expression analysis10. Procured from clinical pathology labs, the FFPE tissues from high-grade ovarian serous adenocarcinomas were stored for 7–32 years under varying conditions before RNA extraction. Because in most cases these blocks had been stored at different sites for years without the expectation of any sensitive genetic analysis in the future, little care had been taken to preserve the nucleic acids. Thus, most of the samples exhibited poor quality RNA, with a large proportion of samples contaminated with bacteria. Nevertheless, we were able to perform gene quantification, measure the uniformity and continuity of gene coverage, and perform Pearson correlation analysis among biological replicates to measure reproducibility. Based on a panel of key signature genes, we compared the samples in our study with The Cancer Genome Atlas (TCGA) data and confirmed that approximately 60% of the samples had comparable gene expression profiles11. Based on the correlation between various QC results and sample metadata, we identified key QC metrics that have good predictive value for identifying samples that are more likely to generate usable sequence data11.
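As an illustration of the replicate reproducibility check mentioned above, the following minimal Python sketch computes a Pearson correlation coefficient between two replicates. It is not the pipeline used in the study: the log2 transform with a pseudocount of 1, the function names, and the example counts are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def log2_expr(counts, pseudocount=1.0):
    """Log2-transform counts; the pseudocount (an assumption) avoids log(0)."""
    return [math.log2(c + pseudocount) for c in counts]


def replicate_correlation(counts_a, counts_b):
    """Pearson correlation of log-transformed per-gene counts for two replicates."""
    return pearson(log2_expr(counts_a), log2_expr(counts_b))


# Hypothetical per-gene counts for two biological replicates (same gene order).
rep1 = [0, 15, 120, 980, 43]
rep2 = [1, 12, 140, 1010, 39]
r = replicate_correlation(rep1, rep2)
print(f"Pearson r between replicates: {r:.3f}")
```

Working on log-transformed values keeps a handful of very highly expressed genes from dominating the coefficient; whether that transform suits a given dataset is a judgment call.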
Here we describe the methodology used for FFPE-RNA quality assessment, generation of sequencing libraries starting from extracted RNA samples, and bioinformatic analysis of the sequencing data.
1. RNA quantity and quality assessment
2. Sequencing library preparation
3. Sequencing library QC
4. Sequencing
5. Data analysis and quality assessment
NOTE: A typical RNA-seq data analysis workflow (Figure 1) includes preprocessing and QC, alignment to the genome and post-alignment QC, gene and transcript quantification, sample correlation analysis, differential analysis between sample groups or treatment conditions, and gene set enrichment and pathway analysis.
The RNA-seq data may have quality issues that can affect the accuracy of gene profiling and lead to erroneous conclusions. Therefore, initial QC checks for sequencing quality, contamination, sequencing coverage bias, and other sources of artifacts are very important. Applying an RNA-Seq QC pipeline similar to the workflow described here is recommended to detect artifacts and apply filtering or correction before downstream analysis.
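To make the idea of filtering low quality reads concrete, here is a minimal Python sketch that drops reads whose mean base quality falls below a threshold. It is only an illustration and does not reproduce the tools or parameters used in the study: the Phred+33 quality encoding, the threshold of 20, the tuple layout of a read, and the function names are assumptions.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    if not quality_string:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def filter_reads(records, min_mean_quality=20.0):
    """Keep (name, sequence, quality) records with mean quality >= threshold.

    The threshold of 20 is a hypothetical default, not a value from the study.
    """
    return [rec for rec in records if mean_phred(rec[2]) >= min_mean_quality]


# Hypothetical reads: 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
reads = [
    ("read1", "ACGTACGT", "IIIIIIII"),
    ("read2", "ACGTACGT", "########"),
]
kept = filter_reads(reads)
print([name for name, _, _ in kept])
```

In practice, dedicated trimming and QC tools also handle adapters, per-base trimming, and contamination screening, which this sketch does not attempt.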
The methodology described above was applied to 67 FFPE samples that had been stored under a variety of different conditions for 7–32 years (the median sample storage time was 17.5 years). The dataset and analysis results presented here were previously described and published in Zhao et al.11. On checking the sample quality as described earlier (i.e., example traces in Figure 2), DV100 was found to be more useful than DV200 because it is mor...
The method described here outlines the main steps required to obtain good sequence data from FFPE-RNA samples. The main points to consider with this method are: (1) Ensure that the RNA is preserved as best as possible after extraction by minimizing the sample handling and freezing and thawing cycles. Separate QC aliquots are very helpful. (2) Use a QC metric that is best for the given sample set. RIN values and DV200 are often not useful for degraded samples, and DV100 may be the metric of choice to...
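For readers unfamiliar with the DV metrics discussed above, DV200 and DV100 are conventionally the percentage of RNA fragments longer than 200 and 100 nucleotides, respectively. The Python sketch below shows the arithmetic on a simplified trace. It assumes the trace is already reduced to (fragment size, signal) pairs with marker peaks excluded, and it sums signal rather than integrating area as instrument software does; the function name and the example values are hypothetical.

```python
def dv_metric(trace, threshold_nt):
    """Percent of total signal contributed by fragments longer than threshold_nt.

    trace: list of (fragment_size_nt, signal) pairs, marker peaks already removed
    (an assumption of this sketch).
    """
    total = sum(signal for _, signal in trace)
    if total <= 0:
        raise ValueError("trace has no signal")
    above = sum(signal for size, signal in trace if size > threshold_nt)
    return 100.0 * above / total


# Hypothetical, heavily degraded sample: most signal sits at short fragment sizes.
trace = [(50, 30.0), (100, 25.0), (150, 20.0), (250, 15.0), (500, 10.0)]
dv100 = dv_metric(trace, 100)
dv200 = dv_metric(trace, 200)
print(f"DV100 = {dv100:.1f}%, DV200 = {dv200:.1f}%")
```

Because DV100 counts every fragment that DV200 counts plus those between 100 and 200 nucleotides, DV100 is always at least as large as DV200 for the same trace.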
This work was funded by the National Cancer Institute (NCI), National Institutes of Health (NIH). Leidos Biomedical Research, Inc. is the operations and technical support contractor for the Frederick National Laboratory for Cancer Research which is fully funded by NIH. Several authors (YZ, MM, KT, YL, JS, BT) are affiliated with Leidos Biomedical Research, Inc., but all of the authors are fully funded by the National Cancer Institute including authors’ salaries and research materials. Leidos Biomedical Research, Inc. did not provide salary for the authors (YZ, MM, KT, YL, JS, BT) or material for the study, nor did it have any role in the study design, data collection, analysis, decision to publish, or preparation of the manuscript.
We are thankful to Dr. Danielle Carrick (Division of Cancer Control and Population Sciences, National Cancer Institute) for continued help, especially for initiating this study, providing us with the samples, and for helpful suggestions during data analysis. We sincerely thank all members of the CCR Sequencing Facility at the Frederick National Laboratory for Cancer Research for their help during sample preparation and sequencing, especially Brenda Ho for assistance in sample QC, Oksana German for library QC, and Tatyana Smirnova for running the sequencers. We also would like to thank Tsai-wei Shen and Ashley Walton at the Sequencing Facility Bioinformatics Group for helping with data analysis and RNA-seq pipeline implementation. We also thank CCBR and NCBR for assistance with the RNA-seq analysis pipeline and best practices development.
Name | Company | Catalog Number | Comments |
2100 Bioanalyzer | Agilent | G2939BA | |
Agilent DNA 7500 Kit | Agilent | 5067-1506 | |
Agilent High Sensitivity DNA Kit | Agilent | 5067-4626 | |
Agilent RNA 6000 Nano Kit | Agilent | 5067-1511 | |
AllPrep DNA/RNA FFPE Kit | Qiagen | 80234 | |
CFX96 Touch System | Bio-Rad | 1855195 | |
Library Quantification kit v2-Illumina | KapaBiosystems | KK4824 | |
NEBNext Ultra II Directional RNA Library Prep Kit for Illumina | New England Biolabs | E7765S | https://www.neb.com/protocols/2017/02/07/protocol-for-use-with-ffpe-rna-nebnext-rrna-depletion-kit |
NEBNext rRNA Depletion Kit (Human/Mouse/Rat) | New England Biolabs | E6310L | |
NextSeq 500 Sequencing System | Illumina | SY-415-1001 | NextSeq 500 System guide: https://support.illumina.com/content/dam/illumina-support/documents/documentation/system_documentation/nextseq/nextseq-500-system-guide-15046563-06.pdf |
NextSeq PhiX Control Kit | Illumina | FC-110-3002 | |
NSQ 500/550 Hi Output KT v2.5 (150 CYS) | Illumina | 20024907 | |
10X Genomics Magnetic Separator | 10X Genomics | 120250 | |
Rotator Multimixer | VWR | 13916-822 | |
C1000 Touch Thermal Cycler | Bio-Rad | 1851197 | |
Sequencing reagent kit | Illumina | 20024907 | |
Flow cell package | Illumina | 20024907 | |
Buffer cartridge and the reagent cartridge | Illumina | 20024907 | |
Sodium hydroxide solution (0.2N) | Millipore Sigma | SX0607D-6 | |
TRIS-HCL Buffer 1.0M, pH 7.0 | Fisher Scientific | 50-151-871 | |