Three differential expression analysis methods for RNA sequencing:limma, EdgeR, and DESeq2. Open the RStudio program and then load R file, DEGs. The file can be acquired from supplementary files.One.
Downloading and pre-processing of data.1.1. Download the high-throughput sequencing count data of Cholangiocarcinoma from the Cancer Genome Atlas. This tab can easily achieved by the following code.
Click run to install the R package. Click run to load R package. Set working directory.
Choose the cancer type. Run R code from the GDCquery file to download the data. File GDCquery can be acquired from supplementary files/scripts.
After execution, the Cholangiocarcinoma RNA sequencing count data can be downloaded and named CNT, where rows represent ensemble gene IDs and columns represent symbols IDs. Please notice the numbers at position 14 to 15 in the symbols IDs. Numbers range from 01 to 09 indicate tumors and 10 to 19 indicate normal tissues.1.2.
Conversation of ensemble gene IDs to gene symbols. Import the annotation file into R, according to its'storage path. The annotation file can be acquired from supplementary files.
Run the R code from the gtf v22 file. Which can be acquired from supplementary files/scripts. Apply inn"function and to convert the ensemble gene IDs to gene symbols.1.3.
Filter low-expressed genes. Click run to install package edgeR"Click run to load the R package edgeR"Run following R code to keep genes with counts per million values greater than one in at least two samples.Two. Differential expression analysis through limma"Click Run to install R package limma"Click Run to load R package limma"edgeR"Run the following R code to create design matrix.
Extract group information. Set 01"as tumor tissue. Set 11"as normal tissue.
Create design matrix. Create the DGEList object. Normalize the data.
Run the following R code to perform the limma-trend method based differential expression analysis. Calculate the CPM value. Click Run to fit a linear model to predict the data or infer the relationship between variables.
Calculate T value, F value and the log-odds based on Bayesian. Extract the results table. The results of differential expression analysis is saved in res_limma"which includes the log2 fold change value.
The average log2 expression level of the gene in the experiment. The modified T statistic, P value, the false discovery rate corrected p value and the log-odds of differentially expressed genes. Identify the differentially expressed genes.
So the adjusted P value less than 0.05, and absolute value of log false change greater than or equal to two are thresholds to screen the differentially expressed genes. The results res limma shows that comparing with the normal tissues, 1, 443 genes are up-regulated, and 1, 880 genes are down-regulated in Cholangiocarcinoma tissues. Output the result table to a file.
Click Run to install R package ggplot2"Click Run to load R package ggplot2"Run R code from the volcano file to create the volcano plot and the file volcano can be acquired from supplementary files. Genes can be mapped to different positions according to their log2 fold change and adjusted P values. So up-regulated differentially expressed genes are colored in red.
and the down-regulated differentially expressed genes are colored in green. Click export"to save the volcano plot.Three. Differential expression analysis through edgeR"Click Run to load R package edgeR"Run the following R code to create design matrix.
Click Run to create the DGEList object and normalize the data. Click Run to estimate the dispersion of gene expression value. Click Run to fit model to count data.
Conduct statistical test. Extract the result table. The result is saved in res edgeR"which includes the log fold change value, logCPM, F, p value and the false discovery rate corrected p value.
Identify the differentially expressed genes. The result res edgeR"shows that comparing with the normal tissues, 3, 121 genes are up-regulated, and 1, 578 genes are down-regulated in Cholangiocarcinoma tissues. Output the result table to a file.
Create the volcano plot. Click export to save the volcano plot.Four. Differential expression analysis through DESeq2.
Click Run to install R package DESeq2"Click Run to load R package DESeq2"Run the following R code to determine the groping factor. Create the DESeq2 data set object. Perform analysis.
Generate the result table. The result is saved in res DESeq2, which includes the mean of the normalized read count, log fold change value, log fold change standard arrow, the weld statistic, original P value and the corrected P value. Identify DEGs.
The result res DESeq2 shows that comparing with the normal tissues, two thousand nine hundred and thirty eight genes are up-regulated, and one thousand six hundred and sixteen genes are down-regulated in Cholangiocarcinoma tissues. Output the result table to a file. Create the volcano plot.
Click export to save the volcano plot.Five. Venn diagram. Click Run to install the R package venn diagram.
Click Run to load the R package venn diagram. Make a venn diagram of up-regulated differentially expressed genes. Click export to save the van diagram, Make a venn diagram of down-regulated differentially expressed genes.
Click export to save the venn diagram.Six. Representative results. Figure one shows the volcano plots of all genes acquired by limma, edgeR and DESeq2.
Negative log p value is plotted against the log fold change. Red points represent the up-regulated differentially expressed genes, and the green points represent the down-regulated differentially expressed genes. Limma identifies the one thousand eight hundred and eighty down-regulated differentially expressed the genes, and the one thousand four hundred and forty three up-regulated differentially expressed genes in Cholangiocarcinoma tissues.
EdgeR identify the one thousand five hundred seventy eight down-regulated differentially expressed genes, and three thousand one hundred and twenty one up-regulated differentially expressed genes. DESeq2 identify one thousand six hundred and sixteen down-regulated differentially expressed genes, and two thousand nine hundred and thirty eight up-regulated differentially expressed genes. Figure two, venn diagrams show overlap among the results divide from limma edgeR and DESeq2.
Compare the results of these three methods, One thousand four hundred and thirty one up-regulated differentially expressed genes, and one thousand five hundred and thirty one down-regulated differentially expressed genes are overlapping.Seven.Conclusion. In this protocol, we provided here a detailed protocol of different types of measure analysis for a high sequence of count data by using R packages, limma, edgeR, and DESeq2. Three methods have similar and staffs among their process of their analysis.
And then their from those three medicines are partly overlapping. All three medicines have their own advantages. And the choice is just depends on time of your data.
If there is my current data, limma should be given with priority, but generation sequencing data, in edgeR, and DESeq2 are preferred.