Name: Identification of Alternative Splicing and Polyadenylation in RNA-seq Data
Uploaded: 2021-06-24T13:00:00.000+00:00
Duration: 8 min 35 s
Description: Watch this Scientific Journal Video about Identification of Alternative Splicing and Polyadenylation in RNA-seq Data at JoVE.com

This study was supported by an Australian Research Council (ARC) Future Fellowship (FT16010043) and ANU Futures Scheme.

<ol>
	<li>Katz, Y., Wang, E. T., Airoldi, E. M., Burge, C. B. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=Analysis+and+design+of+RNA+sequencing+experiments+for+identifying+isoform+regulation.">Analysis and design of RNA sequencing experiments for identifying isoform regulation.</a> Nature Methods. 7 (12), 1009-1015 (2010).</li><li>Wang, Y., et al. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=Mechanism+of+alternative+splicing+and+its+regulation.">Mechanism of alternative splicing and its regulation.</a> Biomedical Reports. 3 (2), 152-158 (2015).</li><li>Mehmood, A., et al. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=Systematic+evaluation+of+differential+splicing+tools+for+RNA-seq+studies.">Systematic evaluation of differential splicing tools for RNA-seq studies.</a> Briefings in Bioinformatics. 21 (6), 2052-2065 (2020).</li><li>Movassat, M., et al. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=Coupling+between+alternative+polyadenylation+and+alternative+splicing+is+limited+to+terminal+introns.">Coupling between alternative polyadenylation and alternative splicing is limited to terminal introns.</a> RNA Biology. 13 (7), 646-655 (2016).</li><li>Tian, B., Manley, J. L. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=Alternative+polyadenylation+of+mRNA+precursors.">Alternative polyadenylation of mRNA precursors.</a> Nature Reviews Molecular Cell Biology. 18 (1), 18-30 (2017).</li><li>Herrmann, C. J., et al. 
<a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=PolyASite+2.0:+a+consolidated+atlas+of+polyadenylation+sites+from+3'+end+sequencing.">PolyASite 2.0: a consolidated atlas of polyadenylation sites from 3' end sequencing.</a> Nucleic Acids Research. 48 (1), 174-179 (2020).</li><li>Liu, R., Loraine, A. E., Dickerson, J. A. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=Comparisons+of+computational+methods+for+differential+alternative+splicing+detection+using+RNA-seq+in+plant+systems.">Comparisons of computational methods for differential alternative splicing detection using RNA-seq in plant systems.</a> BMC Bioinformatics. 15 (1), 364 (2014).</li><li>Conesa, A., et al. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=A+survey+of+best+practices+for+RNA-seq+data+analysis.">A survey of best practices for RNA-seq data analysis.</a> Genome Biology. 17 (1), 13 (2016).</li><li>Anders, S., Reyes, A., Huber, W. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=Detecting+differential+usage+of+exons+from+RNA-seq+data.">Detecting differential usage of exons from RNA-seq data.</a> Genome Research. 22 (10), 2008-2017 (2012).</li><li>Ritchie, M. E., et al. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=limma+powers+differential+expression+analyses+for+RNA-sequencing+and+microarray+studies.">limma powers differential expression analyses for RNA-sequencing and microarray studies.</a> Nucleic Acids Research. 43 (7), 47 (2014).</li><li>Shen, S., et al. 
<a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=rMATS:+Robust+and+flexible+detection+of+differential+alternative+splicing+from+replicate+RNA-Seq+data.">rMATS: Robust and flexible detection of differential alternative splicing from replicate RNA-Seq data.</a> Proceedings of the National Academy of Sciences. 111 (51), 5593-5601 (2014).</li><li>Mehmood, A., et al. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=Systematic+evaluation+of+differential+splicing+tools+for+RNA-seq+studies.">Systematic evaluation of differential splicing tools for RNA-seq studies.</a> Briefings in bioinformatics. 21 (6), 2052-2065 (2020).</li><li>Kanitz, A., et al. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=Comparative+assessment+of+methods+for+the+computational+inference+of+transcript+isoform+abundance+from+RNA-seq+data.">Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data.</a> Genome biology. 16 (1), 1-26 (2015).</li><li>Love, M. I., Huber, W., Anders, S. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=Moderated+estimation+of+fold+change+and+dispersion+for+RNA-seq+data+with+DESeq2.">Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.</a> Genome Biology. 15 (12), 550 (2014).</li><li>Sznajder, L. J., et al. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=Loss+of+MBNL1+induces+RNA+misprocessing+in+the+thymus+and+peripheral+blood.">Loss of MBNL1 induces RNA misprocessing in the thymus and peripheral blood.</a> Nature Communications. 
11, 1-11 (2020).</li><li>Batra, R., et al. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=Loss+of+MBNL+leads+to+disruption+of+developmentally+regulated+alternative+polyadenylation+in+RNA-mediated+disease.">Loss of MBNL leads to disruption of developmentally regulated alternative polyadenylation in RNA-mediated disease.</a> Molecular Cell. 56 (2), 311-322 (2014).</li><li>Leinonen, R., Sugawara, H., Shumway, M., et al. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=The+sequence+read+archive.">The sequence read archive.</a> Nucleic Acids Research. 39, 19-21 (2010).</li><li>Tange, O. GNU Parallel: the command-line power tool. ;login: The USENIX Magazine. 36 (1), 42-47 (2011).</li><li>Martin, M. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=Cutadapt+removes+adapter+sequences+from+high-throughput+sequencing+reads.">Cutadapt removes adapter sequences from high-throughput sequencing reads.</a> EMBnet.journal. 17 (1), 10-12 (2011).</li><li>Bolger, A. M., Lohse, M., Usadel, B. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=Trimmomatic:+a+flexible+trimmer+for+Illumina+sequence+data.">Trimmomatic: a flexible trimmer for Illumina sequence data.</a> Bioinformatics. 30 (15), 2114-2120 (2014).</li><li>Dobin, A., et al. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=STAR:+ultrafast+universal+RNA-seq+aligner.">STAR: ultrafast universal RNA-seq aligner.</a> Bioinformatics. 29 (1), 15-21 (2013).</li><li>Robinson, M. 
D., McCarthy, D. J., Smyth, G. K. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=edgeR:+a+Bioconductor+package+for+differential+expression+analysis+of+digital+gene+expression+data.">edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.</a> Bioinformatics. 26 (1), 139-140 (2010).</li><li>Robinson, M. D., Oshlack, A. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=A+scaling+normalization+method+for+differential+expression+analysis+of+RNA-seq+data.">A scaling normalization method for differential expression analysis of RNA-seq data.</a> Genome Biology. 11 (3), 25 (2010).</li><li>Veiga, D. F. T. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=maser:+Mapping+Alternative+Splicing+Events+to+pRoteins.">maser: Mapping Alternative Splicing Events to pRoteins.</a> R package version 1.4.0. , (2019).</li><li>Langmead, B., Trapnell, C., Pop, M., Salzberg, S. L. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=Ultrafast+and+memory-efficient+alignment+of+short+DNA+sequences+to+the+human+genome.">Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.</a> Genome Biology. 10 (13), 25 (2009).</li><li>Quinlan, A. R., Hall, I. M. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=BEDTools:+a+flexible+suite+of+utilities+for+comparing+genomic+features.">BEDTools: a flexible suite of utilities for comparing genomic features.</a> Bioinformatics. 26 (6), 841-842 (2010).</li><li>Ram&#237;rez, F., D&#252;ndar, F., Diehl, S., Gr&#252;ning, B. A., Manke, T. 
<a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=deepTools:+a+flexible+platform+for+exploring+deep-sequencing+data.">deepTools: a flexible platform for exploring deep-sequencing data.</a> Nucleic acids research. 42 (1), 187-191 (2014).</li><li>Merino, G. A., Conesa, A., Fern&#225;ndez, E. A. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=A+benchmarking+of+workflows+for+detecting+differential+splicing+and+differential+expression+at+isoform+level+in+human+RNA-seq+studies.">A benchmarking of workflows for detecting differential splicing and differential expression at isoform level in human RNA-seq studies.</a> Briefings in bioinformatics. 20 (2), 471-481 (2019).</li><li>Chhangawala, S., Rudy, G., Mason, C. E., Rosenfeld, J. A. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=The+impact+of+read+length+on+quantification+of+differentially+expressed+genes+and+splice+junction+detection.">The impact of read length on quantification of differentially expressed genes and splice junction detection.</a> Genome biology. 16 (1), 1-10 (2015).</li><li>Conesa, A., et al. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=A+survey+of+best+practices+for+RNA-seq+data+analysis.">A survey of best practices for RNA-seq data analysis.</a> Genome Biol. 17, 13 (2016).</li><li>Trapnell, C., et al. 
<a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=Differential+gene+and+transcript+expression+analysis+of+RNA-seq+experiments+with+TopHat+and+Cufflinks.">Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks.</a> Nat Protoc. 7 (3), 562-578 (2012).</li><li>Li, B., Dewey, C. N. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=RSEM:+accurate+transcript+quantification+from+RNA-Seq+data+with+or+without+a+reference+genome.">RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.</a> BMC Bioinformatics. 12, 323 (2011).</li><li>Bray, N. L., Pimentel, H., Melsted, P., Pachter, L. <a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Search&doptcmdl=Citation&defaultField=Title+Word&term=Near-optimal+probabilistic+RNA-seq+quantification.">Near-optimal probabilistic RNA-seq quantification.</a> Nat Biotechnol. 34 (5), 525-527 (2016).</li></ol>

The authors have nothing to disclose.

In this study, we evaluated exon-based and event-based approaches to detect AS and APA in bulk RNA-Seq and 3&#39; end sequencing data. The exon-based AS approaches produce both a list of differentially expressed exons and a gene-level ranking ordered by the statistical significance of overall differential splicing activity (Tables 1-2, 4-5). For the diffSplice package, differential usage is determined by fitting weighted linear models at the exon level to estimate the log fold-change of an exon relative to the average log fold-change of the other exons within the same gene (called per exon FC). Gene-level statistical significance is computed by combining the individual exon-level significance tests into a gene-wise test using the Simes method<sup>10</sup>. This ranking by gene-level differential splicing activity can subsequently be used to perform a gene set enrichment analysis (GSEA) of the key pathways involved<sup>10</sup>. DEXSeq uses a similar strategy, fitting a generalized linear model to measure differential exon usage, though it differs in certain steps such as filtering, normalization and dispersion estimation. On comparing the top 500 ranked exons showing AS activity and the top 500 ranked pA sites showing APA using DEXSeq and diffSplice, we found an overlap of 310 exons and 300 pA sites, respectively, demonstrating the concordance of the two exon-based approaches, which was also demonstrated in a previous study<sup>29</sup>. We recommend combining an exon-based approach (either DEXSeq or diffSplice) with an event-based approach for comprehensive detection and classification of AS. For APA, users can choose either DEXSeq or diffSplice: both methods have been shown to perform well across a wide range of transcriptomics experiments<sup>29</sup>.
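The gene-wise combination of exon-level tests described above can be illustrated with a short sketch of the Simes method. This is a minimal Python illustration of the arithmetic only, not the limma implementation; the function name, gene names and p-values are hypothetical.

```python
def simes_pvalue(pvalues):
    """Combine exon-level p-values into one gene-level p-value (Simes method).

    With the n p-values sorted in ascending order as p_(1) <= ... <= p_(n),
    the Simes p-value is the minimum over ranks i of n * p_(i) / i.
    """
    if not pvalues:
        raise ValueError("at least one exon-level p-value is required")
    ordered = sorted(pvalues)
    n = len(ordered)
    combined = min(n * p / rank for rank, p in enumerate(ordered, start=1))
    return min(1.0, combined)


# Hypothetical exon-level p-values for two genes (illustrative values only).
exon_pvalues = {
    "geneA": [0.001, 0.20, 0.50, 0.80],  # one strongly changed exon
    "geneB": [0.04, 0.05, 0.06],         # several mildly changed exons
}

gene_pvalues = {gene: simes_pvalue(p) for gene, p in exon_pvalues.items()}

# Rank genes from most to least significant differential splicing.
ranking = sorted(gene_pvalues, key=gene_pvalues.get)

for gene in ranking:
    print(f"{gene}\t{gene_pvalues[gene]:.4f}")
```

A gene with a single strongly changed exon can rank highly under this combination, which suits a ranking meant to flag any differential exon usage within a gene.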
In preparing the RNA-seq library for an AS analysis, it is important to use a strand-specific bulk RNA-seq protocol<sup>8</sup>: a large fraction of genes in vertebrate genomes overlap on opposite strands, and a non-strand-specific protocol cannot distinguish these overlapping regions, which confounds exon detection. Another consideration is read depth: splicing analyses require deeper sequencing than DGE, e.g., 30-60 million reads per sample versus 5-25 million reads per sample for DGE (https://sapac.support.illumina.com/bulletins/2017/04/considerations-for-rna-seq-read-length-and-coverage-.html). All the tools demonstrated in the protocol support both single-end and paired-end sequencing data. If only known gene annotations are used to detect junction reads, then shorter single-end reads (&#8805; 50 bp) can be used, though de novo detection of novel splice junctions benefits from paired-end and longer (&#8805; 100 bp) reads<sup>30,31</sup>. The choice of RNA extraction protocol, either polyA selection or rRNA depletion, depends on the quality of the RNA and the experimental question; see ref. 31 for a discussion. Depending on the details of the library construction, the example scripts given here will need modification in the parameters for read alignment, feature counting and rMATS. In computing the initial exon-level read counts using featureCounts, or similar methods, care must be taken to configure the function options correctly for counts and strandedness: in featureCounts, we set the &#34;strandSpecific&#34; argument appropriately for the strand-specific RNA-seq protocol used; and because a read is expected to map over adjacent exons in exon-level quantification, we set the allowMultiOverlap parameter to TRUE. For APA, there are different 3&#39; end sequencing protocols<sup>6</sup>, which vary in the precise location of peaks relative to the pA site. For our example data we determined that the peak is 60 bp upstream of the pA site, as shown in Figure 5; this analysis will need to be adapted for other 3&#39; end sequencing protocols.
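To make the 60 bp upstream offset concrete, the following Python sketch derives a strand-aware counting window around the expected read peak for each annotated pA site. Only the 60 bp offset comes from the example data; the window half-width, the class and function names, and the coordinates are hypothetical choices for illustration and would need tuning for a real protocol.

```python
from dataclasses import dataclass

PEAK_OFFSET_BP = 60  # peak position upstream of the pA site (example data)
HALF_WIDTH_BP = 50   # hypothetical half-width of the counting window


@dataclass(frozen=True)
class PaSite:
    chrom: str
    position: int  # 1-based coordinate of the annotated pA site
    strand: str    # "+" or "-"


def peak_window(site, offset=PEAK_OFFSET_BP, half_width=HALF_WIDTH_BP):
    """Return (chrom, start, end, strand) of a window centred on the read peak.

    "Upstream" follows the direction of transcription: lower coordinates on
    the + strand and higher coordinates on the - strand.
    """
    if site.strand == "+":
        centre = site.position - offset
    elif site.strand == "-":
        centre = site.position + offset
    else:
        raise ValueError(f"unknown strand: {site.strand!r}")
    start = max(1, centre - half_width)  # clamp at the chromosome start
    end = centre + half_width
    return site.chrom, start, end, site.strand


# Hypothetical pA sites on opposite strands.
plus_window = peak_window(PaSite("chr1", 10_000, "+"))
minus_window = peak_window(PaSite("chr1", 10_000, "-"))
print(plus_window)
print(minus_window)
```

Windows built this way could then be written to an annotation file for read counting; for another 3&#39; end sequencing protocol the offset would be re-estimated from the read distribution around known pA sites.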
In this protocol we limit the scope to differential analyses at the level of individual exons, and of splicing events consisting of adjacent exon-intron combinations. We do not discuss the class of analyses based on de novo isoform reconstruction, such as Cufflinks, Cuffdiff<sup>32</sup>, RSEM<sup>33</sup> and Kallisto<sup>34</sup>, which aim to detect and quantify the absolute and relative expression of entire alternative isoforms. The exon-based and event-based methods are more sensitive for detecting individual splicing events<sup>30</sup> and in many cases provide all the information needed for further analysis, without the need for isoform-level quantification.
The latest versions of the source files for this protocol are available at https://github.com/jiayuwen/AS_APA_JoVE

RNA-seq has been widely used over the years, typically for estimating differential gene expression and for gene discovery<sup>1</sup>. It can also be used to estimate differences in exon-level usage that arise when genes express different isoforms, contributing to a better understanding of gene regulation at the post-transcriptional level. The majority of eukaryotic genes generate different isoforms by alternative splicing (AS) to increase the diversity of mRNA expression. AS events can be divided into different patterns: skipping of complete exons (SE), where a (&#34;cassette&#34;) exon is removed from the transcript along with its flanking introns; alternative 5&#39; (donor) splice site selection (A5SS) and alternative 3&#39; (acceptor) splice site selection (A3SS), when two or more splice sites are present at either end of an exon; retention of introns (RI), when an intron is retained within the mature mRNA transcript; and mutual exclusion of exon usage (MXE), where only one of two available exons is retained at a time<sup>2,3</sup>. Alternative polyadenylation (APA) also plays an important role in regulating gene expression, using alternative poly(A) sites to generate multiple mRNA isoforms from a single transcript<sup>4</sup>. Most polyadenylation sites (pAs) are located in the 3&#39; untranslated region (3&#39; UTR), generating mRNA isoforms with diverse 3&#39; UTR lengths. As the 3&#39; UTR is the central hub for recognizing regulatory elements, different 3&#39; UTR lengths can affect mRNA localization, stability and translation<sup>5</sup>. There is a class of 3&#39; end sequencing assays optimized to detect APA, which differ in the details of the protocol<sup>6</sup>. The pipeline described here is designed for PolyA-seq, but can be adapted for other protocols as described.
In this study, we present a pipeline of differential exon analysis methods<sup>7,8</sup> (Figure 1), which can be divided into two broad categories: exon-based (DEXSeq<sup>9</sup>, diffSplice<sup>10</sup>) and event-based (replicate Multivariate Analysis of Transcript Splicing (rMATS)<sup>11</sup>). The exon-based methods compare the fold change across conditions of individual exons against a measure of overall gene fold change to call differential exon usage, and from that compute a gene-level measure of AS activity. Event-based methods use exon-intron-spanning junction reads to detect and classify specific splicing events, such as exon skipping or intron retention, and distinguish these AS types in the output<sup>3</sup>. Thus, these methods provide complementary views for a complete analysis of AS<sup>12,13</sup>. We selected DEXSeq (based on the DESeq2<sup>14</sup> DGE package) and diffSplice (based on the limma<sup>10</sup> DGE package) for the study as they are amongst the most widely used packages for differential splicing analysis. rMATS was chosen as a popular method for event-based analysis. Another popular event-based method is MISO (Mixture of Isoforms)<sup>1</sup>. For APA we adapt the exon-based approach.
<img alt="Figure 1" class="xfigimg" src="/files/ftp_upload/62636/62636fig01.jpg" /> 
Figure 1.&#160;Analysis pipeline. Flowchart of the steps used in the analysis. Steps include: obtaining the data, performing quality checks and read alignment, counting reads using annotations for known exons, introns and pA sites, filtering to remove low counts, and normalization. PolyA-seq data were analysed for alternative pA sites using the diffSplice/DEXSeq methods, bulk RNA-Seq data were analysed for alternative splicing at the exon level with the diffSplice/DEXSeq methods, and AS events were analysed with rMATS. <a href="https://www.jove.com/files/ftp_upload/62636/62636fig01large.jpg" target="_blank">Please click here to view a larger version of this figure.</a>
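The exon-based comparison described above, an exon's fold change measured against the rest of its gene, can be sketched in a few lines of Python. This shows only the underlying contrast and is not how diffSplice or DEXSeq are implemented (both fit statistical models to the counts); the exon identifiers and log fold-change values are hypothetical.

```python
def relative_exon_logfc(exon_logfc):
    """Contrast each exon's log2 fold change with the other exons of its gene.

    exon_logfc maps exon id -> log2 fold change between two conditions.
    Returns exon id -> (exon logFC) - (mean logFC of the other exons).
    """
    n = len(exon_logfc)
    if n < 2:
        raise ValueError("a gene needs at least two exons for this contrast")
    total = sum(exon_logfc.values())
    return {
        exon: lfc - (total - lfc) / (n - 1)
        for exon, lfc in exon_logfc.items()
    }


# Hypothetical gene that is up-regulated overall, with one exon (E4)
# moving in the opposite direction.
gene_logfc = {"E1": 1.0, "E2": 1.0, "E3": 1.0, "E4": -1.0}
relative = relative_exon_logfc(gene_logfc)

for exon, value in relative.items():
    print(f"{exon}\t{value:+.3f}")
```

Here E1 to E3 follow the gene-level trend and get small relative values, while E4 stands out with a large negative value, which is the pattern an exon-based method would flag as differential exon usage.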
The RNA-seq data used in this study were acquired from the Gene Expression Omnibus (GEO) (GSE138691)<sup>15</sup>. We used mouse RNA-seq data from this dataset with two condition groups: wild-type (WT) and Muscleblind-like type 1 knockout (Mbnl1 KO), with three replicates each. To demonstrate differential polyadenylation site usage analysis, we obtained mouse embryonic fibroblast (MEF) PolyA-seq data (GEO Accession GSE60487)<sup>16</sup>. These data have four condition groups: wild-type (WT), Muscleblind-like type 1/type 2 double knockout (Mbnl1/2 DKO), Mbnl1/2 DKO with Mbnl3 knockdown (KD) and Mbnl1/2 DKO with Mbnl3 control (Ctrl). Each condition group consists of two replicates.
<table border="1" fo:font-size="5px" fo:keep-together.within-page="1" fo:keep-with-next.within-page="always">
	<tbody>
		<tr>
			<td>Data type</td>
			<td>GEO Accession</td>
			<td>SRA Run number</td>
			<td>Sample name</td>
			<td>Condition</td>
			<td>Replicate</td>
			<td>Tissue</td>
			<td>Sequencing</td>
			<td>Read length</td>
		</tr>
		<tr>
			<td rowspan="6">RNA-Seq</td>
			<td>GSM4116218</td>
			<td>SRR10261601</td>
			<td>Mbnl1KO_Thymus_1</td>
			<td>Mbnl1 knockout</td>
			<td>Rep 1</td>
			<td>Thymus</td>
			<td>Paired-end</td>
			<td>100 bp</td>
		</tr>
		<tr>
			<td>GSM4116219</td>
			<td>SRR10261602</td>
			<td>Mbnl1KO_Thymus_2</td>
			<td>Mbnl1 knockout</td>
			<td>Rep 2</td>
			<td>Thymus</td>
			<td>Paired-end</td>
			<td>100 bp</td>
		</tr>
		<tr>
			<td>GSM4116220</td>
			<td>SRR10261603</td>
			<td>Mbnl1KO_Thymus_3</td>
			<td>Mbnl1 knockout</td>
			<td>Rep 3</td>
			<td>Thymus</td>
			<td>Paired-end</td>
			<td>100 bp</td>
		</tr>
		<tr>
			<td>GSM4116221</td>
			<td>SRR10261604</td>
			<td>WT_Thymus_1</td>
			<td>Wild type</td>
			<td>Rep 1</td>
			<td>Thymus</td>
			<td>Paired-end</td>
			<td>100 bp</td>
		</tr>
		<tr>
			<td>GSM4116222</td>
			<td>SRR10261605</td>
			<td>WT_Thymus_2</td>
			<td>Wild type</td>
			<td>Rep 2</td>
			<td>Thymus</td>
			<td>Paired-end</td>
			<td>100 bp</td>
		</tr>
		<tr>
			<td>GSM4116223</td>
			<td>SRR10261606</td>
			<td>WT_Thymus_3</td>
			<td>Wild type</td>
			<td>Rep 3</td>
			<td>Thymus</td>
			<td>Paired-end</td>
			<td>100 bp</td>
		</tr>
		<tr>
			<td rowspan="8">PolyA-seq</td>
			<td>GSM1480973</td>
			<td>SRR1553129</td>
			<td>WT_1</td>
			<td>Wild type (WT)</td>
			<td>Rep 1</td>
			<td>Mouse embryonic fibroblasts (MEFs)</td>
			<td>Single-end</td>
			<td>40 bp</td>
		</tr>
		<tr>
			<td>GSM1480974</td>
			<td>SRR1553130</td>
			<td>WT_2</td>
			<td>Wild type (WT)</td>
			<td>Rep 2</td>
			<td>Mouse embryonic fibroblasts (MEFs)</td>
			<td>Single-end</td>
			<td>40 bp</td>
		</tr>
		<tr>
			<td>GSM1480975</td>
			<td>SRR1553131</td>
			<td>DKO_1</td>
			<td>Mbnl 1/2 double knockout (DKO)</td>
			<td>Rep 1</td>
			<td>Mouse embryonic fibroblasts (MEFs)</td>
			<td>Single-end</td>
			<td>40 bp</td>
		</tr>
		<tr>
			<td>GSM1480976</td>
			<td>SRR1553132</td>
			<td>DKO_2</td>
			<td>Mbnl 1/2 double knockout (DKO)</td>
			<td>Rep 2</td>
			<td>Mouse embryonic fibroblasts (MEFs)</td>
			<td>Single-end</td>
			<td>40 bp</td>
		</tr>
		<tr>
			<td>GSM1480977</td>
			<td>SRR1553133</td>
			<td>DKOsiRNA_1</td>
			<td>Mbnl 1/2 double knockout with Mbnl 3 siRNA (KD)</td>
			<td>Rep 1</td>
			<td>Mouse embryonic fibroblasts (MEFs)</td>
			<td>Single-end</td>
			<td>40 bp</td>
		</tr>
		<tr>
			<td>GSM1480978</td>
			<td>SRR1553134</td>
			<td>DKOsiRNA_2</td>
			<td>Mbnl 1/2 double knockout with Mbnl 3 siRNA (KD)</td>
			<td>Rep 2</td>
			<td>Mouse embryonic fibroblasts (MEFs)</td>
			<td>Single-end</td>
			<td>36 bp</td>
		</tr>
		<tr>
			<td>GSM1480979</td>
			<td>SRR1553135</td>
			<td>DKONTsiRNA_1</td>
			<td>Mbnl 1/2 double knockout with non-targeting siRNA (Ctrl)</td>
			<td>Rep 1</td>
			<td>Mouse embryonic fibroblasts (MEFs)</td>
			<td>Single-end</td>
			<td>40 bp</td>
		</tr>
		<tr>
			<td>GSM1480980</td>
			<td>SRR1553136</td>
			<td>DKONTsiRNA_2</td>
			<td>Mbnl 1/2 double knockout with non-targeting siRNA (Ctrl)</td>
			<td>Rep 2</td>
			<td>Mouse embryonic fibroblasts (MEFs)</td>
			<td>Single-end</td>
			<td>40 bp</td>
		</tr>
	</tbody>
</table>
Table 1. Summary of RNA-Seq and PolyA-seq datasets used for the analysis.

<table><tbody><tr><td>Not relevant for computational study</td><td></td><td></td><td></td></tr></tbody></table>


As well as the typical analysis of RNA-Seq to measure differential gene expression (DGE) across experimental/biological conditions, RNA-seq data can also be utilized to explore other complex regulatory mechanisms at the exon level. Alternative splicing and polyadenylation play a crucial role in the functional diversity of a gene by generating different isoforms to regulate gene expression at the post-transcriptional level, and limiting analyses to the whole-gene level can miss this important regulatory layer. Here, we demonstrate detailed step-by-step analyses for identification and visualization of differential exon and polyadenylation site usage across conditions, using Bioconductor and other packages and functions, including DEXSeq, diffSplice from the limma package, and rMATS.


Identification of Alternative Splicing and Polyadenylation in RNA-seq Data

Alternative splicing (AS) and alternative polyadenylation (APA) expand the diversity of transcript isoforms and their products. Here, we describe bioinformatic protocols to analyze bulk RNA-seq and 3' end sequencing assays to detect and visualize AS and APA varying across experimental conditions.


Nanyang Technological University

Research

JoVE Journal

Biology

The Australian National University.

Video: Identification of Alternative Splicing and Polyadenylation in RNA-seq Data

Merging Absolute and Relative Quantitative PCR Data to Quantify STAT3 Splice Variant Transcripts

Human signal transducer and activator of transcription 3 (STAT3) is one of many genes containing a tandem splicing site. Alternative donor splice sites 3 nucleotides apart result in either the inclusion (S) or exclusion (&#916;S) of a single residue, Serine-701. Further downstream, splicing at a pair of alternative acceptor splice sites result in transcripts encoding either the 55 terminal residues of the transactivation domain (&#945;) or a truncated transactivation domain with 7 unique residues (&#946;). As outlined in this manuscript, measuring the proportions of STAT3's four spliced transcripts (S&#945;, S&#946;, &#916;S&#945; and &#916;S&#946;) was possible using absolute qPCR (quantitative polymerase chain reaction). The protocol therefore distinguishes and measures highly similar splice variants. Absolute qPCR makes use of calibrator plasmids and thus specificity of detection is not compromised for the sake of efficiency. The protocol necessitates primer validation and optimization of cycling parameters. A combination of absolute qPCR and efficiency-dependent relative qPCR of total STAT3 transcripts allowed a description of the fluctuations of STAT3 splice variants' levels in eosinophils treated with cytokines. The protocol also provided evidence of a co-splicing interdependence between the two STAT3 splicing events. The strategy based on a combination of the two qPCR techniques should be readily adaptable to investigation of co-splicing at other tandem splicing sites.

Tandem splicing events occur at sites less than 12 nucleotides apart. Quantifying ratios of such splice variants is feasible using an absolute quantitative PCR approach. This manuscript describes how splice variants of the gene STAT3, in which two splicing events results in Serine-701 inclusion/exclusion and &#945;/&#946; C-termini, can be quantified.

Human signal transducer and activator of transcription 3 (STAT3) is one of many genes containing a tandem splicing site. Alternative donor splice sites 3 ...

Genetics

Using RNA-sequencing to Detect Novel Splice Variants Related to Drug Resistance in In Vitro Cancer Models

Drug resistance remains a major problem in the treatment of cancer for both hematological malignancies and solid tumors. Intrinsic or acquired resistance can be caused by a range of mechanisms, including increased drug elimination, decreased drug uptake, drug inactivation and alterations of drug targets. Recent data showed that other than by well-known genetic (mutation, amplification) and epigenetic (DNA hypermethylation, histone post-translational modification) modifications, drug resistance mechanisms might also be regulated by splicing aberrations. This is a rapidly growing field of investigation that deserves future attention in order to plan more effective therapeutic approaches. The protocol described in this paper is aimed at investigating the impact of aberrant splicing on drug resistance in solid tumors and hematological malignancies. To this goal, we analyzed the transcriptomic profiles of several in vitro models through RNA-seq and established a qRT-PCR based method to validate candidate genes. In particular, we evaluated the differential splicing of DDX5 and PKM transcripts. The aberrant splicing detected by the computational tool MATS was validated in leukemic cells, showing that different DDX5 splice variants are expressed in the parental vs. resistant cells. In these cells, we also observed a higher PKM2/PKM1 ratio, which was not detected in the Panc-1 gemcitabine-resistant counterpart compared to parental Panc-1 cells, suggesting a different mechanism of drug-resistance induced by gemcitabine exposure.

Using RNA-sequencing to Detect Novel Splice Variants Related to Drug Resistance in In Vitro Cancer Models

Here we describe a protocol aimed at investigating the impact of aberrant splicing on drug resistance in solid tumors and hematological malignancies. To this goal, we analyzed the transcriptomic profiles of parental and resistant in vitro models through RNA-seq and established a qRT-PCR based method to validate candidate genes.

Cancer Research

Engineering Artificial Factors to Specifically Manipulate Alternative Splicing in Human Cells

The processing of most eukaryotic RNAs is mediated by RNA Binding Proteins (RBPs) with modular configurations, including an RNA recognition module, which specifically binds the pre-mRNA target and an effector domain. Previously, we have taken advantage of the unique RNA binding mode of the PUF domain in human Pumilio 1 to generate a programmable RNA binding scaffold, which was used to engineer various artificial RBPs to manipulate RNA metabolism. Here, a detailed protocol is described to construct Engineered Splicing Factors (ESFs) that are specifically designed to modulate the alternative splicing of target genes. The protocol includes how to design and construct a customized PUF scaffold for a specific RNA target, how to construct an ESF expression plasmid by fusing a designer PUF domain and an effector domain, and how to use ESFs to manipulate the splicing of target genes. In the representative results of this method, we have also described the common assays of ESF activities using splicing reporters, the application of ESF in cultured human cells, and the subsequent effect of splicing changes. By following the detailed protocols in this report, it is possible to design and generate ESFs for the regulation of different types of Alternative Splicing (AS), providing a new strategy to study splicing regulation and the function of different splicing isoforms. Moreover, by fusing different functional domains with a designed PUF domain, researchers can engineer artificial factors that target specific RNAs to manipulate various steps of RNA processing.

This report describes a bioengineering method to design and construct novel Artificial Splicing Factors (ASFs) that specifically modulate the splicing of target genes in mammalian cells. This method can be further expanded to engineer various artificial factors to manipulate other aspects of RNA metabolism.

Quantitative Analysis of Alternative Pre-mRNA Splicing in Mouse Brain Sections Using RNA In Situ Hybridization Assay

Alternative splicing (AS) occurs in more than 90% of human genes. The expression pattern of an alternatively spliced exon is often regulated in a cell type-specific fashion. AS expression patterns are typically analyzed by RT-PCR and RNA-seq using RNA samples isolated from a population of cells. In situ examination of AS expression patterns for a particular biological structure can be carried out by RNA in situ hybridization (ISH) using exon-specific probes. However, this particular use of ISH has been limited because alternative exons are generally too short to design exon-specific probes. In this report, the use of BaseScope, a recently developed technology that employs short antisense oligonucleotides in RNA ISH, is described to analyze AS expression patterns in mouse brain sections. Exon 23a of neurofibromatosis type 1 (Nf1) is used as an example to illustrate that short exon-exon junction probes exhibit robust hybridization signals with high specificity in RNA ISH analysis on mouse brain sections. More importantly, signals detected with exon inclusion- and skipping-specific probes can be used to reliably calculate the percent spliced in values of Nf1 exon 23a expression in different anatomical areas of a mouse brain. The experimental protocol and calculation method for AS analysis are presented. The results indicate that BaseScope provides a powerful new tool to assess AS expression patterns in situ.
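
The percent spliced in (PSI) calculation mentioned above can be sketched as follows, assuming the standard definition: inclusion signal divided by the sum of inclusion and skipping signals. The function name, region names, and counts are hypothetical and only illustrate the arithmetic; the protocol itself may normalize signals differently.

```python
def percent_spliced_in(inclusion_signal, skipping_signal):
    """Return PSI (%) from inclusion- and skipping-specific signal counts."""
    total = inclusion_signal + skipping_signal
    if total == 0:
        raise ValueError("no signal detected for either isoform")
    return 100.0 * inclusion_signal / total


# Hypothetical signal counts per anatomical region (illustrative values only)
regions = {"cortex": (30, 10), "hippocampus": (12, 36)}
for name, (inclusion, skipping) in regions.items():
    print(f"{name}: PSI = {percent_spliced_in(inclusion, skipping):.1f}%")
```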

An in situ hybridization (ISH) protocol that uses short antisense oligonucleotides to detect alternative pre-mRNA splicing patterns in mouse brain sections is described.

Using the E1A Minigene Tool to Study mRNA Splicing Changes

mRNA processing involves multiple simultaneous steps to prepare mRNA for translation, such as 5′ capping, poly-A addition and splicing. Besides constitutive splicing, alternative mRNA splicing allows the expression of multifunctional proteins from one gene. As interactome studies are generally the first analysis for new or unknown proteins, the association of the bait protein with splicing factors is an indication that it may participate in mRNA splicing, but determining in what context, or which genes are regulated, is an empirical process. A good starting point to evaluate this function is the classical minigene tool. Here we present the use of the adenoviral E1A minigene to evaluate alternative splicing changes after different cellular stress stimuli. We evaluated the splicing of the E1A minigene in HEK293 cells stably overexpressing the Nek4 protein after different stress treatments. The protocol includes E1A minigene transfection, cell treatment, RNA extraction and cDNA synthesis, followed by PCR and gel analysis and quantification of the E1A spliced variants. The use of this simple and well-established method combined with specific treatments is a reliable starting point to shed light on the cellular processes or genes that can be regulated by mRNA splicing.

This protocol presents a rapid and useful tool for evaluating the role of a protein with uncharacterized function in alternative splicing regulation after chemotherapeutic treatment.

A Reporter Based Cellular Assay for Monitoring Splicing Efficiency

During gene expression, the vital step of pre-mRNA splicing involves accurate recognition of splice sites and efficient assembly of spliceosomal complexes to join exons and remove introns prior to cytoplasmic export of the mature mRNA. Splicing efficiency can be altered by the presence of mutations at splice sites, the influence of trans-acting splicing factors, or the activity of therapeutics. Here, we describe the protocol for a cellular assay that can be applied for monitoring the splicing efficiency of any given exon. The assay uses an adaptable plasmid-encoded 3-exon/2-intron minigene reporter, which can be expressed in mammalian cells by transient transfection. Post-transfection, total cellular RNA is isolated, and the efficiency of exon splicing in the reporter mRNA is determined by either primer extension or semi-quantitative reverse transcriptase-polymerase chain reaction (RT-PCR). We describe how the impact of disease-associated 5′ splice-site mutations can be determined by introducing them in the reporter, and how the suppression of these mutations can be achieved by co-transfection with a U1 small nuclear RNA (snRNA) construct carrying compensatory mutations in its 5′ region, which base-pairs with the 5′ splice sites at exon-intron junctions in pre-mRNAs. Thus, the reporter can be used for the design of therapeutic U1 particles to improve recognition of mutant 5′ splice sites. Insertion of cis-acting regulatory sites, such as splicing enhancer or silencer sequences, into the reporter can also be used to examine the role of U1 snRNP in regulation mediated by a specific alternative splicing factor. Finally, reporter-expressing cells can be incubated with small molecules to determine the effect of potential therapeutics on constitutive pre-mRNA splicing or on exons carrying mutant 5′ splice sites.
Overall, the reporter assay can be applied to monitor splicing efficiency in a variety of conditions to study fundamental splicing mechanisms and splicing-associated diseases.

This protocol describes a minigene reporter assay for monitoring the impact of 5′ splice-site mutations on splicing and for developing suppressor U1 snRNAs that rescue mutation-induced splicing inhibition. The reporter and suppressor U1 snRNA constructs are expressed in HeLa cells, and splicing is analyzed by primer extension or RT-PCR.

Biochemistry

Detection of Alternative Splicing During Epithelial-Mesenchymal Transition

Alternative splicing plays a critical role in the epithelial-mesenchymal transition (EMT), an essential cellular program that occurs in various physiological and pathological processes. Here we describe a strategy to detect alternative splicing during EMT using an inducible EMT model based on expression of the transcription repressor Twist. EMT is monitored by changes in cell morphology, loss of E-cadherin localization at cell-cell junctions, and the switched expression of EMT markers, such as loss of the epithelial markers E-cadherin and γ-catenin and gain of the mesenchymal markers N-cadherin and vimentin. Using isoform-specific primer sets, the alternative splicing of mRNAs of interest is analyzed by quantitative RT-PCR. The production of the corresponding protein isoforms is validated by immunoblotting assays. The method of detecting splice isoforms described here is also suitable for the study of alternative splicing in other biological processes.

Alternative splicing regulation has been shown to contribute to the epithelial-mesenchymal transition (EMT), an essential cellular program in various physiological and pathological processes. Here we describe a method utilizing an inducible EMT model for the detection of alternative splicing during EMT.

Identification of Key Factors Regulating Self-renewal and Differentiation in EML Hematopoietic Precursor Cells by RNA-sequencing Analysis

Hematopoietic stem cells (HSCs) are used clinically for transplantation treatment to rebuild a patient's hematopoietic system in many diseases such as leukemia and lymphoma. Elucidating the mechanisms controlling HSC self-renewal and differentiation is important for the application of HSCs in research and clinical uses. However, it is not possible to obtain large quantities of HSCs due to their inability to proliferate in vitro. To overcome this hurdle, we used a mouse bone marrow derived cell line, the EML (Erythroid, Myeloid, and Lymphocytic) cell line, as a model system for this study.
RNA-sequencing (RNA-Seq) has been increasingly used to replace microarrays for gene expression studies. We report here a detailed method of using RNA-Seq technology to investigate the potential key factors in regulation of EML cell self-renewal and differentiation. The protocol provided in this paper is divided into three parts. The first part explains how to culture EML cells and separate Lin-CD34+ and Lin-CD34- cells. The second part of the protocol offers detailed procedures for total RNA preparation and the subsequent library construction for high-throughput sequencing. The last part describes the method for RNA-Seq data analysis and explains how to use the data to identify differentially expressed transcription factors between Lin-CD34+ and Lin-CD34- cells. The most significantly differentially expressed transcription factors were identified as the potential key regulators controlling EML cell self-renewal and differentiation. In the discussion section of this paper, we highlight the key steps for successful performance of this experiment.
In summary, this paper offers a method of using RNA-Seq technology to identify potential regulators of self-renewal and differentiation in EML cells. The key factors identified are subjected to downstream functional analysis in vitro and in vivo.

RNA-sequencing and bioinformatics analyses were used to identify significantly differentially expressed transcription factors in Lin-CD34+ and Lin-CD34- subpopulations of mouse EML cells. These transcription factors might play important roles in determining the switch between self-renewing Lin-CD34+ and partially differentiated Lin-CD34- cells.

3' End Sequencing Library Preparation with A-seq2

Studies in the last decade have revealed a complex and dynamic variety of pre-mRNA cleavage and polyadenylation reactions. mRNAs with long 3' untranslated regions (UTRs) are generated in differentiated cells whereas proliferating cells preferentially express transcripts with short 3' UTRs. We describe the A-seq protocol, now in its second version, which was developed to map polyadenylation sites genome-wide and study the regulation of pre-mRNA 3' end processing. The current protocol also takes advantage of the polyadenylate (poly(A)) tails that are added during the biogenesis of most mammalian mRNAs to enrich for fully processed mRNAs. A DNA adaptor with deoxyuracil at its fourth position allows the precise processing of mRNA 3' end fragments for sequencing. Not including the cell culture and the overnight ligations, the protocol requires about 8 h of hands-on time. Along with it, an easy-to-use software package for the analysis of the derived sequencing data is provided. A-seq2 and the associated analysis software provide an efficient and reliable solution to the mapping of pre-mRNA 3' ends in a wide range of conditions, from 10^6 or fewer cells.

This protocol describes a method for mapping pre-mRNA 3' end processing sites.

RNA Splicing

Splicing is the process by which eukaryotic RNA is edited before its translation into protein. The RNA strand transcribed from eukaryotic DNA is called the primary transcript. The primary transcripts that become mRNAs are called precursor messenger RNAs (pre-mRNAs). Eukaryotic pre-mRNA contains alternating sequences of exons and introns. Exons are nucleotide sequences that code for proteins, whereas introns are the non-coding regions. In RNA splicing, introns are removed and exons are bonded together.
Splicing is Mediated by the Spliceosome
Splicing occurs in the nucleus and is mediated by a complex of proteins and small RNAs called small nuclear ribonucleoproteins (snRNPs). snRNPs, together with other proteins, form the spliceosome, which recognizes specific nucleotide sequences at the ends of the exon and intron. First, it binds to a GU-containing sequence at the 5' end of the intron and to a branch point sequence containing an A towards the 3' end of the intron. In a number of carefully orchestrated steps, other snRNPs then bring the branch point close to the 5' splice site. Subsequently, a chemical reaction cleaves the 5' end of the intron from its upstream exon and attaches it to the branch point, forming a loop called a lariat. To release the lariat, the AG-containing sequence of the intron near the 5' end of the downstream exon reacts with the 3' end of the upstream exon. This reaction joins the two exons together, concluding the splicing process.
Splicing Allows the Expression of Several Proteins from a Single Gene
Typically, exons are joined together in the order in which they appear in a gene. However, during alternative splicing, different combinations of exons in pre-mRNA are combined to form mature mRNA. This produces several different proteins from a single pre-mRNA transcript.
Different patterns of alternative splicing include exon skipping, alternative 5' or 3' splice sites, and intron retention. These patterns are guided by the length of exons or introns and the strength of the splicing signal at the splice sites. Because of this, exons that are shorter than other exons may be overlooked by the spliceosome and omitted from the mature mRNA. In contrast, introns that are significantly shorter than other introns may evade removal by the spliceosome and are retained in the mature mRNA. As a result, alternative splicing generates variants of mature mRNA that were copied from the same stretch of DNA. The RNA sequence variants produce different proteins with additional or fewer amino acids, shifts in the reading frame, or a premature stop codon. These protein isoforms have different biological properties, including function, cellular localization, and interaction with other proteins, thereby playing a vital role in tissue- and environment-specific gene expression.
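To make the idea of constitutive splicing versus exon skipping concrete, the toy sketch below joins exon segments of a made-up pre-mRNA and optionally leaves one exon out. The sequence, the coordinates, and the function name are invented for illustration; real splice-site recognition is far more complex than slicing a string.

```python
def splice(pre_mrna, exons, skip=()):
    """Join exon segments in order, omitting introns and any skipped exons.

    exons: list of (start, end) half-open, 0-based coordinates on pre_mrna.
    skip:  indices of exons to leave out (models exon skipping).
    """
    return "".join(
        pre_mrna[start:end]
        for i, (start, end) in enumerate(exons)
        if i not in skip
    )


# Toy pre-mRNA: exons in upper case, introns in lower case (gu...ag)
pre_mrna = "AUGGCC" + "guaaguacag" + "UUUCCC" + "guaagcuaag" + "GGAUAA"
exons = [(0, 6), (16, 22), (32, 38)]

full_length = splice(pre_mrna, exons)            # all three exons joined
exon_skipped = splice(pre_mrna, exons, skip={1})  # middle exon skipped

print(full_length)
print(exon_skipped)
```

Both mature sequences derive from the same pre-mRNA, which is the sense in which one gene can give rise to several mRNA variants.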
Abnormal Splicing Can Cause Diseases
Errors in splicing can produce aberrant protein isoforms, which may contribute to diseases, including cancer. For instance, alternative splicing of the BCL2L1 gene generates long and short protein isoforms (BCL-XL and BCL-XS, respectively) through the use of alternative 5' splice sites. The longer BCL-XL isoform promotes cell survival and is highly expressed in several types of cancers (e.g., blood, breast, and liver cancers). Expression of the short BCL-XS isoform, which promotes cell death, is suppressed in cancer.

RNA Splicing, Lariat and the Spliceosome

Identification of Alternative Splicing and Polyadenylation in RNA-seq Data
