11:04 min
May 19th, 2019
0:04
Title
0:52
Read Alignment Pipeline to Identify Expressed L1s
2:48
Manual Curation
7:48
Assess Mappability of Each L1 Locus to Factor in a Transcription Level Correction Score
8:42
Results: Identification of Full-length L1 Retroelements in the Human Prostate Tumor Cell Line, DU145
10:01
Conclusion
Transcript
Mobile elements are one of the major sources of human genetic instability. Understanding their expression in different tissues and conditions is critical to understanding their impact on the genome. The vast majority of L1 transcripts are the result of passive inclusion of L1-related sequences in other transcripts that have no role in the L1 life cycle.
Our approach eliminates this irrelevant background. This protocol can be adapted to studies of any mobile element, or even viruses, in any sequenced genome. There needs to be at least some sequence variation to allow discrimination between loci.
Visual demonstration of this method is critical in illustrating the stringency and care required to confidently identify expressed L1 repetitive elements at the locus-specific level. Begin this procedure with cytoplasmic RNA extraction and next generation sequencing as described in the text protocol. By selecting for cytoplasmic RNA, L1-related reads found within expressed intronic mRNA in the nucleus are significantly depleted.
In the sequencing library preparation, another step taken to reduce transcriptional noise unrelated to L1s is the selection of polyadenylated transcripts. This removes L1-related transcript noise found in non-mRNA species. Align the paired-end sequencing FASTQ files from the RNA-seq sample of interest using bowtie1 by typing the command line in the Linux terminal.
This alignment strategy requires that the transcripts be uniquely and collinearly aligned with an exhaustive genomic search. This strategy provides confidence in the calling of reads mapping specifically to a single L1 locus. Strand-separate the output BAM files using SAMtools and Linux commands to select for the top strand and the bottom strand.
Note that the actual flag values may vary if one is not using standard next generation sequencing protocols. This strand separation step works to filter out the transcriptional noise generated within L1 sequences that are unrelated to L1 retrotransposition by eliminating potential antisense L1-related mapped reads. Generate read counts against annotations for L1 loci using bedtools.
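The strand separation described above relies on the SAM flag field. As a minimal sketch (not the protocol's actual commands, which are given in the text protocol), the following Python function groups properly paired reads by the orientation of their fragment using the flag bits defined in the SAM specification. The function name and the group labels are hypothetical, and which group corresponds to the top or bottom strand depends on the library preparation, so that mapping is deliberately left open.

```python
# SAM flag bits, as defined in the SAM format specification.
PAIRED = 0x1
READ_REVERSE = 0x10
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def fragment_group(flag: int) -> str:
    """Group a paired read by the orientation of read 1 in its fragment.

    Assumes a properly paired alignment, so that read 2 lies on the
    opposite strand to read 1. The returned labels are illustrative only;
    whether a group represents the top or bottom strand depends on the
    sequencing protocol.
    """
    if not flag & PAIRED:
        raise ValueError("flag does not describe a paired read")
    is_reverse = bool(flag & READ_REVERSE)
    if flag & FIRST_IN_PAIR:
        read1_forward = not is_reverse
    elif flag & SECOND_IN_PAIR:
        # Read 2 on the reverse strand implies read 1 on the forward strand.
        read1_forward = is_reverse
    else:
        raise ValueError("flag marks neither first nor second in pair")
    return "read1_forward" if read1_forward else "read1_reverse"


if __name__ == "__main__":
    for flag in (99, 147, 83, 163):
        print(flag, fragment_group(flag))
```

Under these assumptions, flags 99 and 147 fall into one group and flags 83 and 163 into the other, which is why strand separation is usually expressed as two pairs of flag values.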
First, type the command line to generate read counts for L1s in the sense direction on the top strand, and then type the command line to generate read counts for L1s in the sense direction on the bottom strand. The annotations used to identify L1s denote full-length L1s with functional promoter regions, which helps eliminate background noise that would otherwise originate from truncated L1s. Create a spreadsheet for reads mapped to each annotated L1 locus.
Copy over the generated read counts text file that was created for the bottom strand and label the page as minus_bottom. Sort all columns based on highest to lowest number of reads found in column J. Copy over the generated read counts text file that was created for the top strand and label the page as plus_top. Sort all columns based on highest to lowest number of reads found in column J.
Create a third page labeled as combined and add all loci with 10 or more reads from the minus_bottom and plus_top pages. Sort all columns based on highest to lowest number of reads found in column J. To assess the mappability of genomic regions, specifically in or near L1 loci, whole-genome paired-end sequencing files of the species of interest were downloaded from NCBI and converted to FASTQ files as described in the text protocol. Now, index the BAM files to make them viewable in the Integrative Genomics Viewer, abbreviated IGV, before loading the files.
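The spreadsheet step that builds the combined page (keep loci with 10 or more reads, then sort by the read count in column J) can also be sketched in Python. This is only an illustration: it assumes the read-count files are tab-separated with the count in the tenth field (column J), and the function names and example loci are hypothetical.

```python
COUNT_COLUMN = 9   # column J in a spreadsheet, zero-based (assumed layout)
MIN_READS = 10     # cutoff for the combined page


def parse_counts(text: str) -> list[list[str]]:
    """Split tab-separated read-count text into rows of fields."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]


def combine_counts(minus_rows, plus_rows,
                   min_reads=MIN_READS, count_col=COUNT_COLUMN):
    """Merge both strands, keep loci at or above the cutoff, sort high to low."""
    kept = [row for row in list(minus_rows) + list(plus_rows)
            if int(row[count_col]) >= min_reads]
    return sorted(kept, key=lambda row: int(row[count_col]), reverse=True)


if __name__ == "__main__":
    # Made-up rows: nine placeholder fields followed by the read count.
    minus = parse_counts("\t".join(["chrA"] + ["."] * 8 + ["12"]) + "\n"
                         + "\t".join(["chrB"] + ["."] * 8 + ["3"]))
    plus = parse_counts("\t".join(["chrC"] + ["."] * 8 + ["40"]))
    for row in combine_counts(minus, plus):
        print(row[0], row[COUNT_COLUMN])
```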
In IGV, load the reference genome of interest to visualize annotated genes. Also load the annotation file for full-length L1 elements to visualize the L1 annotation, the BAM file for human RNA expression to visualize mapped transcripts from the sample of interest, and the BAM file for human genome mappability to assess the mappability of genomic regions. Remove coverage and junction rows associated with each BAM file.
Compress the BAM files for human RNA expression and for human genome mappability so all the IGV tracks fit on one screen. The last critical step in eliminating transcriptional noise of L1 sequences unrelated to L1 retrotransposition is the manual curation of full-length L1s identified to have mapped RNA-seq transcripts. The manual curation involves the visualization of each expressed L1 locus in the context of its surrounding genomic environment to confirm that expression originates from the L1 promoter.
Using coordinates from L1 loci listed on the spreadsheet combined page, manually curate each L1 locus with uniquely mapped transcripts by examining its surrounding genomic environment in IGV. Curate a locus to be authentically expressed off its own promoter if there are no reads upstream in the L1 direction up to five kilobases. Label the row green in color and note why it is an authentically expressed L1. An exception to this rule exists if the region upstream of the L1 is not mappable.
If this is the case, label the row red in color and note that the expression of the region upstream of the L1 promoter cannot be evaluated and therefore the L1's expression cannot be confidently determined. Curate a locus to not be authentically expressed off its own promoter if there are reads upstream up to five kilobases. Label the row red in color and note why it is not an authentically expressed L1. Curate a locus as false if it is expressed within an intron of an expressed gene in the same direction with reads upstream of the L1, if it is downstream of an expressed gene in the same direction with reads upstream of the L1, or for unannotated expression patterns with reads upstream of the L1. An exception to this rule applies when there are minimal reads directly overlapping the L1 promoter start site, but slightly upstream of the L1. If there are no other reads upstream of the L1 in a case like this, consider this L1 to be authentically expressed.
Label the row green and note why it is an authentically expressed L1. Curate an L1 locus as likely to be false if the pattern of reads mapped to the locus does not correlate with the specific L1's regions of mappability. If an L1 is highly mappable, but only has a pile-up of reads in a condensed region within the L1, it is less likely to be related to L1 expression off its own promoter and more likely to be from unannotated sources like exons or LTRs. In cases like this, curate the loci as orange and note why the locus is suspicious.
Verify sources of suspicious pile-ups by checking the L1 location in the UCSC Genome Browser. Curate a locus to not be authentically expressed if it is within a genomic environment of sporadically expressed unannotated regions. Reads may be expressed 10 kilobases upstream of the L1, but every 10 kilobases or so there are mapped reads, and some of those reads align with the L1. These L1s are likely to have mapped reads due to unannotated patterns of genomic expression.
In cases like this, curate the loci as red and note why the locus is suspicious. To assess the mappability of each L1 locus, determine the number of uniquely mapped reads to L1 loci using the bedtools program, the FL-L1 annotation, and the aligned genomic sequence data. Designate an L1 locus to have full coverage mappability when 400 unique reads are aligned to it.
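The curation rules described so far are applied by eye in IGV, but it may help to see them written out as an explicit decision procedure. The sketch below is a hypothetical summary, not part of the protocol: the field names, the function name, and in particular the order in which the rules are checked are assumptions made for illustration, and the observations themselves still have to come from manual inspection.

```python
from dataclasses import dataclass


@dataclass
class LocusObservation:
    """What a curator noted for one L1 locus in IGV (hypothetical fields)."""
    upstream_reads_within_5kb: bool
    upstream_mappable: bool = True
    # Upstream reads are minimal and directly overlap the promoter start site.
    upstream_reads_only_at_promoter_start: bool = False
    # Read pattern across the L1 agrees with its mappable regions.
    reads_match_mappability: bool = True
    # Locus sits among sporadically expressed unannotated regions.
    sporadic_unannotated_expression: bool = False


def curate(obs: LocusObservation) -> str:
    """Return the row color for a locus; rule precedence is an assumption."""
    if obs.sporadic_unannotated_expression:
        return "red"
    if not obs.upstream_mappable:
        return "red"      # upstream expression cannot be evaluated
    if obs.upstream_reads_within_5kb and not obs.upstream_reads_only_at_promoter_start:
        return "red"      # reads upstream: not expressed off its own promoter
    if not obs.reads_match_mappability:
        return "orange"   # suspicious pile-up; check the UCSC Genome Browser
    return "green"        # authentically expressed off its own promoter


if __name__ == "__main__":
    print(curate(LocusObservation(upstream_reads_within_5kb=False)))
```

A curator's written note explaining the call still needs to accompany each color, since the color alone does not record why a locus was accepted or rejected.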
Determine the factor required to scale up or down genomic DNA aligned reads to 400 for each individual L1. To have a scaled measure of expression according to individual L1 locus mappability, multiply the factor by the number of RNA transcript reads that align to individual authentically expressed L1s. Each step is used to highlight differences between L1 elements expressed off their own promoter and all of the ways that L1 elements may be included in other transcripts that are unrelated to the L1 life cycle. Shown here are transcript reads that map uniquely to all full-length intact L1s in the human genome expressed in the DU145 prostate tumor cell line.
In black are the specific loci identified as authentically expressed after manual curation. In red are the specific loci rejected as not authentically expressed after manual curation. In gray are loci with fewer than 10 reads mapping to each.
As these loci represent a small fraction of transcript reads, they were not manually curated. Approximately 4500 loci are not graphically shown, as they had zero mapped reads. After manual curation, the number of reads that map uniquely to authentically expressed specific L1 loci in DU145 ranged from 175 reads down to an arbitrarily chosen minimum cutoff of 10 reads.
Once reads were adjusted for mappability scores in each locus, the quantitation of expression for most loci increased. The number of reads that mapped uniquely to authentically expressed specific L1 loci with mappability corrections in DU145 ranged from 612 to four reads, and there was a reordering of highest to lowest expressing loci. Each step plays a crucial role in reducing the high level of transcriptional background noise.
However, the most critical step is the manual curation of each L1 locus to confirm transcription off its own promoter. Approximately 50% of L1 loci identified bioinformatically in DU145 cells were rejected as L1 background noise originating from other transcriptional sources, emphasizing the rigor required to produce reliable results. To identify the youngest of L1s, we suggest using five-prime RACE selection of L1 transcripts and sequencing technology like PacBio that makes use of longer reads and permits more unique mapping.
With this approach, we can stringently and confidently identify and quantify L1 expression patterns. This paves the way toward better understanding the regulation of individual L1 loci and their potential impact.
Here, we present a bioinformatic approach and analyses to identify LINE-1 expression at the locus-specific level.
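The mappability correction described above reduces to a small calculation: the factor for a locus is 400 divided by the number of genomic DNA reads uniquely aligned to it, and the scaled expression is that factor multiplied by the RNA read count. A minimal Python sketch follows; the function names and the example read counts are hypothetical, and the handling of loci with zero genomic reads is an assumption rather than something the protocol specifies.

```python
FULL_COVERAGE_READS = 400  # unique genomic reads defining full-coverage mappability


def mappability_factor(genomic_reads: int) -> float:
    """Factor that scales a locus's genomic read count to 400."""
    if genomic_reads <= 0:
        # Assumption: a locus with no uniquely mapped genomic reads
        # cannot be corrected and is reported as an error here.
        raise ValueError("locus has no uniquely mapped genomic reads")
    return FULL_COVERAGE_READS / genomic_reads


def scaled_expression(rna_reads: int, genomic_reads: int) -> float:
    """RNA read count adjusted for the locus's mappability."""
    return rna_reads * mappability_factor(genomic_reads)


if __name__ == "__main__":
    # Made-up example: a poorly mappable locus is scaled up,
    # an over-represented one is scaled down.
    print(scaled_expression(50, 200))
    print(scaled_expression(10, 800))
```

A locus with fewer than 400 genomic reads receives a factor above 1 and its expression estimate rises, which is consistent with the observation that the quantitation for most loci increased after correction.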