Interpreting the sequencing data generated by ribosome profiling experiments is critical for quantitatively measuring the translational activity of ribosomes on mRNA and for studying the mechanisms of translational regulation. In this protocol, we describe the computational procedure for analyzing ribosome profiling data with RiboCode, a command line tool that decodes mRNA translation at genome-wide scale and single-nucleotide resolution. This method allows users to search for novel peptides arising from genomic regions outside annotated protein coding genes, and offers an opportunity to quantify the rate of mRNA translation.
To begin, open a Linux terminal window and create a conda environment by executing the command. Switch to the newly created environment, and install RiboCode and its dependencies by executing the command. To obtain the reference genome sequence, go to the Ensembl website, click Download, and then click FTP Download.
In the table on the web page, click the FASTA option in the DNA FASTA column, in the row where the species is Human. On the Ensembl page that opens, copy the link as mentioned in the text, then download and unzip the files in the terminal by executing the command. For the reference annotation, right-click GTF in the Gene sets column of the previously opened web page.
Copy the link and download the file using the command. To obtain rRNA sequences, open the UCSC Genome Browser, click Tools, and select Table Browser from the dropdown list. On the Table Browser page, set clade to Mammal, genome to Human, group to All Tables, table to rmsk, and region to genome.
For filter, click create to open a new page, and set repClass to "does match" rRNA. Click submit, then set the output format to sequence and the output file name to hg38_rRNA.fa. Finally, click get output, and then click get sequence to retrieve the rRNA sequences.
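Before indexing, it can be worth checking that the downloaded file actually contains rRNA sequences. The following Python sketch uses hypothetical helper names (they are not part of RiboCode) to parse a FASTA file and report the number of records and their length range; it assumes the file is uncompressed, plain-text FASTA.

```python
def read_fasta_lines(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Upper-case so lowercase (soft-masked) bases are normalized.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def summarize_fasta(path):
    """Return (number of records, total length, shortest, longest) for a FASTA file."""
    with open(path) as handle:
        lengths = [len(seq) for _, seq in read_fasta_lines(handle)]
    if not lengths:
        return 0, 0, 0, 0
    return len(lengths), sum(lengths), min(lengths), max(lengths)
```

For example, summarize_fasta("hg38_rRNA.fa") returns a tuple of record count, total nucleotides, shortest and longest sequence; zero records would suggest the Table Browser query did not return what was intended.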
To obtain the ribosome profiling datasets from the Sequence Read Archive (SRA), download the replicate samples of the si-eIF3e treatment group and rename them by executing the command. Then download the replicate samples of the control group and rename them by executing the command. To remove rRNA contamination, first index the rRNA reference sequences by executing the command.
After indexing, align the reads to the rRNA reference to remove reads originating from rRNA by executing the command. Next, create a genome index by executing the command. Then align the clean reads, free of rRNA contamination, to the genome reference by executing the command, and sort and index the alignment files by executing the command.
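The logic of the rRNA removal step can be sketched in Python. In practice the aligner is usually asked to write out the reads that fail to align to rRNA directly; the hypothetical helpers below only illustrate the idea, assuming you already have the set of read IDs that aligned to the rRNA reference and an uncompressed FASTQ file with standard four-line records.

```python
from itertools import islice


def fastq_records(lines):
    """Group FASTQ lines into (header, sequence, plus, quality) tuples."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return  # end of input
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError(f"malformed FASTQ record: {record!r}")
        yield tuple(record)


def read_id(header):
    """Read ID = first whitespace-separated token of the header, without the '@'."""
    return header[1:].split()[0]


def remove_rrna_reads(lines, rrna_ids):
    """Yield only the FASTQ records whose read ID is absent from rrna_ids."""
    for record in fastq_records(lines):
        if read_id(record[0]) not in rrna_ids:
            yield record
```

Writing each kept record back out as four newline-terminated lines gives the clean FASTQ that is then aligned to the genome.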
Prepare the transcript annotations by executing the command. Select ribosome protected fragments of specific lengths and identify their P-site positions by executing the command. Edit the configuration file of each sample, and merge these files into a single configuration file.
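Merging the per-sample configuration files amounts to concatenating their sample lines under a single header. The Python sketch below assumes, hypothetically, that header and comment lines start with "#" and that every other non-blank line describes one sample; check this against the files RiboCode actually produces before relying on it. The function names are illustrative only.

```python
def merge_config_lines(files_lines):
    """Merge the lines of several config files; '#' lines are kept only from the first file."""
    merged = []
    for index, lines in enumerate(files_lines):
        for line in lines:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # drop blank lines
            if line.startswith("#"):
                if index == 0:
                    merged.append(line)  # header/comments from the first file only
            else:
                merged.append(line)  # one sample per line
    return merged


def merge_config_files(paths, out_path):
    """Read each config file, merge their lines, and write the result to out_path."""
    files_lines = []
    for path in paths:
        with open(path) as handle:
            files_lines.append(handle.readlines())
    merged = merge_config_lines(files_lines)
    with open(out_path, "w") as out:
        out.write("\n".join(merged) + "\n")
    return merged
```

The paths passed to merge_config_files stand for the per-sample configuration files in your working directory.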
Then run RiboCode by executing the command. The read length distribution showed that most ribosome protected fragments were 25 to 35 nucleotides long. The P-site locations for ribosome protected fragments of different lengths were determined by examining the distances from their 5' ends to the annotated start and stop codons.
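The P-site offset estimation described above can be illustrated with a short Python sketch. RiboCode performs this step itself; the hypothetical function below only shows the underlying idea, using start codons alone: for each read length, it collects the distance from the 5' end of every read that covers an annotated start codon to that codon, and takes the most frequent distance as the offset. It assumes 0-based transcript coordinates and that the start codon occupies the ribosomal P-site at initiation.

```python
from collections import Counter, defaultdict


def estimate_psite_offsets(reads, start_codons, min_len=25, max_len=35):
    """Estimate a P-site offset for each read length from reads covering start codons.

    reads: iterable of (transcript_id, five_prime_pos, read_length), 0-based.
    start_codons: dict mapping transcript_id to the 0-based position of the
    first nucleotide of the annotated start codon.
    Returns {read_length: offset}, the most common 5'-end-to-start-codon distance.
    """
    distances = defaultdict(Counter)
    for tx, five_prime, length in reads:
        if (not min_len <= length <= max_len) or tx not in start_codons:
            continue
        dist = start_codons[tx] - five_prime
        # Keep only reads whose footprint contains all three start-codon nucleotides.
        if 0 <= dist <= length - 3:
            distances[length][dist] += 1
    return {length: counts.most_common(1)[0][0]
            for length, counts in sorted(distances.items())}
```

The resulting offsets should be inspected length by length; lengths with few covering reads give unreliable modes.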
The RiboCode results show that 10,394 genes encode annotated open reading frames. In addition, 509 and 168 genes encode upstream and downstream open reading frames, respectively, while 939 genes encode upstream or downstream open reading frames that overlap known annotated open reading frames. Finally, 68 protein coding genes and 2,601 non-coding genes encode novel open reading frames.
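The open reading frame categories above depend on where each predicted frame lies relative to the annotated coding sequence. The following Python sketch uses simplified, hypothetical rules, not necessarily the exact definitions RiboCode applies: coordinates are 0-based, half-open positions on one transcript, an open reading frame sharing the annotated stop codon counts as annotated, and any open reading frame on a transcript without an annotated coding sequence counts as novel.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end); end includes the stop codon


def classify_orf(orf: Interval, cds: Optional[Interval]) -> str:
    """Assign a simplified category to an ORF relative to the annotated CDS on the same transcript."""
    start, end = orf
    if cds is None:
        return "novel"          # no annotated CDS on this transcript
    cds_start, cds_end = cds
    if end == cds_end:
        return "annotated"      # shares the annotated stop codon
    if end <= cds_start:
        return "uORF"           # entirely in the 5' UTR
    if start >= cds_end:
        return "dORF"           # entirely in the 3' UTR
    if start < cds_start:
        return "overlap_uORF"   # starts upstream and runs into the CDS
    if end > cds_end:
        return "overlap_dORF"   # starts inside the CDS and ends downstream
    return "internal"           # nested within the CDS with a different stop codon
```

Tallying the labels with collections.Counter gives per-category counts of open reading frames; the numbers reported above are per gene, so genes would need to be deduplicated for a comparable summary.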
The length distribution showed that upstream, downstream, novel, and overlapping open reading frames were shorter than annotated open reading frames. Relative ribosome protected fragment counts were calculated for each open reading frame, revealing that ribosome densities on upstream open reading frames were significantly higher in eIF3e-deficient cells than in control cells. Metagene analysis revealed that many ribosomes stalled between codons 25 and 75 downstream of the start codon, suggesting that translation elongation might be blocked at an early stage in eIF3e-deficient cells.
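A metagene profile like the one described above averages ribosome density over many open reading frames aligned at their start codons. The Python sketch below assumes you already have P-site counts per codon for each open reading frame, starting at the start codon; it divides each frame's counts by its own mean density so that highly expressed genes do not dominate, then averages position by position. This is one common normalization, not necessarily the one used by RiboCode or RiboMiner, and the function name is hypothetical.

```python
def metagene_profile(psite_counts, n_codons=100):
    """Average normalized P-site density at each codon position after the start codon.

    psite_counts: list of per-ORF lists of P-site counts per codon, index 0 = start codon.
    Each ORF is divided by its own mean codon density; ORFs with no reads are skipped.
    Positions beyond an ORF's length are averaged only over ORFs that reach them.
    """
    sums = [0.0] * n_codons
    covered = [0] * n_codons
    for counts in psite_counts:
        if not counts:
            continue
        mean = sum(counts) / len(counts)
        if mean == 0:
            continue
        for i, count in enumerate(counts[:n_codons]):
            sums[i] += count / mean
            covered[i] += 1
    return [total / n if n else 0.0 for total, n in zip(sums, covered)]
```

Computing the profile separately for eIF3e-deficient and control samples and plotting both curves would show an accumulation over codons 25 to 75 as a region where one curve rises above the other.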
The P-site density profiles for the upstream open reading frames of PSMA6 and the downstream open reading frames of SENP3-EIF4A1 were examined, showing the periodicity patterns and densities of ribosome protected fragments. Checking the locations of reads around the start and stop codons of known protein coding regions is necessary for evaluating the periodicity of reads of each length. RiboCode, together with another command line tool, RiboMiner, can also perform quality control and multiple downstream analyses, such as quantifying and visualizing ribosome occupancy on the predicted open reading frames.
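Periodicity for each read length can be summarized as the fraction of P-sites falling in each frame of the annotated coding sequence. In this hypothetical Python sketch, P-sites are given in 0-based transcript coordinates and each coding sequence as a half-open (start, end) pair; a length with most P-sites in frame 0 shows strong three-nucleotide periodicity, whereas a roughly even split across frames suggests that the length, or its P-site offset, is unreliable.

```python
from collections import Counter, defaultdict


def frame_fractions(psites, cds_bounds):
    """Fraction of P-sites in frames 0, 1 and 2 of the annotated CDS, per read length.

    psites: iterable of (transcript_id, read_length, psite_pos), 0-based transcript coordinates.
    cds_bounds: dict mapping transcript_id to (cds_start, cds_end), half-open.
    P-sites outside the CDS, or on transcripts without one, are ignored.
    """
    counts = defaultdict(Counter)
    for tx, length, pos in psites:
        bounds = cds_bounds.get(tx)
        if bounds is None or not bounds[0] <= pos < bounds[1]:
            continue
        counts[length][(pos - bounds[0]) % 3] += 1
    fractions = {}
    for length, frames in sorted(counts.items()):
        total = sum(frames.values())
        fractions[length] = tuple(frames[f] / total for f in range(3))
    return fractions
```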
This computational tool provides a high-throughput way to identify noncanonical translation events from ribosome profiling data in specific physiological contexts, and to study how translation is modulated in response to stimuli.