Summary

Here, we present a protocol demonstrating the installation and use of a bioinformatics pipeline to analyze chimeric RNA sequencing data used in the study of in vivo RNA:RNA interactions.

Abstract

An understanding of the in vivo gene regulatory interactions of small noncoding RNAs (sncRNAs), such as microRNAs (miRNAs), with their target RNAs has been advanced in recent years by biochemical approaches that use cross-linking followed by ligation to capture sncRNA:target RNA interactions through the formation of chimeric RNAs and subsequent sequencing libraries. While datasets from chimeric RNA sequencing provide genome-wide and substantially less ambiguous input than miRNA prediction software, distilling these data into meaningful and actionable information requires additional analyses and may dissuade investigators lacking a computational background. This report provides a tutorial to support entry-level computational biologists in installing and applying a recent open-source software tool: Small Chimeric RNA Analysis Pipeline (SCRAP). Platform requirements, updates, and an explanation of pipeline steps and manipulation of key user-input variables are provided. Reducing a barrier for biologists to gain insights from chimeric RNA sequencing approaches has the potential to springboard discovery-based investigations of regulatory sncRNA:target RNA interactions in multiple biological contexts.

Introduction

Small noncoding RNAs are highly studied for their post-transcriptional roles in coordinating expression from suites of genes in diverse processes such as differentiation and development, signal processing, and disease1,2,3. The ability to accurately determine the target transcripts of gene-regulatory small noncoding RNAs (sncRNAs), including microRNAs (miRNAs), is of importance to studies of RNA biology at both basic and translational levels. Bioinformatic algorithms that exploit anticipated complementarity between the miRNA seed sequence and its potential targets have been frequently used for the prediction of miRNA:target RNA interactions. While these bioinformatic algorithms have been successful, they also can harbor both false positive and false negative results, as has been reviewed elsewhere4,5,6. Recently, several biochemical approaches have been designed and implemented that allow unambiguous and semiquantitative determination of in vivo sncRNA:target RNA interactions by in vivo crosslinking and ensuing incorporation of a ligation step to physically attach the sncRNA to its target to form a single chimeric RNA4,5,7,8,9,10. Subsequent preparation of sequencing libraries from the chimeric RNAs allows assessment of the sncRNA:target RNA interactions by computational processing of the sequencing data. This video provides a tutorial for installing and using a computational pipeline termed small chimeric RNA analysis pipeline (SCRAP), which is designed to allow robust and reproducible analysis of sncRNA:target RNA interactions from chimeric RNA sequencing libraries6.

A goal of this tutorial is to assist investigators in avoiding excessive reliance on purely predictive bioinformatic algorithms by lowering barriers to the analysis of data generated through biochemical approaches that provide chimeric molecular readouts of sncRNA:target RNA interactions. This tutorial provides practical steps and tips to guide entry-level computational scientists through the use of a pipeline, SCRAP, developed for analyzing chimeric RNA sequencing data, which can be generated by several existing biochemical protocols, including crosslinking, ligation, and sequencing of hybrids (CLASH) and covalent ligation of endogenous Argonaute-bound RNAs-crosslinking and immunoprecipitation (CLEAR-CLIP)7,9.

The use of SCRAP offers several advantages for the analysis of chimeric RNA sequencing data, compared to other computational pipelines6. One salient advantage is its extensive annotation and the incorporation of call-outs to well-supported and routinely updated bioinformatic scripts within the pipeline, in comparison to alternative pipelines that often rely on custom and/or unsupported scripts for steps in the pipeline. This feature lends stability to SCRAP, making it more worthwhile for researchers to familiarize themselves with the pipeline and to incorporate its use into their workflow. SCRAP has also been demonstrated to outperform alternative pipelines in calling peaks of sncRNA:target RNA interactions and to have cross-platform functionality, as detailed in a prior publication6.

By the end of this tutorial, users will be able to (i) identify the platform requirements for SCRAP and install the SCRAP pipeline, (ii) install reference genomes and set up command line parameters for SCRAP, and (iii) understand peak calling criteria and perform peak calling and peak annotation.

This video will describe in practical detail how researchers studying RNA biology may install and optimally use the computational pipeline, SCRAP, to analyze sncRNA interactions with target RNAs, such as messenger RNAs, in chimeric RNA-sequencing data obtained through one of the discussed biochemical approaches to sequencing library preparation.

SCRAP is a command line utility. Generally, following the guide below, the user will need to (i) download and install SCRAP (https://github.com/Meffert-Lab/SCRAP), (ii) install reference genomes and run SCRAP, and (iii) perform peak calling and annotation.

Further details of the computational steps in this procedure can be found at https://github.com/Meffert-Lab/SCRAP. This article will provide the setup and background information to allow investigators with entry-level computational skills to install, optimize, and use SCRAP on chimeric RNA sequencing library datasets.

Protocol

NOTE: The protocol will begin with downloading and installing software required to analyze chimeric RNA sequencing libraries using SCRAP.

1. Installation

  1. Before installing SCRAP, install the dependencies Git and Miniconda on the machine to be used for the analyses. Git is likely already installed. On macOS, for example, verify this by running which git, which prints the directory where the git utility is installed. Check whether Miniconda is installed by running which conda. If nothing is returned, install Miniconda. Miniconda requires 400 MB of disk space to install.
    1. There are several methods to install Miniconda, and they differ by platform. Refer to the PLATFORM-SETUP markdown file in the Meffert Lab GitHub repository (https://github.com/Meffert-Lab/SCRAP/blob/main/PLATFORM-SETUP.md), which gives further instructions for installing on Windows, macOS, and Ubuntu. Ubuntu has its own default package manager (apt). In the example here, run brew install miniconda to install Miniconda using an existing package manager, brew.
      NOTE: 'Homebrew', termed 'brew', is an open-source software package management system that simplifies the installation of software on Apple's operating system, macOS.
    2. If conda is being installed for the first time, run conda init for the particular shell that is in use. In the example here, the shell in use is zsh. Then, close and re-open the shell. If conda was successfully installed, the base environment will be shown as activated within the new terminal session.
  2. Download the SCRAP source and install its dependencies.
    1. The preferred method for obtaining SCRAP source is using Git. Access this by running git clone https://github.com/Meffert-Lab/SCRAP to obtain the latest copy of the source code.
    2. Install mamba, an improved package solver for conda, and install all the dependencies for SCRAP from SCRAP_environment.yml to its own conda environment using the following commands:
      conda install -n base conda-forge::mamba
      mamba env create -f SCRAP/SCRAP_environment.yml -n SCRAP
  3. Next, run the reference installation for SCRAP. The arguments used in the reference installation will be specific to the organism whose sncRNA-mRNA interactions are being analyzed.
    bash SCRAP/bin/Reference_Installation.sh -r full/path/to/SCRAP/ -m hsa -g hg38 -s human
    1. Provide the directory of the SCRAP source folder for reference installation. Installation steps will then be performed using the files within the fasta and annotation folders. List the full path without any shorthand. End with a slash.
    2. Refer to the tables in README.md for the correct miRBase species abbreviations. Up-to-date reference genomes can be found at https://genome.ucsc.edu/ or https://www.ncbi.nlm.nih.gov/data-hub/genome/. In this example, hg38 is used, which corresponds to the human GRCh38 genome.
    3. The currently included species for annotation are human, mouse, and worm. View the corresponding species.annotation.bed files in the annotation directory in the SCRAP source folder. If the use of a different species for analysis is desired, provide an annotation.bed file that follows the same naming scheme species.annotation.bed.
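The dependency checks in step 1.1 and the path rule in step 1.3.1 (full path, no shorthand, trailing slash) can be sketched in shell. This is a minimal sketch and not part of SCRAP: check_tool and full_path_with_slash are hypothetical helper names introduced here for illustration.

```shell
# Report whether a command is on PATH (the same idea as `which git` / `which conda`).
# check_tool is a hypothetical helper, not a SCRAP command.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# Print a directory in the form the protocol asks for:
# an absolute path with no shorthand, ending in a slash.
# full_path_with_slash is a hypothetical helper, not a SCRAP command.
full_path_with_slash() {
  local dir
  dir="$(cd "$1" 2>/dev/null && pwd -P)" || return 1
  printf '%s/\n' "${dir%/}"
}

check_tool git
check_tool conda
```

If the SCRAP folder was cloned into the current directory, running full_path_with_slash SCRAP prints a value that could be passed to the -r argument of the reference installation command in step 1.3.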

2. Running SCRAP

  1. Now that the dependencies and SCRAP are installed, run the script SCRAP.sh.
    bash SCRAP/bin/SCRAP.sh -d full/path/to/CLASH_Human/ -a full/path/to/CLASH_Human/CLASH_Human_Adapters.txt -p no -f yes -r full/path/to/SCRAP/ -m hsa -g hg38
    1. List the entire path to the sample directories without any shorthand. Format the sample directories with the folder name matching the sample name exactly, as shown in Figure 1.
    2. Note that the path listed is the path to the directory that contains all the sample folders, not the path to any individual sample folder or a sample file (refer to the command line in step 2.1).
    3. Next, list the entire path to the adapter file. Ensure that the sample names in the adapter file match the previously mentioned folder names and file names (refer to the command line in step 2.1).
    4. Indicate whether the samples are paired-end and whether or not filtering for pre-miRNAs and/or tRNAs will be performed. Add a filter for rRNA cleaning if desired (refer to the command line in step 2.1).
      NOTE: The users may or may not decide to use these filters depending on the sample types and experimental goals. Depending upon the experimental design, pre-miRNAs, tRNAs, and rRNAs can consume available sequencing depth for real sncRNA:target RNA chimeras and users can employ filters to exclude them. However, users may want to avoid such filtering in certain circumstances (e.g., mapping sncRNA targets to the mitochondrial genome, which contains mitochondrial rRNAs).
    5. Next, list the entire path to the reference directory, the miRBase abbreviation, and the reference genome abbreviation (refer to the command line in step 2.1).
      NOTE: The script may take a few hours to complete, depending on the dataset size and the CPU of the computer being used.
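Because SCRAP.sh can run for hours, it may be worth confirming beforehand that every sample named in the adapter file has a matching folder, as steps 2.1.1 and 2.1.3 require. The following is a minimal sketch under a stated assumption: it treats the first whitespace-separated column of each line in the adapter file as the sample name. The actual adapter file layout should be checked against the SCRAP README. check_samples is a hypothetical helper, and the demo data are placeholders.

```shell
# check_samples SAMPLE_DIR ADAPTER_FILE
# Assumption: column 1 of each non-empty line of the adapter file is a sample name.
# Hypothetical helper for illustration; not part of SCRAP.
check_samples() {
  local dir="${1%/}" status=0 sample _rest
  while read -r sample _rest || [ -n "$sample" ]; do
    [ -n "$sample" ] || continue
    if [ -d "$dir/$sample" ]; then
      echo "ok: $sample"
    else
      echo "no folder for: $sample"
      status=1
    fi
  done < "$2"
  return "$status"
}

# Toy demonstration in a temporary directory (all names are placeholders).
demo="$(mktemp -d)"
mkdir "$demo/sampleA"
printf 'sampleA\tPLACEHOLDER\nsampleB\tPLACEHOLDER\n' > "$demo/adapters.txt"
check_samples "$demo" "$demo/adapters.txt" || echo "fix the mismatches before running SCRAP.sh"
```

In the toy run, sampleA is reported as ok and sampleB is flagged because no folder with that name exists.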

3. Peak calling and annotation

  1. Once SCRAP is finished running, check that the output includes, among other files, a SAMPLE.aligned.unique.bam file. This is a binary file containing alignments of target RNAs onto the user-provided reference genome.
  2. Now perform peak calling by running Peak_Calling.sh.
    bash SCRAP/bin/Peak_Calling.sh -d CLASH_Human/ -a CLASH_Human/CLASH_Human_Adapters.txt -c 3 -l 2 -f no -r SCRAP/ -m hsa -g hg38
    NOTE: Peak calling is a feature of SCRAP, which is designed to allow researchers to readily evaluate the most robust and reproducible small noncoding RNA:target RNA interactions within their chimeric RNA libraries. This feature, for example, can aid researchers in identifying interactions that they may want to select for further investigation. Steps 3.2.2 and 3.2.3 below describe how the user sets the stringency with which a peak is called: the minimum number of unique interactions, or sequencing reads, that must support the peak, and the minimum number of libraries in which the interaction must occur.
    1. Again, list the full paths to the directory containing the sample folders, and the adapter file (refer to the command line in step 3.2).
    2. Next, set the minimum number of sequencing reads required for a peak to be called (refer to the command line in step 3.2).
    3. Set the minimum number of distinct sequencing libraries that must contain a peak for it to be called (refer to the command line in step 3.2).
      NOTE: The choice of values for both 3.2.2 and 3.2.3 will depend upon the nature of the samples sequenced and the number of samples or sample types. Here, at least 3 chimeric sequencing reads in a sample are required to call a peak, and the peak must be supported by at least 2 samples. An investigator evaluating a dataset in which there are many sequencing library replicates for a given condition, for example, might decide to require the presence of the reads in a greater number of sample sequencing libraries.
    4. Indicate whether sncRNAs of the same family must contribute to the same peak. For example, since miRNAs of the same family share seed sequences, these miRNAs can bind shared and overlapping sets of gene targets; a user might want to identify the full impact of a family on these targets by assessing their collective peaks (refer to the command line in step 3.2).
    5. Next, indicate the full path to the reference directory, the miRBase abbreviation, and the reference genome abbreviation (refer to the command line in step 3.2).
  3. Once peak calling is complete, run peak annotation.
    bash SCRAP/bin/Peak_Annotation.sh -p CLASH_Human/peaks.bed -r SCRAP/ -s human
    1. List the full path to the resulting peaks.bed (or peaks.family.bed) file from peak calling, the full path to the reference directory, and the desired species for annotation.
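The two thresholds set in steps 3.2.2 and 3.2.3 can be illustrated with a toy filter. This sketch is not SCRAP's implementation. It only mirrors the criteria as described in the NOTE above (a minimum read count per library and a minimum number of supporting libraries). The three-column input format, the site and library names, and the call_sites helper are all hypothetical, and each site/library pair is assumed to appear on only one line.

```shell
# call_sites MIN_READS MIN_LIBS < table
# Toy input columns (hypothetical): site  library  read_count
# Keeps a site when at least MIN_LIBS libraries each have >= MIN_READS reads.
call_sites() {
  awk -v c="$1" -v l="$2" '
    $3 >= c { libs[$1]++ }
    END { for (s in libs) if (libs[s] >= l) print s }
  ' | sort
}

# Same thresholds as the example command: 3 reads, 2 libraries.
printf '%s\n' \
  "siteA lib1 5" \
  "siteA lib2 3" \
  "siteB lib1 9" \
  "siteB lib2 1" \
  "siteC lib1 2" \
  "siteC lib2 2" | call_sites 3 2
```

With these toy counts only siteA is printed: siteB reaches 3 reads in just one library, and siteC never reaches 3 reads. Raising the second threshold is the kind of adjustment suggested for datasets with many replicate libraries.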

4. Visualizing the data

NOTE: All steps for analysis using SCRAP are now completed. For visualizing the data, the following approach is recommended:

  1. Merge all the .bam (binary SAM) files that are to be visualized together (samtools merge).
  2. Sort the resulting merged .bam file (samtools sort). Alignments are sorted by genomic coordinate so that samtools can index the file.
  3. Index the sorted .bam file (samtools index). A BAI (BAM index) file is generated to permit visualization in the Integrative Genomics Viewer (IGV).
  4. Finally, open the resulting sorted .bam and indexed .bai file in IGV.
    NOTE: sncRNA:target RNA interactions of interest may be prioritized for follow-up in a number of investigation-specific ways. One generic initial approach is to assess the interactions for which peaks are supported by the most chimeric sequencing reads. Interactions of interest may also be visualized using the DuplexFold Web Server from the RNAstructure package by inputting the sequence for both the sncRNA and the target RNA from the detected interaction11. For each peak, the chromosome (first column) and genomic coordinates (start: second column; end: third column) can be found within the peaks.bed.species.annotation.txt file generated in peak annotation. For miRNAs in particular, while reproducible and functional interactions can lack extensive seed-matched binding (e.g., interactions may use 3' compensatory binding), the presence of seed-matched sites in a cognate binding motif of the target RNA can nonetheless be assessed as a validating feature of functionally important detected interactions4,12. Ancillary data processing could include comparisons of differential read coverage between peaks in distinct biological conditions and potentially assessment of clustering of regulated genes into pathways using a pathway analysis tool.
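Steps 4.1 to 4.3 can be chained as three samtools calls. The sketch below assumes samtools is on the PATH. The run and merge_for_igv helpers, the output prefix, and the input file names are hypothetical placeholders. The block is shown in dry-run mode, so it only prints the commands it would execute.

```shell
# Print the command instead of running it when DRY_RUN is set (hypothetical helper).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# merge_for_igv OUT_PREFIX IN1.bam IN2.bam ...
# Merge, coordinate-sort, and index, stopping at the first failing step.
merge_for_igv() {
  local prefix="$1"
  shift
  run samtools merge "$prefix.merged.bam" "$@" &&
    run samtools sort -o "$prefix.sorted.bam" "$prefix.merged.bam" &&
    run samtools index "$prefix.sorted.bam"
}

# Dry run with placeholder file names; unset DRY_RUN to execute the commands.
DRY_RUN=1
merge_for_igv all_samples sample1.aligned.unique.bam sample2.aligned.unique.bam
```

When executed for real, samtools index writes the index next to the sorted file (all_samples.sorted.bam.bai in this example), and the sorted .bam can then be opened in IGV as in step 4.4.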

Results

Results for sncRNA:target RNA interactions detected by a modified version of SCRAP (SCRAP release 2.0, which implements modifications for rRNA filtering) on previously published sequencing datasets prepared using CLEAR-CLIP9 are shown in Figure 2 and Table 1. Users can appreciate the decrease in the relative fraction of miRNA interactions with intron regions that occurs following the isolation of high-confidence interactions by peak calling in SCRAP. Additional data ...

Discussion

This protocol on the use of the SCRAP pipeline for analysis of sncRNA:target RNA interactions is designed to assist investigators who are entering into computational analysis. Completion of the tutorial is expected to guide investigators with entry-level or greater computational experience through the steps required for installation and use of this pipeline and its application to analyze data gained from chimeric RNA sequencing libraries. Steps critical to the completion of this protocol include correct reference installatio...

Disclosures

The authors have nothing to disclose.

Acknowledgements

We thank members of the Meffert laboratory for helpful discussions, including BH Powell and WT Mills IV, for critical feedback on describing the installation and implementation of the pipeline. This work was supported by a Braude Foundation award, the Maryland Stem Cell Research Fund Launch Program, the Blaustein Endowment for Pain Research and Education award, and NINDS RO1NS103974 and NIMH RO1MH129292 to M.K.M.

Materials

Name | Company | Catalog Number | Comments
Genomes | UCSC Genome browser | N/A | https://genome.ucsc.edu/ or https://www.ncbi.nlm.nih.gov/data-hub/genome/
Linux | Linux | | Ubuntu 20.04 or 22.04 LTS recommended
Mac | Apple | | Mac OSX (>11)
Platform setup | GitHub | N/A | https://github.com/Meffert-Lab/SCRAP/blob/main/PLATFORM-SETUP.md
SCRAP pipeline | GitHub | N/A | https://github.com/Meffert-Lab/SCRAP
Unix shell | Unix operating system | | bash >=5.0
Unix shell | Unix operating system | | zsh (5.9 recommended)
Windows | Windows | | WSL Ubuntu 20.04 or 22.04 LTS

References

  1. Morris, K. V., Mattick, J. S. The rise of regulatory RNA. Nature Reviews Genetics. 15 (6), 423-437 (2014).
  2. Li, X., Jin, D. S., Eadara, S., Caterina, M. J., Meffert, M. K. Regulation by noncoding RNAs of local translation, injury responses, and pain in the peripheral nervous system. Neurobiology of Pain (Cambridge, Mass.). 13, 100119 (2023).
  3. Shi, J., Zhou, T., Chen, Q. Exploring the expanding universe of small RNAs. Nature Cell Biology. 24 (4), 415-423 (2022).
  4. Broughton, J. P., Lovci, M. T., Huang, J. L., Yeo, G. W., Pasquinelli, A. E. Pairing beyond the seed supports microRNA targeting specificity. Molecular Cell. 64 (2), 320-333 (2016).
  5. Grosswendt, S., et al. Unambiguous identification of miRNA:target site interactions by different types of ligation reactions. Molecular Cell. 54 (6), 1042-1054 (2014).
  6. Mills, W. T., Eadara, S., Jaffe, A. E., Meffert, M. K. SCRAP: a bioinformatic pipeline for the analysis of small chimeric RNA-seq data. RNA. 29 (1), 1-17 (2023).
  7. Helwak, A., Kudla, G., Dudnakova, T., Tollervey, D. Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding. Cell. 153 (3), 654-665 (2013).
  8. Hoefert, J. E., Bjerke, G. A., Wang, D., Yi, R. The microRNA-200 family coordinately regulates cell adhesion and proliferation in hair morphogenesis. Journal of Cell Biology. 217 (6), 2185-2204 (2018).
  9. Moore, M. J., Zhang, C., Gantman, E. C., Mele, A., Darnell, J. C., Darnell, R. B. Mapping Argonaute and conventional RNA-binding protein interactions with RNA at single-nucleotide resolution using HITS-CLIP and CIMS analysis. Nature Protocols. 9 (2), 263-293 (2014).
  10. Bjerke, G. A., Yi, R. Integrated analysis of directly captured microRNA targets reveals the impact of microRNAs on mammalian transcriptome. RNA. 26 (3), 306-323 (2020).
  11. Reuter, J. S., Mathews, D. H. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics. 11 (1), 129 (2010).
  12. Moore, M. J., et al. miRNA-target chimeras reveal miRNA 3′-end pairing as a major determinant of Argonaute target specificity. Nature Communications. 6 (1), 8864 (2015).
  13. Travis, A. J., Moody, J., Helwak, A., Tollervey, D., Kudla, G. Hyb: a bioinformatics pipeline for the analysis of CLASH (crosslinking, ligation and sequencing of hybrids) data. Methods (San Diego, Calif.). 65 (3), 263-273 (2014).

Copyright © 2025 MyJoVE Corporation. All rights reserved