miRDeep2 can be used to accurately identify plant microRNAs, which are important regulators of gene expression in plant development and are crucial for responding to environmental challenges. miRDeep2 has a markedly short running time and performs well in both sensitivity and accuracy, especially when predicting microRNAs in plants with large genomes. False positives and long processing times are challenges in plant microRNA annotation.
By adding a new filtering strategy, overhauling the scoring algorithm, and integrating stringent criteria, miRDeep2 can overcome these issues. High-confidence microRNA annotation is fundamental for discovering the roles of microRNAs in regulating diverse functions of the genome. Step-by-step guidance can be helpful for first-time microRNA annotators.
For experienced researchers, the method is useful for understanding the advantages of using miRDeep2 over other tools. To install the miRDeep2 package, navigate to the miRDeep2 webpage and download the tarball files. Then extract all contents of the downloaded file into one folder and add the folder path to the PATH environment variable.
To test the miRDeep2 pipeline, download the test data and the expected output, which include one formatted GSM sequencing file and one Arabidopsis thaliana genome file, and move all of the downloaded files to the current working directory. After extracting the compressed tarball files, build the Arabidopsis genome reference index and the noncoding RNA reference index. A folder containing all of the intermediate files and results will be generated automatically in the user_selected_folder.
The miRDeep2 pipeline can then be run using the test data. To check the testing outputs, view the tab-delimited output file. The final output of the predicted microRNAs will contain columns that indicate the chromosome ID, strand direction, representative read ID, precursor ID, mature miRNA location, precursor location, mature sequence, and precursor sequence.
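As a rough illustration of how the tab-delimited output could be inspected programmatically, the sketch below reads each row into a dictionary. The column names are hypothetical labels chosen here to mirror the column order listed above, and the example row is made up; neither is taken from the actual output file.

```python
import csv
import io
from typing import Dict, List, TextIO

# Hypothetical column labels, in the order the columns are described above.
COLUMNS = [
    "chromosome_id",
    "strand",
    "representative_read_id",
    "precursor_id",
    "mature_location",
    "precursor_location",
    "mature_sequence",
    "precursor_sequence",
]


def read_predictions(handle: TextIO) -> List[Dict[str, str]]:
    """Parse tab-delimited prediction rows into dictionaries keyed by COLUMNS."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"expected {len(COLUMNS)} columns, got {len(fields)}"
            )
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


# Made-up example row, for illustration only.
example = (
    "Chr1\t+\tread7_x12\tChr1_1\t105..125\t90..210\t"
    "UGACAGAAGAGAGUGAGCAC\tGGUGACAGAAGAGAGUGAGCACACAUGG\n"
)
predictions = read_predictions(io.StringIO(example))
print(predictions[0]["mature_sequence"])
```

To read a real output file, an open file handle can be passed in place of the in-memory example; the column count check will fail loudly if the file layout differs from the assumption made here.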
Then check the progress_log file, which provides information about the finished steps, and the script_log and script_err files, which contain program outputs and warnings. Before running the pipeline, ensure that the input reads are preprocessed into the proper format: remove adapters from the 5' and 3' ends of the deep sequencing reads and ensure that all of the FASTA identifiers are unique. Each sequence identifier must end with "_x" followed by an integer indicating the copy number of the exact sequence that was retrieved in the deep sequencing dataset.
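The identifier format described above can be sketched as follows. This is a minimal example, not part of the miRDeep2 package: it collapses identical reads, counts their copies, and builds identifiers that end in "_x" plus the copy number. The "read" prefix and both function names are hypothetical, and a running number is used to keep each identifier unique.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# An identifier is accepted if it ends with "_x" followed by an integer.
ID_PATTERN = re.compile(r"^\S+_x(\d+)$")


def collapse_reads(reads: Iterable[str], prefix: str = "read") -> Dict[str, str]:
    """Collapse identical reads and map unique identifiers to sequences.

    Each identifier is built as <prefix><running number>_x<copy number>.
    """
    counts = Counter(reads)
    collapsed = {}
    # most_common() orders sequences from the highest copy number down.
    for number, (sequence, copies) in enumerate(counts.most_common()):
        collapsed[f"{prefix}{number}_x{copies}"] = sequence
    return collapsed


def copy_number(identifier: str) -> int:
    """Return the copy number encoded at the end of an identifier."""
    match = ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"identifier does not end with _x<integer>: {identifier}")
    return int(match.group(1))


# Adapter-trimmed example reads (illustrative sequences only).
reads = [
    "TGACAGAAGAGAGTGAGCAC",
    "TCGGACCAGGCTTCATTCCCC",
    "TGACAGAAGAGAGTGAGCAC",
]
collapsed = collapse_reads(reads)
for identifier, sequence in collapsed.items():
    print(f">{identifier}\n{sequence}")
```

The `copy_number` helper can also be used as a quick check on an existing FASTA file: any identifier that raises an error does not follow the required format.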
To ensure that each FASTA identifier is unique, include a running number in the ID. To build a reference index, first check whether the genome sequences of the species of interest have already been indexed; if so, download the Bowtie 2 index files from the iGenomes website. Next, build a non-microRNA noncoding RNA index containing the main noncoding sequences from Rfam, including ribosomal RNA, transfer RNA, small nuclear RNA, and small nucleolar RNA, to filter out noisy sequences from other noncoding RNA fragments. To use miRDeep2 to detect new microRNAs from deep sequencing data, run the bash script in the package to start the analysis pipeline.
The number of different locations to which a read can be mapped, the number of mismatches allowed when running Bowtie 2, and the reads-per-million threshold can be modified as necessary. To check the miRDeep2 outputs, view the data in the automatically generated output_folder. In this representative analysis, the miRDeep2 microRNA annotation pipeline was applied to 10 public small RNA sequencing libraries from five plant species with progressively larger genome sizes, as indicated.
For each species, two representative small RNA libraries from different tissues and the indexed genome sequences were processed as the two inputs. With previous methods, the genome processing could take over 100 hours or would sometimes halt in the middle of the analysis because of the length of the genome. miRDeep2, however, finished these predictions in a markedly shorter time, ranging from minutes to hours.
For the two Arabidopsis small RNA libraries used in this test, miRDeep2 performed better in both sensitivity and accuracy than other tools. Make sure that the input index for the program is correct. For example, use Bowtie only with a Bowtie index, and use the large-index option for large genomes.
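One way to catch an index mismatch before starting a long run is to inspect the index file extensions. The sketch below relies on the standard naming used by the aligners (Bowtie index files end in .ebwt, or .ebwtl for large indexes; Bowtie 2 index files end in .bt2, or .bt2l for large indexes). The function name and the example file names are hypothetical and are not part of the miRDeep2 package.

```python
from pathlib import Path
from typing import List

# File extensions written by the Bowtie and Bowtie 2 index builders.
SUFFIXES = {
    ".ebwt": "Bowtie small index",
    ".ebwtl": "Bowtie large index",
    ".bt2": "Bowtie 2 small index",
    ".bt2l": "Bowtie 2 large index",
}


def classify_index(file_names: List[str]) -> str:
    """Report which aligner and index size a set of index files belongs to."""
    kinds = {
        SUFFIXES[Path(name).suffix]
        for name in file_names
        if Path(name).suffix in SUFFIXES
    }
    if not kinds:
        raise ValueError("no Bowtie or Bowtie 2 index files found")
    if len(kinds) > 1:
        raise ValueError(f"mixed index types: {sorted(kinds)}")
    return kinds.pop()


# Illustrative file names only.
index_files = ["ath.1.ebwt", "ath.2.ebwt", "ath.rev.1.ebwt", "ath.rev.2.ebwt"]
index_kind = classify_index(index_files)
print(index_kind)
```

In practice the file names could be gathered from the index directory, and the result compared against the aligner the pipeline is configured to call.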
The targets of the resulting microRNAs can be predicted using sequencing data, which can provide insights into microRNA function. Because miRDeep2 can accurately and sensitively identify most microRNAs in a given plant species, microRNA function can be studied as a whole.