Circular RNAs (circRNAs) play important regulatory roles in different biological processes. This protocol is suitable for beginners who want to carry out circRNA analysis in the area of host-pathogen interactions. Here, we put together a few tools to create a streamlined protocol for circRNA prediction and quantification, circRNA functional enrichment, circRNA-miRNA interaction prediction, and ceRNA network construction.
This streamlined protocol can be applied to clinical samples to identify circRNA candidates with diagnostic and prognostic value in a host-pathogen interaction setting. I expect those who have no prior programming knowledge to struggle during the initial phase of this technique. Therefore, I would advise learning the basics of the programming languages used in this technique.
I believe that watching how the programming language is applied is usually more informative and easier to understand than reading alone. To begin, open a Linux terminal and, in the directory containing the host's reference genome, execute the commands bwa index and hisat2-build to index the genome. Prepare a YAML (.yml) configuration file containing the name of the file, the paths to the tools, the paths to the downloaded reference files, and the paths to the index files.
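As a rough sketch of the indexing and configuration steps above, the following Python snippet assembles the two indexing commands and the text of a configuration file. The key names (tools, reference, bwa_index, hisat_index) follow the layout I understand CIRIquant to expect and should be checked against its documentation. Every path and the genome name hg38 are hypothetical placeholders.

```python
# Sketch only: all paths and the genome name "hg38" are hypothetical placeholders.

def index_commands(genome_fasta, hisat2_prefix):
    """Return the two indexing commands as argument lists (nothing is executed here)."""
    return [
        # BWA writes its index next to the FASTA, using the FASTA path as prefix.
        ["bwa", "index", genome_fasta],
        # HISAT2 writes its index files under the given prefix.
        ["hisat2-build", genome_fasta, hisat2_prefix],
    ]


def build_config(name, tools, reference):
    """Render a small two-section YAML document as plain text."""
    lines = ["name: " + name, "tools:"]
    lines += ["  {}: {}".format(key, path) for key, path in tools.items()]
    lines.append("reference:")
    lines += ["  {}: {}".format(key, path) for key, path in reference.items()]
    return "\n".join(lines) + "\n"


genome = "/data/ref/hg38.fa"
commands = index_commands(genome, "/data/ref/hg38")

# Assumed key names; verify them against the tool's own documentation.
tools = {
    "bwa": "/usr/local/bin/bwa",
    "hisat2": "/usr/local/bin/hisat2",
    "stringtie": "/usr/local/bin/stringtie",
    "samtools": "/usr/local/bin/samtools",
}
reference = {
    "fasta": genome,
    "gtf": "/data/ref/hg38.gtf",
    "bwa_index": genome,              # prefix produced by "bwa index <fasta>"
    "hisat_index": "/data/ref/hg38",  # prefix passed to hisat2-build
}

config_text = build_config("hg38", tools, reference)
print(config_text)
```

Each command list can be passed to subprocess.run(cmd, check=True) once the tools are installed, and config_text can be saved under a name of your choice, for example hg38.yml.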
Specify the library type of the RNA-seq data and execute the CIRIquant tool using either the default or manual parameters. Prepare a text file listing the IDs of the RNA-seq data, the paths to the GTF files output by CIRIquant, and the group of each RNA-seq sample, that is, whether it belongs to the control or the treated group. On the Linux terminal, run prep_CIRIquant with the prepared text file as input.
This run will generate a list of files. Prepare a second text file listing the RNA-seq IDs and the paths to their respective StringTie outputs. The file layout must be similar to that of the previously prepared text file, but without the grouping column.
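The two sample lists described above could be generated along these lines. This is a minimal sketch: the sample IDs and paths are hypothetical, and the group codes C (control) and T (treated) are my assumption about what prep_CIRIquant accepts, so check them against the tool's documentation.

```python
# Hypothetical samples: (sample ID, CIRIquant GTF path, StringTie GTF path, group).
SAMPLES = [
    ("ctrl1", "ctrl1/ctrl1.gtf", "ctrl1/gene/ctrl1_out.gtf", "C"),
    ("ctrl2", "ctrl2/ctrl2.gtf", "ctrl2/gene/ctrl2_out.gtf", "C"),
    ("treat1", "treat1/treat1.gtf", "treat1/gene/treat1_out.gtf", "T"),
    ("treat2", "treat2/treat2.gtf", "treat2/gene/treat2_out.gtf", "T"),
]


def circ_sample_list(samples):
    """First list: sample ID, circRNA GTF path, and group, tab-separated."""
    lines = []
    for sample_id, circ_gtf, _gene_gtf, group in samples:
        if group not in ("C", "T"):  # assumed group codes
            raise ValueError("unexpected group for {}: {}".format(sample_id, group))
        lines.append("\t".join([sample_id, circ_gtf, group]))
    return "\n".join(lines) + "\n"


def gene_sample_list(samples):
    """Second list: same layout without the grouping column."""
    lines = ["\t".join([sample_id, gene_gtf])
             for sample_id, _circ_gtf, gene_gtf, _group in samples]
    return "\n".join(lines) + "\n"


circ_list = circ_sample_list(SAMPLES)
gene_list = gene_sample_list(SAMPLES)
print(circ_list)
print(gene_list)
```

Writing circ_list and gene_list to two text files gives the inputs for the circRNA and gene-level preparation steps, respectively.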
Run prepDE.py with this text file as input to generate the gene count matrix files. Execute CIRI_DE_replicate with the library_info.csv, circRNA_BSJ.csv, and gene_count_matrix.csv files as input to output the final circRNA_DE.tsv file. To filter and determine the number of differentially expressed (DE) circRNAs, open the circRNA_DE.tsv file with R or any spreadsheet software.
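For those who prefer a script to a spreadsheet, the filtering step might look like the sketch below. The column names logFC and FDR, the tab delimiter, and the cutoffs (FDR below 0.05, absolute logFC of at least 1) are all assumptions chosen for illustration, and the demo table is made-up data, not a result from this protocol. Inspect the header of your own circRNA_DE.tsv before relying on it.

```python
import csv
import io


def filter_de(handle, fdr_cutoff=0.05, min_abs_logfc=1.0, delimiter="\t"):
    """Return rows passing both the FDR and the fold-change cutoffs."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    hits = []
    for row in reader:
        fdr = float(row["FDR"])        # assumed column name
        logfc = float(row["logFC"])    # assumed column name
        if fdr < fdr_cutoff and abs(logfc) >= min_abs_logfc:
            hits.append(row)
    return hits


# Made-up demo table standing in for the real output file.
DEMO = (
    "circ_id\tlogFC\tPValue\tFDR\n"
    "chr1:100|900\t2.5\t0.0001\t0.004\n"
    "chr2:500|1500\t-0.3\t0.4\t0.6\n"
    "chr3:10|400\t-1.8\t0.001\t0.02\n"
)

hits = filter_de(io.StringIO(DEMO))
up = [row["circ_id"] for row in hits if float(row["logFC"]) > 0]
down = [row["circ_id"] for row in hits if float(row["logFC"]) < 0]
print(len(hits), "DE circRNAs:", len(up), "up,", len(down), "down")
```

To run it on a real file, replace io.StringIO(DEMO) with an open file handle, and pass delimiter="," if the file turns out to be comma-separated.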
Download the Circr archive from the Circr GitHub page, then unzip and extract its contents into a new directory where the analysis will be conducted, using suitable software such as WinRAR or 7-Zip. Then install the prerequisite software, such as SAMtools, miRanda, RNAhybrid, and pybedtools, before conducting the circRNA-miRNA analysis.
Index the reference genome file of the organism of interest using the samtools faidx command, and prepare an input file consisting of the coordinates of the DE circRNAs of interest in a tab-delimited BED file. Next, execute Circr.py using Python 3.
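The BED input file mentioned above could be prepared with a short script such as this one. It is only a sketch under several assumptions: that the circRNA IDs look like chr1:1000|2000, that the strand is known for each circRNA, and that a generic six-column BED layout (chromosome, start, end, name, score, strand) is acceptable. Check the column order and coordinate convention that Circr.py actually expects before using it.

```python
def circ_id_to_bed(circ_id, strand, start_offset=0):
    """Turn an ID of the assumed form 'chrom:start|end' into one BED line.

    start_offset lets you shift the start (for example -1) if your IDs are
    1-based and the downstream tool expects 0-based BED starts.
    """
    chrom, span = circ_id.rsplit(":", 1)
    start_text, end_text = span.split("|")
    start = int(start_text) + start_offset
    end = int(end_text)
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-': {}".format(strand))
    if start < 0 or start >= end:
        raise ValueError("invalid interval in {}".format(circ_id))
    return "\t".join([chrom, str(start), str(end), circ_id, "0", strand])


# Hypothetical DE circRNAs with their strands.
de_circs = [("chr1:1000|2000", "+"), ("chr7:55000|58000", "-")]
bed_lines = [circ_id_to_bed(circ_id, strand) for circ_id, strand in de_circs]
print("\n".join(bed_lines))
```

Saving the printed lines to a .bed file gives a tab-delimited coordinate file of the DE circRNAs of interest.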
As arguments to Circr.py, specify the circRNA input file, the FASTA genome of the organism of interest, the genome version of the selected organism, the number of threads, and the name of the output file on the command line. Once the Circr analysis is complete, the program outputs a circRNA-miRNA interaction file in CSV format. Prepare a tab-delimited file containing the circRNAs of interest and their target miRNAs.
The first column consists of the circRNA name. The second column specifies the type of RNA in the first column. The third column is the target miRNA.
The fourth column specifies the type of RNA in the third column. To construct the ceRNA network map, open Cytoscape, navigate to File > Import > Network from File, select the prepared file, and upload it. Press the Style button to change the visual style of the network.
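The four-column network file described above can be sketched as follows. The circRNA and miRNA names are placeholders for illustration, not predictions from this protocol, and the type labels circRNA and miRNA are simply the strings I chose for the second and fourth columns.

```python
def network_rows(pairs):
    """Build 'circRNA, type, miRNA, type' rows, skipping duplicate pairs."""
    seen = set()
    rows = []
    for circ_name, mirna_name in pairs:
        if (circ_name, mirna_name) in seen:
            continue
        seen.add((circ_name, mirna_name))
        rows.append("\t".join([circ_name, "circRNA", mirna_name, "miRNA"]))
    return rows


# Placeholder interaction pairs (hypothetical names).
pairs = [
    ("circ_A", "miR-X"),
    ("circ_A", "miR-Y"),
    ("circ_B", "miR-X"),
    ("circ_A", "miR-X"),  # duplicate, dropped
]

rows = network_rows(pairs)
print("\n".join(rows))
```

No header row is written here. If you add one, tell the network import dialog to treat the first line as column names.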
Then press the arrow on the right side of Fill Color, choose type for the column and Discrete Mapping for the mapping type, and select the desired color for each RNA type. After that, navigate to Shape to change the shape of the nodes and follow the steps shown earlier. For Gene Ontology (GO) and KEGG analysis of the parental genes of the circRNAs, ensure that the clusterProfiler and org.Hs.eg.db packages have been installed in RStudio. Import the DE circRNA information into the RStudio workspace. If the user wishes to convert the parental gene names to other formats, such as Entrez IDs, use a function such as bitr.
Use the gene IDs as input and run the GO enrichment analysis using the enrichGO function within the clusterProfiler package with default parameters. Finally, run the KEGG enrichment analysis using the gene IDs as input and the enrichKEGG function within the clusterProfiler package. The bubble plot of the GO enrichment analysis of the DE circRNA parental genes is shown in this figure.
The gene ratio on the x-axis is the number of genes in the input list associated with a given GO term divided by the total number of annotated genes in the input list. The dot size in the plot represents the count value, which is the number of genes in the input list associated with a given GO term. The bigger the dot, the larger the number of input genes associated with the term.
The dots in the plot are color-coded based on the p-value, which is calculated by comparing the observed frequency of an annotation term with the frequency expected by chance. The enrichment is considered statistically significant and plotted on the bubble plot only if the p-value is less than 0.01. Here, the top three enrichments for the biological processes are ribonucleoprotein complex biogenesis, response to virus, and regulation of response to biotic stimulus.
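To make the idea of comparing observed with expected frequency concrete, here is a minimal sketch of a one-sided hypergeometric over-representation test, which is my understanding of the kind of test behind such enrichment p-values. The symbols k, n, M, and N are labels introduced for this example, and the numbers are toy values, not data from this analysis.

```python
from math import comb


def hypergeom_pvalue(k, n, M, N):
    """Probability of seeing at least k term genes among n input genes.

    k: input genes annotated to the term (the count shown as dot size)
    n: annotated genes in the input list
    M: genes annotated to the term in the background
    N: annotated genes in the background
    """
    upper = min(n, M)
    total = comb(N, n)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper overlaps.
    favourable = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy numbers: 10 background genes, 4 in the term, 3 input genes, 2 of them in the term.
p = hypergeom_pvalue(k=2, n=3, M=4, N=10)
print(round(p, 4))
```

With these toy numbers, two of three input genes fall in a term that covers four of ten background genes, and the test asks how often an overlap at least that large would arise by chance.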
For the molecular functions, only catalytic activity acting on RNA and single-stranded RNA binding are statistically enriched. For the cellular components, only the retromer complex is statistically enriched. This representative image shows the KEGG enrichment analysis of the DE circRNA parental genes in a bubble plot.
Only two KEGG terms were enriched in this case: the influenza A and the viral life cycle pathways. One of the most important things when attempting this procedure is to ensure that the correct library type (strandedness) of the RNA-seq dataset is specified when running CIRIquant. The bioinformatic pipeline provided here helps in predicting potential circular RNAs and their functional annotations.
However, wet-lab verification will still be needed to provide solid evidence. This protocol will allow researchers to discover circRNAs and their potential functional roles in different host-pathogen interactions, which they can then study further.