The author would like to thank Tan Ke En and Dr. Cameron Bracken for their critical review of this manuscript. This work was supported by grants from the Fundamental Research Grant Scheme (FRGS/1/2020/SKK0/UM/02/15) and the University of Malaya High Impact Research Grant (UM.C/625/1/HIR/MOE/CHAN/02/07).

In this protocol, de-identified ribosomal RNA (rRNA)-depleted RNA-seq library datasets prepared from influenza A virus-infected human macrophage cells were downloaded from the Gene Expression Omnibus (GEO) database and used for the analysis. The entire bioinformatics pipeline, from prediction to functional characterization of circRNAs, is summarized in Figure 1. Each part of the pipeline is further explained in the sections below.
1. Preparation, download, and setup before data analysis
NOTE: All software packages used in this study are free and open-source.
<ol>
	<li>Downloading the required tools on the Linux platform
	<ol>
		<li>Download and install the required software and tools listed in the Table of Materials on a Linux high-performance computer using the instructions provided by the developer. 
		NOTE: Most of the tools and software have their own online GitHub pages or documentation containing instructions on installing and using their tools (refer to the Table of Materials).</li>
		<li>Download the desired RNA-seq datasets for circRNA detection and analysis from sequence archive websites (e.g., European Nucleotide Archive and Gene Expression Omnibus).</li>
		<li>Download the reference genome(s) (FASTA format) and annotation files (GTF/GFF3 format), compatible with the host from which the RNA-seq dataset was prepared. Host reference genome(s) and annotation files are usually found on online genome browsers such as the National Center for Biotechnology Information (NCBI), the University of California Santa Cruz (UCSC), and the Ensembl websites.</li>
	</ol>
	</li>
	<li>Quality checking of RNA-seq
	<ol>
		<li>Input the FASTQ files into the FastQC program to determine the quality of the RNA sequences. If the quality of the FASTQ files is low (e.g., &#60;Q20) or adapter sequences are present, further trimming might be needed using tools such as Trimmomatic29,30.</li>
	</ol>
	</li>
</ol>
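The Q20 threshold mentioned in step 1.2.1 can be illustrated with a minimal Python sketch. It is not a replacement for FastQC and does not reproduce its metrics; it only shows how a Phred score is decoded from a FASTQ quality string, assuming the common Phred+33 encoding and four-line FASTQ records. The function names and the two example reads are hypothetical.

```python
def mean_phred(quality_line, offset=33):
    """Mean Phred score of one FASTQ quality string (Phred+33 assumed)."""
    scores = [ord(ch) - offset for ch in quality_line]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def low_quality_fraction(fastq_lines, threshold=20):
    """Fraction of reads whose mean Phred score is below `threshold`."""
    lines = [ln.rstrip("\n") for ln in fastq_lines]
    # Each FASTQ record spans four lines; the fourth is the quality string.
    qualities = lines[3::4]
    if not qualities:
        return 0.0
    low = sum(1 for q in qualities if mean_phred(q) < threshold)
    return low / len(qualities)


# Two made-up reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
example = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "####",
]
print(low_quality_fraction(example))  # 0.5
```

If a large fraction of reads falls below the threshold, trimming as described in step 1.2.1 is worth considering.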
2. Prediction and differential expression analysis of circRNAs using CIRIquant
NOTE: A more detailed manual on installing and performing differential expression analysis can be found in the code availability section of the CIRIquant paper31. The supplementary data also includes some of the basic commands used in this protocol.
<ol>
	<li>CircRNA predictions
	<ol>
		<li>Index the host&#39;s reference genome for both the BWA and HISAT2 aligners. On a Linux terminal, execute the commands bwa index32 and hisat2-build33 in the directory containing the host&#39;s reference genome.</li>
		<li>Next, prepare a YAML (.yml) configuration file containing the name of the file, the paths to the tools (BWA, HISAT2, stringtie34, samtools35), the paths to the downloaded reference files (host&#39;s reference genome FASTA file and annotation file), and the paths to the index files from step 2.1.1.</li>
		<li>Execute the CIRIquant tool from terminal using either the default or manual parameters. The user can specify the library type (either stranded or non-stranded) of the RNA-seq data when executing the CIRIquant tool. 
		NOTE: The library type of the RNA-seq data can be determined by knowing the type of library preparation kit used. If the identity of the library preparation kit is unknown, an RNA-seq control bioinformatic package called RSeQC36 can be used to determine the strandedness of RNA-seq data.</li>
	</ol>
	</li>
	<li>Differential expression analysis 
	NOTE: The CIRIquant package includes prep_CIRIquant, prepDE.py, and CIRI_DE_replicate; therefore, no additional downloads are needed for these three tools.
	<ol>
		<li>Prepare a text file (.lst) with a list of data containing the following: 
		1st column: IDs of the RNA-seq data used in step 2.1.3 
		2nd column: path to the GTF files outputted by CIRIquant 
		3rd column: grouping of the RNA-seq data, whether it is a control or treated group.</li>
		<li>For an example, refer to Table 1 below. 
		NOTE: It is not necessary to put in the headers as they are just for reference.</li>
		<li>On the Linux terminal, run prep_CIRIquant with the text file (.lst) prepared in step 2.2.1 as an input. The run will generate a list of files: library_info.csv, circRNA_info.csv, circRNA_bsj.csv, and circRNA_ratio.csv.</li>
		<li>Prepare a second text file with a list of data containing the RNA-seq IDs and the path to their respective StringTie output. The file layout must be similar to the text file in step 2.2.1 without the grouping column.</li>
		<li>Run prepDE.py with the text file prepared in step 2.2.4 as an input to generate the gene count matrix files.</li>
		<li>Execute CIRI_DE_replicate with the library_info.csv and circRNA_bsj.csv files from step 2.2.3 and the gene_count_matrix.csv file from step 2.2.5 as inputs to output the final circRNA_de.tsv file.</li>
	</ol>
	</li>
	<li>Filtering of DE circRNAs
	<ol>
		<li>Use R (in the computer terminal or RStudio) or any spreadsheet software (e.g., Microsoft Excel) to open the circRNA_de.tsv file generated from step 2.2.6 to filter and determine the number of differentially expressed (DE) circRNAs.</li>
		<li>Filter the DE circRNAs according to the criteria |LogFC| &#62; 2 and FDR &#60; 0.05.</li>
		<li>Create a file named DE_circRNAs.txt to store the information of DE circRNAs.</li>
	</ol>
	</li>
</ol>
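The two text files described in steps 2.2.1 and 2.2.4 can be generated programmatically rather than typed by hand. The Python sketch below writes both layouts from a single list of samples. The sample IDs, the file paths, and the group labels "C" (control) and "T" (treated) are assumptions made for illustration; check the CIRIquant documentation for the labels and paths expected by prep_CIRIquant and prepDE.py.

```python
import csv
import io

VALID_GROUPS = {"C", "T"}  # assumption: C = control, T = treated


def build_sample_lists(samples):
    """Return (circ_list_text, gene_list_text) for a list of sample tuples.

    Each tuple is (sample_id, ciriquant_gtf, stringtie_gtf, group).
    The first text follows the three-column layout of step 2.2.1, the
    second the two-column layout of step 2.2.4 (no grouping column).
    """
    circ_buf, gene_buf = io.StringIO(), io.StringIO()
    circ_writer = csv.writer(circ_buf, delimiter="\t", lineterminator="\n")
    gene_writer = csv.writer(gene_buf, delimiter="\t", lineterminator="\n")
    for sample_id, circ_gtf, stringtie_gtf, group in samples:
        if group not in VALID_GROUPS:
            raise ValueError(f"unknown group {group!r} for {sample_id}")
        circ_writer.writerow([sample_id, circ_gtf, group])
        gene_writer.writerow([sample_id, stringtie_gtf])
    return circ_buf.getvalue(), gene_buf.getvalue()


# Hypothetical sample IDs and paths, used only to show the layout.
samples = [
    ("ctrl_1", "out/ctrl_1/ctrl_1.gtf", "out/ctrl_1/gene/ctrl_1_out.gtf", "C"),
    ("flu_1", "out/flu_1/flu_1.gtf", "out/flu_1/gene/flu_1_out.gtf", "T"),
]
circ_list, gene_list = build_sample_lists(samples)
print(circ_list)
print(gene_list)
```

Writing the returned strings to files (for example sample.lst and sample_gene.lst, both hypothetical names) gives the inputs for steps 2.2.3 and 2.2.5.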
3. Characterization and annotation of predicted DE circRNAs
<ol>
	<li>Annotation status of DE circRNAs
	<ol>
		<li>Load the file named DE_circRNAs.txt, which consists of the list of DE circRNAs filtered in step 2.3.3, in RStudio. Include other information such as the genomic positions (Chr, Start, End), strand orientations (+ or -), gene name, and circRNA type. Before proceeding, convert the circRNA genomic start coordinates from CIRIquant to 0-based by subtracting 1 base pair. 
		NOTE: The other information stated above can be obtained from the GTF files outputted by CIRIquant (Supplementary File 1).</li>
		<li>Determine the annotation status of the predicted DE circRNAs by downloading a library containing the genomic positions of the circRNA-database (e.g., circBase) deposited circRNAs. 
		NOTE: Ensure that the genome version used to predict the circRNAs is identical to that of the circRNA database library before making the comparison. The circBase data file used here is freely available in the drive folder provided on GitHub (https://github.com/bicciatolab/Circr)37.</li>
		<li>Once both the files from step 3.1.1 and step 3.1.2 are prepared, run the R script given in Supplementary File 1. Chromosomal locations of DE circRNAs are queried to the library before assigning the status Annotated or Unannotated.</li>
	</ol>
	</li>
	<li>Characterization of DE circRNAs
	<ol>
		<li>Use R or spreadsheet software to summarize the number of circRNAs according to the circRNA types (i.e., exon, intron, intergenic, and antisense) and the number of genes that the circRNAs span (1 or &#62;1) (Supplementary File 1). NOTE: CIRIquant can only detect four types of circRNAs (exon, intron, intergenic, and antisense). Exon-intron circRNAs, also known as EIciRNAs, cannot be detected by CIRIquant.</li>
	</ol>
	</li>
</ol>
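The annotation step in section 3.1 amounts to a coordinate conversion followed by a lookup. The Python sketch below illustrates that logic. It is not the R script from Supplementary File 1. It assumes that a circRNA is considered annotated when its chromosome, 0-based start, end, and strand all match a database entry exactly, and all coordinates shown are made up.

```python
def to_zero_based(chrom, start, end, strand):
    """Convert a 1-based start (as reported by CIRIquant) to a 0-based start."""
    return (chrom, start - 1, end, strand)


def annotation_status(de_circrnas, database_keys):
    """Label each DE circRNA as 'Annotated' or 'Unannotated'.

    de_circrnas: dict mapping an ID to (chrom, start_1based, end, strand).
    database_keys: set of (chrom, start_0based, end, strand) tuples, e.g.
    parsed from a circBase file in the same genome version.
    """
    status = {}
    for circ_id, coords in de_circrnas.items():
        key = to_zero_based(*coords)
        status[circ_id] = "Annotated" if key in database_keys else "Unannotated"
    return status


# Made-up coordinates, for illustration only.
database = {("chr1", 999, 2000, "+")}
de_circs = {
    "circ_A": ("chr1", 1000, 2000, "+"),  # matches after the -1 shift
    "circ_B": ("chr2", 5000, 6000, "-"),  # not present in the database
}
print(annotation_status(de_circs, database))
# {'circ_A': 'Annotated', 'circ_B': 'Unannotated'}
```

The same 0-based tuples can be reused when preparing the BED input in step 4.1.6.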
4. Predicting the circRNA-miRNA interaction using Circr
NOTE: A more detailed manual on how to install and use Circr for the circRNA-miRNA interaction analysis can be found at: https://github.com/bicciatolab/Circr37.
<ol>
	<li>Preparation of files
	<ol>
		<li>Download the Circr.zip file from the Circr GitHub page and extract its contents, using software such as &#34;WinRAR&#34; or &#34;7-Zip&#34;, into a new directory where the analysis will be conducted.</li>
		<li>Install the prerequisite software applications (miRanda, RNAhybrid, Pybedtools, and samtools) before conducting the circRNA-miRNA analysis.</li>
		<li>Reference genomes and annotation files for several organisms of interest, an rRNA coordinates file, a validated miRNA interaction file, and circBase circRNA files are provided by the Circr authors on the GitHub page (https://github.com/bicciatolab/Circr)37. Open the support files drive folder linked there, select the folder for the organism of interest, the miRNA folder, and the circBase text file, and download them.</li>
		<li>After downloading the necessary files in step 4.1.3, create a new directory named support_files in the directory mentioned in step 4.1.1. Then, unzip and extract the content into the support_files directory.</li>
		<li>Index the reference genome file of the organism of interest using the samtools faidx command (Supplementary File 1).</li>
		<li>Prepare an input file consisting of the coordinates of DE circRNAs of interest in a tab-delimited BED file, as shown in Table 2. 
		NOTE: Because the coordinates of circRNAs predicted by CIRIquant are not 0-based, it is necessary to subtract 1 bp from the starting coordinate (as mentioned in step 3.1.1) before converting them to the BED format. The headers shown in Table 2 are just for reference and are not needed in the BED files.</li>
		<li>At this point, ensure that the expected folder tree structure for Circr analysis is as in Figure 2.</li>
	</ol>
	</li>
	<li>Running Circr.py
	<ol>
		<li>Execute Circr.py using Python 3, and as arguments, specify the circRNA input file, the FASTA genome of the organism of interest, the genome version of the selected organism, the number of threads, and the name of the output file in the command line.</li>
		<li>If the organism of interest is not provided in the drive folder listed in step 4.1.3 or if the user prefers to have a custom set of files to run the analysis, additional commands specifying the location of these files need to be included when executing Circr.py.</li>
		<li>After the Circr analysis is complete, the program outputs a circRNA-miRNA interaction file in the csv format.</li>
		<li>Filter the circRNA-miRNA interaction results according to user-specific preferences. For this study, the predictions are filtered using RStudio according to the criteria below: 
		-Detected by all three software tools 
		-Two or more binding sites reported by both TargetScan and miRanda 
		-Identified in either the &#34;AGO&#34; or &#34;validated&#34; columns 
		-Interactions without a seed region match are filtered out</li>
		<li>Write the circRNAs that pass the filtering conditions from step 4.2.4 into a new text file named circRNA_miRNA.txt. Such filtering can increase the confidence of the predicted interactions.</li>
	</ol>
	</li>
</ol>
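The four filtering criteria in step 4.2.4 combine into a single logical test per predicted interaction. The Python sketch below shows one way to express that test. The dictionary keys (miranda_sites, rnahybrid_sites, targetscan_sites, ago, validated, has_seed) are hypothetical placeholders and are not the column names of the Circr output file, so they need to be mapped to the real columns before use. The example records are invented.

```python
def passes_filters(hit):
    """Apply the four criteria of step 4.2.4 to one predicted interaction.

    `hit` is a dict with hypothetical keys: site counts per tool, booleans
    for AGO-peak and validated support, and a boolean for a seed match.
    """
    tools = ("miranda_sites", "rnahybrid_sites", "targetscan_sites")
    detected_by_all = all(hit[tool] >= 1 for tool in tools)
    multiple_sites = hit["targetscan_sites"] >= 2 and hit["miranda_sites"] >= 2
    supported = hit["ago"] or hit["validated"]
    return detected_by_all and multiple_sites and supported and hit["has_seed"]


# Invented records with placeholder miRNA names.
hits = [
    {"mirna": "miRNA_A", "miranda_sites": 3, "rnahybrid_sites": 1,
     "targetscan_sites": 2, "ago": True, "validated": False, "has_seed": True},
    {"mirna": "miRNA_B", "miranda_sites": 1, "rnahybrid_sites": 1,
     "targetscan_sites": 4, "ago": True, "validated": True, "has_seed": True},
    {"mirna": "miRNA_C", "miranda_sites": 2, "rnahybrid_sites": 0,
     "targetscan_sites": 2, "ago": False, "validated": True, "has_seed": True},
]
kept = [hit["mirna"] for hit in hits if passes_filters(hit)]
print(kept)  # ['miRNA_A']
```

Loosening or tightening the thresholds in `passes_filters` changes the stringency of the filter according to user preference.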
5. Construction of the ceRNA network
NOTE: A detailed manual on how to use Cytoscape can be found at: http://manual.cytoscape.org/en/stable/ and https://github.com/cytoscape/cytoscape-tutorials/wiki#introduction
<ol>
	<li>Download&#160;and preparation
	<ol>
		<li>Download the latest version of Cytoscape38 from:&#160;https://cytoscape.org/download.html.</li>
		<li>Execute the installer wizard downloaded in step 5.1.1 and select the file location for the Cytoscape software.</li>
		<li>Prepare a tab-delimited file containing the circRNAs of interest and their target miRNA. The first column consists of the circRNA name; the second column specifies the type of RNA from the first column; the third column is the target miRNA; and the fourth column specifies the type of RNA from the third column. An example of the file is shown in Table 3.</li>
	</ol>
	</li>
	<li>Constructing the ceRNA network map
	<ol>
		<li>Open the Cytoscape software installed in step 5.1.2.</li>
		<li>In Cytoscape, navigate to File &#62; Import &#62; Network from File. Select the file that has been prepared in step 5.1.3.</li>
		<li>In the new tab, set the first and second columns as &#34;Source Node&#34; and &#34;Source Node Attribute&#34;, and set the third and fourth columns as &#34;Target Node&#34; and &#34;Target Node Attribute&#34;, respectively. Click OK, and the network will appear on the upper right side of Cytoscape.</li>
		<li>To change the visual style of the network, press the Style button on the left side of Cytoscape.</li>
		<li>Press the arrow on the right side of Fill Color. Choose Type for the column and Discrete Mapping for the mapping type. Then, select the color desired for each of the RNA types.</li>
		<li>After changing the color, change the shape of the nodes by navigating to Shape and following step 5.2.5.</li>
	</ol>
	</li>
</ol>
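The four-column file described in step 5.1.3 can be written with a short script once the filtered interactions are available. The Python sketch below builds the tab-delimited text in memory. The circRNA name hsa_DE_58 is taken from the representative results, while the miRNA names and the type labels are placeholders. The optional header row is an assumption, so compare the layout with Table 3 before importing the file into Cytoscape.

```python
import csv
import io


def network_table(interactions, header=None):
    """Return tab-delimited text with one circRNA-miRNA pair per row.

    interactions: iterable of (circrna_name, mirna_name) pairs.
    header: optional list of four column names written as the first row.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    if header is not None:
        writer.writerow(header)
    for circrna, mirna in interactions:
        # Columns: source node, source type, target node, target type.
        writer.writerow([circrna, "circRNA", mirna, "miRNA"])
    return buffer.getvalue()


# Placeholder miRNA names; replace them with the filtered Circr results.
pairs = [("hsa_DE_58", "miRNA_A"), ("hsa_DE_58", "miRNA_B")]
print(network_table(pairs))
```

Saving the returned text to a file gives the input selected in step 5.2.2.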
6. Functional enrichment analysis
<ol>
	<li>Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis for the parental gene of the circRNAs
	<ol>
		<li>Ensure that the clusterProfiler39,40 and org.Hs.eg.db41 packages have been installed in RStudio. The org.Hs.eg.db41 package is a genome-wide annotation package for humans only. If the organism of interest is another species, refer to:&#160;https://bioconductor.org/packages/release/BiocViews.html#OrgDb</li>
		<li>Import the DE circRNA information (DE_circRNAs.txt) from step 2.3.3 into the RStudio workspace.</li>
		<li>Use the parental gene of the circRNAs provided in this file for enrichment analysis in the upcoming steps. However, if the user wishes to convert the gene symbol to other formats, such as the Entrez ID, use a function such as &#34;bitr&#34;.</li>
		<li>By using the gene ID as an input, run the GO enrichment analysis using the enrichGO function within the clusterProfiler39,40 package using default parameters.</li>
		<li>By using the gene ID as an input, run the KEGG enrichment analysis using the enrichKEGG function within the clusterProfiler39,40 package using the default parameters.</li>
	</ol>
	</li>
</ol>
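The enrichGO and enrichKEGG functions perform over-representation analysis, which is based on the hypergeometric distribution. The Python sketch below computes that one-sided p-value for a single term to show what is being tested. It is an illustration of the statistic only and does not reproduce clusterProfiler, which also applies multiple-testing correction and handles the gene annotation. All gene counts in the example are invented.

```python
from math import comb


def ora_p_value(background, annotated, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    background: number of genes in the background set
    annotated:  background genes annotated to the term
    selected:   genes in the query list (e.g., circRNA parental genes)
    overlap:    query genes annotated to the term
    """
    upper = min(annotated, selected)
    total = comb(background, selected)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Invented numbers: 20 background genes, 5 annotated to the term,
# 4 genes in the query list, 2 of which carry the annotation.
print(ora_p_value(20, 5, 4, 2))
```

A small p-value indicates that the term is found among the parental genes more often than expected by chance.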

The authors have nothing to disclose.

To illustrate the utility of this protocol, RNA-seq data from influenza A virus-infected human macrophage cells were used as an example. CircRNAs functioning as potential miRNA sponges in host-pathogen interactions and their GO and KEGG functional enrichment within a host were investigated. Although there are a variety of circRNA tools available online, each of them is a standalone package that does not interact with the others. Here, we put together a few of the tools that are required for circRNA prediction and quantification, circRNA functional enrichment, circRNA-miRNA interaction prediction, and ceRNA network construction. This streamlined protocol is time-saving and can be applied to clinical samples to detect circRNA candidates with diagnostic and prognostic value.
Essentially, we employed CIRIquant31, a circRNA quantification tool pre-packaged with CIRI2, which can detect and carry out DE analysis of circRNAs. DE circRNAs are filtered based on cut-off values of |LogFC| &#62; 2 and FDR &#60; 0.05, which helps to eliminate potential false positives in downstream analyses. Characterization of DE circRNAs in terms of annotation status, circRNA types, and the number of genes spanned aids in categorizing and further filtering circRNA candidates. Subsequently, Circr37, a circRNA-miRNA prediction tool, is used to predict potential miRNA sponging candidates. After predicting potential miRNAs as targets of circRNAs, a ceRNA network is drawn. Finally, based on the parental genes of circRNAs, the R clusterProfiler package39 is used for functional annotation via GO and KEGG pathway enrichment analysis. Results from GO and KEGG may help to unravel the biological mechanisms influenced by circRNAs.
To date, several different circRNA prediction tools have been developed, including CIRI243, CIRCexplorer244, find_circ45, MapSplice46, and UROBORUS47. In a study conducted by Hansen et al., CIRI2 was reported to have a high overall performance. It is among the few circRNA detection tools that perform well in terms of both de novo prediction and the reduction of false-positive identification48. CIRIquant, which utilizes CIRI2 for circRNA detection and quantification, was therefore used in this study. CIRIquant was used to count the back-splice junction (BSJ) reads, and the count data were normalized to the reads mapped to cognate linear RNAs transcribed from the same gene loci. This allows the quantification of circRNAs in a sample. To determine the differential expression of circRNAs across experimental conditions, CIRIquant implements a generalized linear model in edgeR49 for DE analysis, and the exact rate-ratio test is used as a statistical test to determine the significance of the difference in the circRNA junction ratio. Although other circRNA quantification tools such as CIRCexplorer3-CLEAR50 can be used to quantify the expression level of circRNAs, this tool only allows circRNA quantification within a sample, as it counts the BSJ reads in a sample and normalizes the count data against the cognate linear RNA counts from the same sample. CIRCexplorer3-CLEAR cannot compare circRNA expression across experimental conditions. Furthermore, no statistical analysis tool is implemented in CIRCexplorer3-CLEAR to support the quantified expression level. Although the default circRNA prediction tool implemented within CIRIquant is CIRI2, the prediction results from other tools such as find_circ and CIRCexplorer2 can also be utilized for the quantification and DE analysis31. In this protocol, only one circRNA prediction tool (CIRI2) was used for prediction, which might still yield false-positive circRNA candidates.
To reduce false positives, one can combine other circRNA prediction tools for analysis and select common circRNAs detected among the different circRNA prediction tools48,51. To further improve circRNA detection, it is ideal to use RNA sequencing datasets that are both rRNA-depleted and subjected to RNase R pre-treatment.
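The normalization of BSJ reads against linear reads described above can be illustrated with a small calculation. The Python sketch below assumes the junction ratio is defined as 2 x BSJ / (2 x BSJ + FSJ), where FSJ is the number of forward-splice junction reads from the cognate linear transcripts. This definition is an assumption stated for illustration and should be verified against the CIRIquant paper31 before relying on it. The read counts are invented.

```python
def junction_ratio(bsj_reads, fsj_reads):
    """Circular-to-total junction ratio under the assumed definition
    2 * BSJ / (2 * BSJ + FSJ); returns 0.0 when there are no reads."""
    denominator = 2 * bsj_reads + fsj_reads
    if denominator == 0:
        return 0.0
    return 2 * bsj_reads / denominator


# Invented counts: 10 back-splice reads and 80 forward-splice reads.
print(junction_ratio(10, 80))  # 0.2
```

A ratio close to 1 would indicate that most junction reads at the locus come from the circular isoform.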
Depending on the objective of the study, de novo and annotated DE circRNAs can be identified separately based on the circBase database52. However, circRNAs spanning more than one gene often require manual examination on UCSC or any other genome browser to determine the authenticity of circRNAs and eliminate false positives. Nonetheless, circRNAs that span more than one gene, such as circRNAs derived from fusion genes, have also been reported recently53,54.
Circr works by combining three different miRNA-mRNA predicting algorithms, namely, TargetScan55, miRanda56, and RNAhybrid57 to predict the circRNA-miRNA binding sites. On top of that, the algorithm also incorporates information of AGO peaks and previously validated interactions in the circRNA-miRNA analysis. Here, stringent filtering criteria were applied to allow a more reliable circRNA-miRNA prediction to be obtained, thus, further reducing false positives. However, the stringency of this filtering step could be set higher or lower depending on user preference.
ClusterProfiler is a well-documented R package that can functionally annotate gene sets across diverse organisms. Besides the functions within the R clusterProfiler package mentioned in this protocol (enrichGO and enrichKEGG), which utilize over-representation analysis, there are also other functions such as gseGO and gseKEGG that can be used. If clusterProfiler is not a suitable choice for the workflow, there are also other tools and packages such as the &#34;AllEnricher&#34;58 or the website-based tools such as &#34;Metascape&#34;59 that can functionally annotate a set of genes. Lastly, although the pipeline provided above helps in predicting potential circRNAs and their functional annotations, wet-lab verification will be needed to provide solid evidence.

Host-pathogen interactions represent a complex interplay between pathogens and host organisms, which triggers the hosts&#39; innate immune responses that eventually result in the removal of invading pathogens1,2. During pathogenic infections, a multitude of host immune genes are regulated to inhibit the replication and release of pathogens. For example, common interferon-stimulated genes (ISGs) regulated upon pathogenic infections include ADAR1, IFIT1, IFIT2, IFIT3, ISG20, RIG-I, and OASL3,4. Besides protein-coding genes, studies have reported that non-coding RNAs such as long non-coding RNAs (lncRNAs), microRNAs (miRNAs), and circular RNAs (circRNAs) also play a role and are regulated concurrently during pathogenic infections5,6,7. In contrast to protein-coding genes that mainly encode proteins as functional molecules, non-coding RNAs (ncRNAs) are known to function as regulators of genes at transcriptional and post-transcriptional levels. However, studies involving the participation of non-coding RNAs, particularly circRNAs, in regulating the hosts&#39; immune genes are not well reported compared to those on protein-coding genes.
CircRNAs are widely characterized by their covalently closed continuous loop structure, which is generated through a non-canonical splicing process called back-splicing8. The process of back-splicing, unlike the splicing process of cognate linear RNAs, involves the ligation of the downstream donor site to the upstream acceptor site, forming a circular-shaped structure. Currently, three different back-splicing mechanisms for the biogenesis of circRNAs have been proposed. These are RNA binding protein (RBP) mediated circularization9,10, intron-pairing-driven circularization11, and lariat-driven circularization12,13,14. Given that circRNAs are connected end-to-end in a circular structure, they tend to be naturally resistant to normal exonuclease digestions and, thus, are considered to be more stable than their linear counterparts15. Another common characteristic exhibited by circRNAs includes the cell or tissue type-specific expression in hosts16.
As implied by their unique structure and cell or tissue-specific expression, circRNAs have been discovered to play important biological functions in cells. To date, one of the prominent functions of circRNAs is their role as microRNA (miRNA) sponges17,18. This regulatory role of circRNAs occurs through the complementary binding of circRNA nucleotides with the seed region of miRNAs. Such a circRNA-miRNA interaction inhibits the miRNAs&#39; normal regulatory functions on target mRNAs, thus regulating the expression of genes19,20. Additionally, circRNAs are also known to regulate gene expression by interacting with RNA binding proteins (RBPs) and forming RNA-protein complexes21. Although circRNAs are classified as non-coding RNAs, there is also evidence that circRNAs can act as templates for protein translation22,23,24.
Recently, circRNAs have been demonstrated to play pivotal roles in regulating host-pathogen interactions, particularly between hosts and viruses. Generally, host circRNAs are assumed to assist in regulating the host&#39;s immune responses to eliminate the invading pathogens. An example of a circRNA that promotes host immune responses is circRNA_0082633, reported by Guo et al.25. This circRNA enhances type I interferon (IFN) signaling within A549 cells, which helps to suppress influenza virus replication25. Moreover, Qu et al. also reported a human intronic circRNA, called circRNA AIVR, that promotes immunity by regulating the expression of CREB-binding protein (CREBBP), a signal transducer of IFN-&#946;26,27. However, circRNAs that are known to promote the pathogenesis of disease upon infection also exist. For example, Yu et al. recently reported the role played by a circRNA spliced from the GATA zinc finger domain containing 2A (GATAD2A) gene (circGATAD2A) in promoting H1N1 virus replication through the inhibition of host cell autophagy28.
To effectively study circRNAs, a genome-wide circRNA prediction algorithm is usually implemented, followed by an in silico characterization of the predicted circRNA candidates before any functional studies can be carried out. Such a bioinformatics approach to predict and characterize circRNAs is less costly and more time efficient. It helps to refine the number of candidates to be functionally studied and could potentially lead to novel findings. Here, we provide a detailed bioinformatics-based protocol for the in silico identification, characterization and functional annotation of circRNAs during the host-pathogen interactions. The protocol includes the identification and quantification of circRNAs from RNA-sequencing datasets, annotation via circBase, and the characterization of the circRNA candidates in terms of circRNA types, number of overlapping genes, and predicted circRNA-miRNA interactions. This study also provides the functional annotation of the circRNA parental genes through Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis.

<table><tbody>
<tr><td>Bedtools</td><td>GitHub</td><td>https://github.com/arq5x/bedtools2/</td><td>Referring to section 4.1.2. Needed for Circr.</td></tr>
<tr><td>BWA</td><td>Burrows-Wheeler Aligner</td><td>http://bio-bwa.sourceforge.net/</td><td>Referring to sections 2.1.1 and 2.1.2. Needed to run CIRIquant and to index the genome.</td></tr>
<tr><td>Circr</td><td>GitHub</td><td>https://github.com/bicciatolab/Circr</td><td>Referring to section 4. Used to predict the miRNA binding sites.</td></tr>
<tr><td>CIRIquant</td><td>GitHub</td><td>https://github.com/bioinfo-biols/CIRIquant</td><td>Referring to section 2.1.3. Used to predict circRNAs.</td></tr>
<tr><td>clusterProfiler</td><td>GitHub</td><td>https://github.com/YuLab-SMU/clusterProfiler</td><td>Referring to section 6. For GO and KEGG functional enrichment.</td></tr>
<tr><td>CPU</td><td>Intel</td><td>Intel(R) Xeon(R) CPU E5-2620 V2 @ 2.10 GHz; Cores: 6-core CPU; Memory: 65 GB; Graphics card: NVIDIA GK107GL (QUADRO K2000)</td><td>Specifications used to run this entire protocol.</td></tr>
<tr><td>Cytoscape</td><td>Cytoscape</td><td>https://cytoscape.org/download.html</td><td>Referring to section 5.2. Needed to plot the ceRNA network.</td></tr>
<tr><td>FastQC</td><td>Babraham Bioinformatics</td><td>https://www.bioinformatics.babraham.ac.uk/projects/fastqc/</td><td>Referring to section 1.2.1. Quality checking of FASTQ files.</td></tr>
<tr><td>HISAT2</td><td></td><td>http://daehwankimlab.github.io/hisat2/</td><td>Referring to sections 2.1.1 and 2.1.2. Needed to run CIRIquant and to index the genome.</td></tr>
<tr><td>Linux</td><td>Ubuntu 20.04.5 LTS (Focal Fossa)</td><td>https://releases.ubuntu.com/focal/</td><td>Needed to run the entire protocol. Other Ubuntu versions may still be valid to carry out the protocol.</td></tr>
<tr><td>miRanda</td><td></td><td>http://www.microrna.org/microrna/getDownloads.do</td><td>Referring to section 4.1.2. Needed for Circr.</td></tr>
<tr><td>Pybedtools</td><td>pybedtools 0.8.2</td><td>https://pypi.org/project/pybedtools/</td><td>Needed for BED file genomic manipulation.</td></tr>
<tr><td>Python</td><td>Python 2.7 and 3.6 or above</td><td>https://www.python.org/downloads/</td><td>To run necessary library modules.</td></tr>
<tr><td>R</td><td>The Comprehensive R Archive Network</td><td>https://cran.r-project.org/</td><td>To manipulate data frames.</td></tr>
<tr><td>RNAhybrid</td><td>BiBiServ</td><td>https://bibiserv.cebitec.uni-bielefeld.de/rnahybrid</td><td>Referring to section 4.1.2. Needed for Circr.</td></tr>
<tr><td>RStudio</td><td>RStudio</td><td>https://www.rstudio.com/</td><td>A workspace to run R.</td></tr>
<tr><td>samtools</td><td>SAMtools</td><td>http://www.htslib.org/</td><td>Referring to section 2.1.2. Needed to run CIRIquant.</td></tr>
<tr><td>StringTie</td><td>Johns Hopkins University: Center for Computational Biology</td><td>http://ccb.jhu.edu/software/stringtie/index.shtml</td><td>Referring to section 2.1.2. Needed to run CIRIquant.</td></tr>
<tr><td>TargetScan</td><td>GitHub</td><td>https://github.com/nsoranzo/targetscan</td><td>Referring to section 4.1.2. Needed for Circr.</td></tr>
</tbody></table>

In Silico Identification and Characterization of circRNAs During Host-Pathogen Interactions

Circular RNAs (circRNAs) are a class of non-coding RNAs that are formed via back-splicing. These circRNAs are predominantly studied for their roles as regulators of various biological processes. Notably, emerging evidence demonstrates that host circRNAs can be differentially expressed (DE) upon infection with pathogens (e.g., influenza and coronaviruses), suggesting a role for circRNAs in regulating host innate immune responses. However, investigations on the role of circRNAs during pathogenic infections are limited by the knowledge and skills required to carry out the necessary bioinformatic analysis to identify DE circRNAs from RNA sequencing (RNA-seq) data. Bioinformatics prediction and identification of circRNAs is crucial before any verification and functional studies using costly and time-consuming wet-lab techniques. To solve this issue, a step-by-step protocol for the in silico prediction and characterization of circRNAs using RNA-seq data is provided in this manuscript. The protocol can be divided into four steps: 1) prediction and quantification of DE circRNAs via the CIRIquant pipeline; 2) annotation via circBase and characterization of DE circRNAs; 3) circRNA-miRNA interaction prediction through the Circr pipeline; 4) functional enrichment analysis of circRNA parental genes using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). This pipeline will be useful in driving future in vitro and in vivo research to further unravel the role of circRNAs in host-pathogen interactions.

The protocol listed in the previous section was modified and configured to suit the Linux operating system. The main reason is that most module libraries and packages involved in the analysis of circRNAs can only work on the Linux platform. In this analysis, de-identified ribosomal RNA (rRNA)-depleted RNA-seq library datasets prepared from influenza A virus-infected human macrophage cells were downloaded from the GEO database42 and used to generate the representative results.
CircRNA prediction and quantification 
In this analysis, ribosomal RNA (rRNA)-depleted RNA-seq library datasets prepared from influenza A virus-infected human macrophage cells were used to carry out circRNA detection and functional analysis. As specified in the protocol section, CIRIquant was used to identify circRNAs and carry out DE analysis of the identified circRNAs using the RNA-seq library datasets as input. The reference files used are based on the latest human genome version (hg38). Table 4 shows an example of the final output from the CIRIquant analysis. The identification and filtering of DE circRNAs from the CIRIquant output were executed through simple RStudio scripts (Supplementary File 1). CircRNAs are only classified as DE when the false-discovery rate (FDR) value is &#60;0.05 and the absolute log fold change (|LogFC|) is &#62;2. Table 5 shows the total number of circRNAs and DE circRNAs detected. A total of 35,846 circRNAs were detected, with 306 being DE. The DE circRNAs detected in this output are entirely upregulated (LogFC &#62; 2), with none being downregulated (LogFC &#60; -2).
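The DE criteria stated above (FDR below 0.05 and an absolute LogFC above 2) can be sketched in Python as follows. This is not the RStudio script from Supplementary File 1. The column names logFC and FDR and the tab delimiter are assumptions about the layout of circRNA_de.tsv and should be checked against the actual file, and the example rows are invented.

```python
import csv
import io


def filter_de(rows, logfc_cutoff=2.0, fdr_cutoff=0.05,
              logfc_col="logFC", fdr_col="FDR"):
    """Keep rows with |logFC| > logfc_cutoff and FDR < fdr_cutoff."""
    kept = []
    for row in rows:
        logfc = float(row[logfc_col])
        fdr = float(row[fdr_col])
        # abs() keeps both strongly up- and strongly downregulated circRNAs.
        if abs(logfc) > logfc_cutoff and fdr < fdr_cutoff:
            kept.append(row)
    return kept


# Invented rows; column names are assumptions about the real file.
table = (
    "circRNA_ID\tlogFC\tFDR\n"
    "circ_1\t3.1\t0.001\n"
    "circ_2\t-2.5\t0.01\n"
    "circ_3\t1.2\t0.0001\n"
    "circ_4\t4.0\t0.2\n"
)
rows = list(csv.DictReader(io.StringIO(table), delimiter="\t"))
de_ids = [row["circRNA_ID"] for row in filter_de(rows)]
print(de_ids)  # ['circ_1', 'circ_2']
```

Rows with a positive LogFC in the result are upregulated and rows with a negative LogFC are downregulated.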
Annotation and characterization of DE circRNAs 
Annotation status of DE circRNAs 
The DE circRNAs identified were cross-checked against an established circRNA database, circBase. However, since the circRNA coordinates deposited in circBase are based on a previous human genome version (hg19), the circRNA coordinates from circBase must be converted to the current human genome version (hg38) for cross-checking in this study. Additionally, the starting coordinate must be converted to 0-based from the 1-based output of CIRIquant. The hg38-converted circRNA coordinates of circBase are provided in a drive folder on GitHub (https://github.com/bicciatolab/Circr)37. Then, the RStudio scripts (Supplementary File 1) were used to assign the annotation status of circRNAs in a new data frame column. Table 6 shows an example of circRNAs with the annotation status.
Characterization of DE circRNAs 
This part was entirely executed through R scripts in the RStudio software. R scripts ease the analytical processes, and only basic knowledge is required.
CircRNA types 
In this step, DE circRNAs were characterized by their circRNA types (antisense, exonic, intergenic, and intronic) based on their genomic positions. Table 7 below displays the percentage breakdown of the different circRNA types among the identified DE circRNAs. Of the total 306 DE circRNAs, 263 circRNAs (85.95%) were identified as exonic circRNAs, making this the most abundant circRNA type. Intronic circRNAs are the second most abundant type, comprising 17 DE circRNAs (5.56% of the total DE circRNAs). These are followed by intergenic circRNAs (16 DE circRNAs, ~5.23%) and antisense circRNAs (10 DE circRNAs, ~3.27%).
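The percentage breakdown is a simple tally over the type labels. The sketch below is an illustrative Python equivalent, not the authors' R script: it rebuilds a list of labels from the counts reported in Table 7 and recomputes the percentages. The function name `type_breakdown` is an assumption.

```python
from collections import Counter


def type_breakdown(types):
    """Return {type: (count, percentage rounded to 2 decimals)} for a list of labels."""
    counts = Counter(types)
    total = sum(counts.values())
    return {t: (n, round(100 * n / total, 2)) for t, n in counts.items()}


# Labels rebuilt from the counts in Table 7 (306 DE circRNAs in total).
types = (
    ["antisense"] * 10 + ["exon"] * 263 + ["intergenic"] * 16 + ["intron"] * 17
)

breakdown = type_breakdown(types)
for circ_type, (n, pct) in sorted(breakdown.items()):
    print(f"{circ_type}\t{n}\t{pct:.2f}%")
```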
Number of genes spanned per circRNA 
CircRNAs identified by CIRIquant can overlap more than one gene. To date, most studies have focused on circRNAs that span one gene. Hence, in this protocol, the circRNA candidates spanning more than one gene are excluded from the downstream analysis. Table 8 below describes the number and percentage of DE circRNAs spanning one gene and more than one gene. In this table, intergenic circRNAs (16 DE circRNAs) are excluded since they do not overlap any host genes, while the remaining circRNA types (290 DE circRNAs) are subjected to this analysis. Of these 290 DE circRNAs, the majority (261 circRNAs, ~90%) span only one gene, while the remaining 29 circRNAs (~10%) span more than one gene.
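The exclusion step can be sketched as a split on the number of overlapping genes. This is an illustrative Python stand-in for the R scripts; the circRNA and gene names are placeholders, and the input layout (a mapping from circRNA name to a list of overlapping genes) is an assumption.

```python
def split_by_gene_count(circ_to_genes):
    """Separate circRNAs spanning exactly one gene from those spanning several.

    circRNAs with no overlapping gene (intergenic) are left out of both groups.
    """
    single = {c: g[0] for c, g in circ_to_genes.items() if len(g) == 1}
    multi = {c: g for c, g in circ_to_genes.items() if len(g) > 1}
    return single, multi


# Placeholder names; none of these refer to real circRNAs or genes.
example = {
    "circ_A": ["GENE_1"],
    "circ_B": ["GENE_2", "GENE_3"],
    "circ_C": [],  # intergenic: overlaps no host gene
    "circ_D": ["GENE_4"],
}

single, multi = split_by_gene_count(example)
print("kept for downstream analysis:", sorted(single))
print("excluded (more than one gene):", sorted(multi))

# Percentages for the counts reported in Table 8 (261 and 29 out of 290).
print(round(100 * 261 / 290, 1), round(100 * 29 / 290, 1))
```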
Construction of the ceRNA network 
A ceRNA network is usually drawn to visualize the circRNA-miRNA interactions after they have been predicted. In Figure 3 below, only one DE circRNA, hsa_DE_58, was chosen as a representative result. Based on Circr predictions, hsa_DE_58 can sponge up to nine different miRNAs. These nine miRNAs were identified after filtering with stringent criteria.
Functional enrichment analysis 
GO and KEGG analysis of the circRNA parental genes 
Figure 4 below depicts a bubble plot of the functional enrichment of DE circRNA parental genes through the GO analysis. Fundamentally, the GO analysis aims to unravel the biological processes, cellular locations, and molecular functions that are enriched or impacted in the condition studied, in this case, the virus-infected sample. The enrichment is considered statistically significant and plotted on the bubble plot only if the p-value is &#60; 0.01. As shown in Figure 4, the top three enrichments for the biological processes (BP) include the ribonucleoprotein complex biogenesis, the response to virus, and the regulation of response to a biotic stimulus, while for the molecular functions (MF) only the catalytic activity acting on RNA and single-stranded RNA binding are statistically enriched. On the other hand, only the retromer complex is statistically enriched for the cellular components (CC).
Figure 5 shows the KEGG enrichment analysis of the DE circRNA parental genes in a bubble plot. Similar to the GO enrichment analysis, KEGG enrichment is only considered statistically significant and plotted on a bubble plot if the p-value is &#60; 0.01. Only two KEGG terms were enriched in this case, which are the Influenza A and viral life cycle (HIV-1) pathways.
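The quantities plotted in the bubble plots can be made concrete with a small worked example. GeneRatio is the count of input genes carrying a term divided by the number of input genes. A one-sided hypergeometric test is one common way to compare the observed frequency of a term with the frequency expected by chance; whether the enrichment tool used in this protocol computes its p-values in exactly this way is an assumption here. All numbers below are hypothetical, and the function names are made up for illustration.

```python
from math import comb


def gene_ratio(k, n):
    """Fraction of the n input genes that are associated with the term (k of them)."""
    return k / n


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: background genes, K: background genes carrying the term,
    n: input genes, k: input genes carrying the term.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical example: 8 of 200 input genes carry a term that
# 100 of 20,000 background genes carry.
k, n, K, N = 8, 200, 100, 20000
p = hypergeom_pvalue(k, n, K, N)
print(f"GeneRatio = {gene_ratio(k, n):.3f}")
print(f"p-value   = {p:.3e}")
print("plotted" if p < 0.01 else "not plotted")
```

With these made-up numbers, about one input gene would be expected to carry the term by chance (200 × 100 / 20,000), so observing eight yields a small p-value.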
<img alt="Figure 1" class="xfigimg" src="/files/ftp_upload/64565/64565fig01.jpg" /> 
Figure 1: The pipeline for the prediction and functional characterization of circRNAs. The pipeline shows a&#160;simple overview of the key steps from start to end involving the installation of the necessary software packages, predicting and quantifying the circRNA expression, construction of the ceRNA network, and performing the circRNA parental gene functional enrichment. <a href="https://www.jove.com/files/ftp_upload/64565/64565fig01large.jpg" target="_blank">Please click here to view a larger version of this figure.</a>
<img alt="Figure 2" class="xfigimg" src="/files/ftp_upload/64565/64565fig02.jpg" /> 
Figure 2: Folder tree structure for Circr. This folder tree structure must be established before running the Circr software so that it can detect the files required for the analysis. <a href="https://www.jove.com/files/ftp_upload/64565/64565fig02large.jpg" target="_blank">Please click here to view a larger version of this figure.</a>
<img alt="Figure 3" class="xfigimg" src="/files/ftp_upload/64565/64565fig03.jpg" /> 
Figure 3: ceRNA network consisting of the circRNA-miRNA interaction. The blue oval shape represents the circRNA, while the orange triangles represent the miRNAs. The solid lines connecting the circRNA to miRNAs describe the potential miRNA sponging function of the hsa_DE_58 circRNA. <a href="https://www.jove.com/files/ftp_upload/64565/64565fig03large.jpg" target="_blank">Please click here to view a larger version of this figure.</a>
<img alt="Figure 4" class="xfigimg" src="/files/ftp_upload/64565/64565fig04.jpg" /> 
Figure 4: Bubble plot of GO enrichment analysis of DE circRNA parental genes. GeneRatio on the x-axis is the number of genes in the input list associated with the given GO term divided by the total number of input genes. The dot size in the plot represents the count value, which is the number of genes in the input list associated with the given GO term. The bigger the dot, the larger the number of input genes associated with the term. In addition, the dots in the plot are color-coded based on the p-value. The p-value is calculated by comparing the observed frequency of an annotation term with the frequency expected by chance. The individual terms are considered enriched beyond a cut-off value (p-value &#60; 0.01). The color gradient of the p-value ranging from blue to red indicates increasing enrichment of the terms. <a href="https://www.jove.com/files/ftp_upload/64565/64565fig04large.jpg" target="_blank">Please click here to view a larger version of this figure.</a>
<img alt="Figure 5" class="xfigimg" src="/files/ftp_upload/64565/64565fig05.jpg" /> 
Figure 5: KEGG enrichment analysis of DE circRNA parental genes. GeneRatio on the x-axis is the number of genes in the input list associated with the given KEGG term divided by the total number of input genes. The dot size in the plot represents the count value, which is the number of genes in the input list associated with the given KEGG term. The bigger the dot, the larger the number of input genes associated with the term. In addition, the dots in the plot are color-coded based on the p-value. The p-value is calculated by comparing the observed frequency of an annotation term with the frequency expected by chance. Individual terms are considered enriched beyond a cut-off value (p-value &#60; 0.01). The color gradient of the p-value ranging from blue to red indicates increasing enrichment of terms. <a href="https://www.jove.com/files/ftp_upload/64565/64565fig05large.jpg" target="_blank">Please click here to view a larger version of this figure.</a>
<table border="1" fo:keep-together.within-page="1" fo:keep-with-next.within-page="always">
	<tbody>
		<tr>
			<td>Sample name</td>
			<td>Path to CIRIquant output GTF file</td>
			<td>Grouping</td>
		</tr>
		<tr>
			<td>Control 1</td>
			<td>/path/to/CIRIquant/ctrl1.gtf</td>
			<td>C</td>
		</tr>
		<tr>
			<td>Control 2</td>
			<td>/path/to/CIRIquant/ctrl2.gtf</td>
			<td>C</td>
		</tr>
		<tr>
			<td>Infected 1</td>
			<td>/path/to/CIRIquant/infect1.gtf</td>
			<td>T</td>
		</tr>
		<tr>
			<td>Infected 2</td>
			<td>/path/to/CIRIquant/infect2.gtf</td>
			<td>T</td>
		</tr>
	</tbody>
</table>
Table 1: The .lst file preparation of CIRIquant. The destination paths of the control and treated samples from the CIRIquant output are written in a text file to compare the expressions of circRNA between the two types of samples.
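The sample list in Table 1 can be generated programmatically rather than typed by hand. The sketch below builds the text of such a file in Python; it assumes that the file has no header row, that fields are separated by whitespace (a tab is used here), and that sample names contain no spaces, so the names from Table 1 are written as `Control1`, `Infected1`, and so on. These layout details and the function name `build_sample_list` are assumptions and should be checked against the CIRIquant documentation.

```python
def build_sample_list(samples):
    """Return the text of a sample list: one 'name<TAB>gtf_path<TAB>group' line per sample.

    group must be 'C' (control) or 'T' (treated), following Table 1.
    """
    lines = []
    for name, gtf_path, group in samples:
        if group not in ("C", "T"):
            raise ValueError(f"unknown group {group!r} for sample {name!r}")
        if any(ch.isspace() for ch in name):
            raise ValueError(f"sample name {name!r} must not contain whitespace")
        lines.append(f"{name}\t{gtf_path}\t{group}")
    return "\n".join(lines) + "\n"


# Paths are the placeholders shown in Table 1.
samples = [
    ("Control1", "/path/to/CIRIquant/ctrl1.gtf", "C"),
    ("Control2", "/path/to/CIRIquant/ctrl2.gtf", "C"),
    ("Infected1", "/path/to/CIRIquant/infect1.gtf", "T"),
    ("Infected2", "/path/to/CIRIquant/infect2.gtf", "T"),
]

sample_list_text = build_sample_list(samples)
print(sample_list_text, end="")
```

The returned string can be written to a file such as `sample.lst` (a hypothetical name) with a normal `open(..., "w")` call.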
<table border="1" fo:keep-together.within-page="1" fo:keep-with-next.within-page="always">
	<tbody>
		<tr>
			<td>Chr</td>
			<td>Start</td>
			<td>End</td>
			<td>Name</td>
			<td>.</td>
			<td>Strand</td>
		</tr>
		<tr>
			<td>chr2</td>
			<td>137428930</td>
			<td>137433876</td>
			<td>hsa_circ_000076</td>
			<td>.</td>
			<td>-</td>
		</tr>
		<tr>
			<td>chr2</td>
			<td>154705868</td>
			<td>154706632</td>
			<td>hsa_circ_000105</td>
			<td>.</td>
			<td>-</td>
		</tr>
		<tr>
			<td>chr2</td>
			<td>159104273</td>
			<td>159106793</td>
			<td>hsa_circ_000118</td>
			<td>.</td>
			<td>-</td>
		</tr>
		<tr>
			<td>chr2</td>
			<td>159215701</td>
			<td>159226125</td>
			<td>hsa_circ_000119</td>
			<td>.</td>
			<td>-</td>
		</tr>
		<tr>
			<td>chr4</td>
			<td>39980067</td>
			<td>39980129</td>
			<td>hsa_circ_002584</td>
			<td>.</td>
			<td>-</td>
		</tr>
	</tbody>
</table>
Table 2: Example BED file for Circr. Six columns (Chr, Start, End, Name, Gene, and Strand) associated with the circRNAs are required to generate the BED file. In this example, the fifth column is filled with a "." placeholder.
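A six-column BED file like the one in Table 2 can be produced from a list of records. The following Python sketch formats the rows of Table 2 as tab-separated lines; the tab delimiter, the absence of a header line, and the function name `bed_line` are assumptions for illustration, and the fifth field is written as "." exactly as in the table.

```python
def bed_line(chrom, start, end, name, strand, fifth="."):
    """Format one six-column, tab-separated BED line."""
    if start >= end:
        raise ValueError(f"start ({start}) must be smaller than end ({end})")
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return "\t".join([chrom, str(start), str(end), name, fifth, strand])


# Rows copied from Table 2.
records = [
    ("chr2", 137428930, 137433876, "hsa_circ_000076", "-"),
    ("chr2", 154705868, 154706632, "hsa_circ_000105", "-"),
    ("chr2", 159104273, 159106793, "hsa_circ_000118", "-"),
    ("chr2", 159215701, 159226125, "hsa_circ_000119", "-"),
    ("chr4", 39980067, 39980129, "hsa_circ_002584", "-"),
]

bed_text = "\n".join(bed_line(*rec) for rec in records) + "\n"
print(bed_text, end="")
```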
<table border="1" fo:keep-together.within-page="1" fo:keep-with-next.within-page="always">
	<tbody>
		<tr>
			<td>circRNA_name</td>
			<td>Type</td>
			<td>miRNA_name</td>
			<td>Type</td>
		</tr>
		<tr>
			<td>DE_circRNA_1</td>
			<td>circRNA</td>
			<td>miR-001</td>
			<td>miRNA</td>
		</tr>
		<tr>
			<td>DE_circRNA_1</td>
			<td>circRNA</td>
			<td>miR-002</td>
			<td>miRNA</td>
		</tr>
		<tr>
			<td>DE_circRNA_2</td>
			<td>circRNA</td>
			<td>miR-003</td>
			<td>miRNA</td>
		</tr>
		<tr>
			<td>DE_circRNA_2</td>
			<td>circRNA</td>
			<td>miR-004</td>
			<td>miRNA</td>
		</tr>
	</tbody>
</table>
Table 3: Cytoscape input file. Four columns (circRNA_name, Type, miRNA_name, and Type) are required to be written into a text file.
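The four-column layout of Table 3 can be generated from the predicted circRNA-miRNA pairs. The Python sketch below turns a mapping of circRNAs to their predicted miRNAs into a tab-delimited table with the same header as Table 3. The delimiter, the input data structure, and the function name `cytoscape_table` are assumptions; the names in the example are the placeholders used in Table 3.

```python
def cytoscape_table(interactions):
    """Build a tab-delimited edge table: one line per circRNA-miRNA pair.

    interactions maps each circRNA name to a list of predicted miRNA names.
    """
    header = ["circRNA_name", "Type", "miRNA_name", "Type"]
    lines = ["\t".join(header)]
    for circ, mirnas in interactions.items():
        for mirna in mirnas:
            lines.append("\t".join([circ, "circRNA", mirna, "miRNA"]))
    return "\n".join(lines) + "\n"


# Placeholder names from Table 3.
interactions = {
    "DE_circRNA_1": ["miR-001", "miR-002"],
    "DE_circRNA_2": ["miR-003", "miR-004"],
}

table_text = cytoscape_table(interactions)
print(table_text, end="")
```

Each line corresponds to one edge in the network, so a circRNA predicted to sponge nine miRNAs contributes nine lines.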
<table border="1" fo:keep-together.within-page="1" fo:keep-with-next.within-page="always">
	<tbody>
		<tr>
			<td>CircRNA</td>
			<td>logFC</td>
			<td>logCPM</td>
			<td>LR</td>
			<td>Pvalue</td>
			<td>DE</td>
			<td>FDR</td>
		</tr>
		<tr>
			<td>chr4:17595410|17598558</td>
			<td>8.167934481</td>
			<td>-0.039318634</td>
			<td>185.5341965</td>
			<td>3.00E-42</td>
			<td>1</td>
			<td>1.08E-37</td>
		</tr>
		<tr>
			<td>chr16:18834892|18850467</td>
			<td>-3.955083482</td>
			<td>-4.397235736</td>
			<td>2.982607619</td>
			<td>0.08416358</td>
			<td>0</td>
			<td>0.282478158</td>
		</tr>
		<tr>
			<td>chr14:73198031|73211942</td>
			<td>2.493964729</td>
			<td>-4.448176684</td>
			<td>2.736442046</td>
			<td>0.09808293</td>
			<td>0</td>
			<td>0.282478158</td>
		</tr>
	</tbody>
</table>
Table 4: Part of the final output (.csv) file of CIRIquant. CIRIquant delivers information such as the LogFC, log counts per million (LogCPM), likelihood ratio (LR), p-value, differential expression status (DE), and FDR.
<table border="1" fo:keep-together.within-page="1" fo:keep-with-next.within-page="always">
	<tbody>
		<tr>
			<td colspan="4">CIRIquant results</td>
		</tr>
		<tr>
			<td>Total</td>
			<td>DE</td>
			<td>Up</td>
			<td>Down</td>
		</tr>
		<tr>
			<td>35846</td>
			<td>306</td>
			<td>306</td>
			<td>0</td>
		</tr>
	</tbody>
</table>
Table 5: A summary of the number of total and differentially expressed (DE) circRNAs identified. A total of 35,846 circRNAs are detected, with 306 being DE circRNAs. All the 306 DE circRNAs are upregulated (with none being downregulated) in treated samples when compared to control samples.
<table border="1" fo:keep-together.within-page="1" fo:keep-with-next.within-page="always">
	<tbody>
		<tr>
			<td>Custom_Name</td>
			<td>Annotation_Status</td>
		</tr>
		<tr>
			<td>hsa_DE_22</td>
			<td>Non-Annotated</td>
		</tr>
		<tr>
			<td>hsa_DE_2</td>
			<td>Annotated</td>
		</tr>
		<tr>
			<td>hsa_DE_58</td>
			<td>Non-Annotated</td>
		</tr>
		<tr>
			<td>hsa_DE_3</td>
			<td>Annotated</td>
		</tr>
	</tbody>
</table>
Table 6: Custom circRNA names with annotation status. CircRNAs are queried against a database of known deposited circRNAs (circBase). If the circRNA is present in the database, it is labeled as annotated; otherwise, it is labeled as non-annotated.
<table border="1" fo:keep-together.within-page="1" fo:keep-with-next.within-page="always">
	<tbody>
		<tr>
			<td>CircRNA Type</td>
			<td>Freq</td>
			<td>Percentage</td>
		</tr>
		<tr>
			<td>antisense</td>
			<td>10</td>
			<td>3.27%</td>
		</tr>
		<tr>
			<td>exon</td>
			<td>263</td>
			<td>85.95%</td>
		</tr>
		<tr>
			<td>intergenic</td>
			<td>16</td>
			<td>5.23%</td>
		</tr>
		<tr>
			<td>intron</td>
			<td>17</td>
			<td>5.56%</td>
		</tr>
	</tbody>
</table>
Table 7: Types of circRNAs identified. CircRNAs can be further categorized into different types of circRNAs based on their sequence region, namely, exonic, intronic, antisense, and intergenic.
<table border="1" fo:keep-together.within-page="1" fo:keep-with-next.within-page="always">
	<tbody>
		<tr>
			<td>Number of Parental Genes</td>
			<td>Freq</td>
			<td>Percentage</td>
		</tr>
		<tr>
			<td>1</td>
			<td>261</td>
			<td>90%</td>
		</tr>
		<tr>
			<td>&#62; 1</td>
			<td>29</td>
			<td>10%</td>
		</tr>
	</tbody>
</table>
Table 8: Percentage of circRNAs with the different number of genes spanned. CircRNAs are commonly encoded from exons of one gene, but circRNAs spanning more than one gene can also be detected by CIRIquant.
Supplementary File 1: Scripts used in the protocol. <a href="https://www.jove.com/files/ftp_upload/64565/Supplementary File 1.docx" target="_blank">Please click here to download this File.</a>

Watch this Scientific Journal Video about In Silico Identification and Characterization of circRNAs During Host-Pathogen Interactions at JoVE.com

In Silico Identification and Characterization of circRNAs During Host-Pathogen Interactions

The protocol submitted here explains the complete in silico pipeline needed to predict and functionally characterize circRNAs from RNA-sequencing transcriptome data studying host-pathogen interactions.


Nanyang Technological University

Research

JoVE Journal

Biology

Universiti Malaya

Video: In Silico Identification and Characterization of circRNAs During Host-Pathogen Interactions

Genome-wide RNAi Screening to Identify Host Factors That Modulate Oncolytic Virus Therapy

High-throughput genome-wide RNAi (RNA interference) screening technology has been widely used for discovering host factors that impact&#160;virus replication. Here we present the application of this technology to uncovering host targets that specifically modulate the replication of Maraba virus, an oncolytic rhabdovirus, and vaccinia virus with the goal of enhancing therapy. While the protocol has been tested for use with oncolytic Maraba virus and oncolytic vaccinia virus, this approach is applicable to other oncolytic viruses and can also be utilized for identifying host targets that modulate virus replication in mammalian cells in general. This protocol describes the development and validation of an assay for high-throughput RNAi screening in mammalian cells, the key considerations and preparation steps important for conducting a primary high-throughput RNAi screen, and a step-by-step guide for conducting a primary high-throughput RNAi screen; in addition, it broadly outlines the methods for conducting secondary screen validation and tertiary validation studies. The benefit of high-throughput RNAi screening is that it allows one to catalogue, in an extensive and unbiased fashion, host factors that modulate any aspect of virus replication for which one can develop an in vitro assay such as infectivity, burst size, and cytotoxicity. It has the power to uncover biotherapeutic targets unforeseen based on current knowledge.

Here we describe a protocol for employing high-throughput RNAi screening to uncover host targets that can be manipulated to enhance oncolytic virus therapy, specifically rhabodvirus and vaccinia virus therapy, but it can be readily adapted to other oncolytic virus platforms or for discovering host genes that modulate virus replication generally.

High-throughput genome-wide RNAi (RNA interference) screening technology has been widely used for discovering host factors that impact&#160;virus ...

Cancer Research

A Complete Pipeline for Isolating and Sequencing MicroRNAs, and Analyzing Them Using Open Source Tools

Half of all human transcripts are thought to be regulated by microRNAs. Therefore, quantifying microRNA expression can reveal underlying mechanisms in disease states and provide therapeutic targets and biomarkers. Here, we detail how to accurately quantify microRNAs. Briefly, this method describes isolating microRNAs, ligating them to adaptors suitable for high-throughput sequencing, amplifying the final products, and preparing a sample library. Then, we explain how to align the obtained sequencing reads to microRNA hairpins, and quantify, normalize, and calculate their differential expression. Versatile and robust, this combined experimental workflow and bioinformatic analysis enables users to begin with tissue extraction and finish with microRNA quantification.

Here, we describe a step-by-step strategy for isolating small RNAs, enriching for microRNAs, and preparing samples for high-throughput sequencing. We then describe how to process sequence reads and align them to microRNAs, using open source tools.

Half of all human transcripts are thought to be regulated by microRNAs. Therefore, quantifying microRNA expression can reveal underlying mechanisms in ...

Genetics

Identification of Circular RNAs using RNA Sequencing

Circular RNAs (circRNAs) are a class of non-coding RNAs involved in functions including micro-RNA (miRNA) regulation, mediation of protein-protein interactions, and regulation of parental gene transcription. In classical next generation RNA sequencing (RNA-seq), circRNAs are typically overlooked as a result of poly-A selection during construction of mRNA libraries, or are found at very low abundance, and are therefore difficult to isolate and detect. Here, a circRNA library construction protocol was optimized by comparing library preparation kits, pre-treatment options and various total RNA input amounts. Two commercially available whole transcriptome library preparation kits, with and without RNase R pre-treatment, and using variable amounts of total RNA input (1 to 4 &#181;g), were tested. Lastly, multiple tissue types; including liver, lung, lymph node, and pancreas; as well as multiple brain regions; including the cerebellum, inferior parietal lobe, middle temporal gyrus, occipital cortex, and superior frontal gyrus; were compared to evaluate circRNA abundance across tissue types. Analysis of the generated RNA-seq data using six different circRNA detection tools (find_circ, CIRI, Mapsplice, KNIFE, DCC, and CIRCexplorer) revealed that a stranded total RNA library preparation kit with RNase R pre-treatment and 4 &#181;g RNA input is the optimal method for identifying the highest relative number of circRNAs. Consistent with previous findings, the highest enrichment of circRNAs was observed in brain tissues compared to other tissue types.

Circular RNAs (circRNAs) are non-coding RNAs that may have roles in transcriptional regulation and mediating interactions between proteins. Following assessment of different parameters for construction of circRNA sequencing libraries, a protocol was compiled utilizing stranded total RNA library preparation with RNase R pre-treatment and is presented here.

Circular RNAs (circRNAs) are a class of non-coding RNAs involved in functions including micro-RNA (miRNA) regulation, mediation of protein-protein ...

A Comparative Approach to Characterize the Landscape of Host-Pathogen Protein-Protein Interactions

Significant efforts were gathered to generate large-scale comprehensive protein-protein interaction network maps. This is instrumental to understand the pathogen-host relationships and was essentially performed by genetic screenings in yeast two-hybrid systems. The recent improvement of protein-protein interaction detection by a Gaussia luciferase-based fragment complementation assay now offers the opportunity to develop integrative comparative interactomic approaches necessary to rigorously compare interaction profiles of proteins from different pathogen strain variants against a common set of cellular factors.
This paper specifically focuses on the utility of combining two orthogonal methods to generate protein-protein interaction datasets: yeast two-hybrid (Y2H) and a new assay, high-throughput Gaussia princeps protein complementation assay (HT-GPCA) performed in mammalian cells.
A large-scale identification of cellular partners of a pathogen protein is performed by mating-based yeast two-hybrid screenings of cDNA libraries using multiple pathogen strain variants. A subset of interacting partners selected on a high-confidence statistical scoring is further validated in mammalian cells for pair-wise interactions with the whole set of pathogen variants proteins using HT-GPCA. This combination of two complementary methods improves the robustness of the interaction dataset, and allows the performance of a stringent comparative interaction analysis. Such comparative interactomics constitute a reliable and powerful strategy to decipher any pathogen-host interplays.

This article focuses on the identification of high-confident interaction datasets between host and pathogen proteins using a combination of two orthogonal methods: yeast two-hybrid followed by a high-throughput interaction assay in mammalian cells called HT-GPCA.

Significant efforts were gathered to generate  large-scale comprehensive protein-protein interaction network maps. This is  instrumental to understand the ...

Immunology and Infection

Screening and Identification of RNA Silencing Suppressors from Secreted Effectors of Plant Pathogens

RNA silencing is an evolutionarily conserved, sequence-specific gene regulation mechanism in eukaryotes. Several plant pathogens have evolved proteins with the ability to inhibit the host plant RNA silencing pathway. Unlike virus effector proteins, only several secreted effector proteins have showed the ability to suppress RNA silencing in bacterial, oomycete, and fungal pathogens, and the molecular functions of most effectors remain largely unknown. Here, we describe in detail a slightly modified version of the co-infiltration assay that could serve as a general method for observing RNA silencing and for characterizing effector proteins secreted by plant pathogens. The key steps of the approach are choosing the healthy and fully developed leaves, adjusting the bacteria culture to the appropriate optical density (OD) at 600 nm, and observing green fluorescent protein (GFP) fluorescence at the optimum time on the infiltrated leaves in order to avoid omitting effectors with weak suppression activity. This improved protocol will contribute to rapid, accurate, and extensive screening of RNA silencing suppressors and serve as an excellent starting point for investigating the molecular functions of these proteins.

Here, we present a modified screening method that can be extensively used to quickly screen RNA silencing suppressors in plant pathogens.

RNA silencing is an evolutionarily conserved, sequence-specific gene regulation mechanism in eukaryotes. Several plant pathogens have evolved proteins ...

Identification of RNAs Engaged in Direct RNA-RNA Interaction with a Long Non-Coding RNA

The growing role attributed nowadays to long non-coding RNAs (lncRNA) in physiology and pathophysiology makes it crucial to characterize their interactome by identifying their molecular partners, DNA, proteins and/or RNAs. The latter can interact with lncRNA through networks involving proteins, but they can also be engaged in direct RNA/RNA interactions. We, therefore, developed an easy-to-use RNA pull-down procedure that allowed identification of RNAs engaged in direct RNA/RNA interaction with a lncRNA using psoralen, a molecule that cross-links only RNA/RNA interactions. Bioinformatics modeling of the lncRNA secondary structure allowed the selection of several specific antisense DNA oligonucleotide probes with a strong affinity for regions displaying a low probability of internal base pairing. Since the specific probes that were designed targeted accessible regions throughout the length of the lncRNA, the RNA-interaction zones could be delineated in the sequence of the lncRNA. When coupled with a high throughput RNA sequencing, this protocol can be used for the whole direct RNA interactome studies of a lncRNA of interest.

An easy-to-use RNA pull-down protocol is designed for the identification of RNAs engaged in direct RNA/RNA interaction with a long non-coding RNA. The protocol uses psoralen as a fixative to cross-link only RNA/RNA interactions and provides the whole direct RNA interactome of a long non-coding RNA when coupled with RNA sequencing.

The growing role attributed nowadays to long non-coding RNAs (lncRNA) in physiology and pathophysiology makes it crucial to characterize their interactome ...

MS2-Affinity Purification Coupled with RNA Sequencing in Gram-Positive Bacteria

Although small regulatory RNAs (sRNAs) are widespread among the bacterial domain of life, the functions of many of them remain poorly characterized notably due to the difficulty of identifying their mRNA targets. Here, we described a modified protocol of the MS2-Affinity Purification coupled with RNA Sequencing (MAPS) technology, aiming to reveal all RNA partners of a specific sRNA in vivo. Broadly, the MS2 aptamer is fused to the 5&#8217; extremity of the sRNA of interest. This construct is then expressed in vivo, allowing the MS2-sRNA to interact with its cellular partners. After bacterial harvesting, cells are mechanically lysed. The crude extract is loaded into an amylose-based chromatography column previously coated with the MS2 protein fused to the maltose binding protein. This enables the specific capture of MS2-sRNA and interacting RNAs. After elution, co-purified RNAs are identified by high-throughput RNA sequencing and subsequent bioinformatic analysis. The following protocol has been implemented in the Gram-positive human pathogen Staphylococcus aureus and is, in principle, transposable to any Gram-positive bacteria. To sum up, MAPS technology constitutes an efficient method to deeply explore the regulatory network of a particular sRNA, offering a snapshot of its whole targetome. However, it is important to keep in mind that putative targets identified by MAPS still need to be validated by complementary experimental approaches.

MAPS technology has been developed to scrutinize the targetome of a specific regulatory RNA in vivo. The sRNA of interest is tagged with a MS2 aptamer enabling the co-purification of its RNA partners and their identification by RNA sequencing. This modified protocol is particularly suited for Gram-positive bacteria.&#160;

Although small regulatory RNAs (sRNAs) are widespread among the bacterial domain of life, the functions of many of them remain poorly characterized ...

Biochemistry

Arbovirus Infections As Screening Tools for the Identification of Viral Immunomodulators and Host Antiviral Factors

RNA interference- and genome editing-based screening platforms have been widely used to identify host cell factors that restrict virus replication. However, these screens are typically conducted in cells that are naturally permissive to the viral pathogen under study. Therefore, the robust replication of viruses in control conditions may limit the dynamic range of these screens. Furthermore, these screens may be unable to easily identify cellular defense pathways that restrict virus replication if the virus is well-adapted to the host and capable of countering antiviral defenses. In this article, we describe a new paradigm for exploring virus-host interactions through the use of screens that center on naturally abortive infections by arboviruses such as vesicular stomatitis virus (VSV). Despite the ability of VSV to replicate in a wide range of dipteran insect and mammalian hosts, VSV undergoes a post-entry, abortive infection in a variety of cell lines derived from lepidopteran insects, such as the gypsy moth (Lymantria dispar). However, these abortive VSV infections can be &#34;rescued&#34; when host cell antiviral defenses are compromised. We describe how VSV strains encoding convenient reporter genes and restrictive L. dispar cell lines can be paired to set-up screens to identify host factors involved in arbovirus restriction. Furthermore, we also show the utility of these screening tools in the identification of virally encoded factors that rescue VSV replication during coinfection or through ectopic expression, including those encoded by mammalian viruses. The natural restriction of VSV replication in L. dispar cells provides a high signal-to-noise ratio when screening for the conditions that promote VSV rescue, thus enabling the use of simplistic luminescence- and fluorescence-based assays to monitor the changes in VSV replication. 
These methodologies are valuable for understanding the interplay between host antiviral responses and viral immune evasion factors.

Here, we present the protocols to identify 1) virus-encoded immunomodulators that promote arbovirus replication and 2) eukaryotic host factors that restrict arbovirus replication. These fluorescence- and luminescence-based methods allow researchers to rapidly obtain quantitative readouts of arbovirus replication in simplistic assays with low signal-to-noise ratios.

RNA interference- and genome editing-based screening platforms have been widely used to identify host cell factors that restrict virus replication. ...

Label-Free Quantitative Proteomics Workflow for Discovery-Driven Host-Pathogen Interactions

The technological achievements of mass spectrometry (MS)-based quantitative proteomics opens many undiscovered avenues for analyzing an organism&#8217;s global proteome under varying conditions. This powerful strategy applied to the interactions of microbial pathogens with the desired host comprehensively characterizes both perspectives towards infection. Herein, the workflow describes label-free quantification (LFQ) of the infectome of Cryptococcus neoformans, a fungal facultative intracellular pathogen that is the causative agent of the deadly disease cryptococcosis, in the presence of immortalized macrophage cells. The protocol details the proper protein preparation techniques for both pathogen and mammalian cells within a single experiment, resulting in appropriate peptide submission for liquid-chromatography (LC)-MS/MS analysis. The high throughput generic nature of LFQ allows a wide dynamic range of protein identification and quantification, as well as transferability to any host-pathogen infection setting, maintaining extreme sensitivity. The method is optimized to catalogue extensive, unbiased protein abundance profiles of a pathogen within infection-mimicking conditions. Specifically, the method demonstrated here provides essential information on C. neoformans pathogenesis, such as protein production necessary for virulence and identifies critical host proteins responding to microbial invasion.

Here, we present a protocol to profile the interplay between host and pathogen during infection by mass spectrometry-based proteomics. This protocol uses label-free quantification to measure changes in protein abundance of both host (e.g., macrophages) and pathogen (e.g., Cryptococcus neoformans) in a single experiment.

High-Throughput Transcriptome Analysis for Investigating Host-Pathogen Interactions

Pathogens can cause a wide variety of infectious diseases. The biological processes induced by the host in response to infection determine the severity of the disease. To study such processes, researchers can use high-throughput sequencing techniques (RNA-seq) that measure the dynamic changes of the host transcriptome at different stages of infection, clinical outcomes, or disease severity. This investigation can lead to a better understanding of the diseases, as well as uncovering potential drug targets and treatments. The protocol presented here describes a complete pipeline to analyze RNA-sequencing data from raw reads to functional analysis. The pipeline is divided into five steps: (1) quality control of the data; (2) mapping and annotation of genes; (3) statistical analysis to identify differentially expressed genes and co-expressed genes; (4) determination of the molecular degree of the perturbation of samples; and (5) functional analysis. Step 1 removes technical artifacts that may impact the quality of downstream analyses. In step 2, genes are mapped and annotated according to standard library protocols. The statistical analysis in step 3 identifies genes that are differentially expressed or co-expressed in infected samples, in comparison with non-infected ones. Sample variability and the presence of potential biological outliers are verified using the molecular degree of perturbation approach in step 4. Finally, the functional analysis in step 5 reveals the pathways associated with the disease phenotype. The presented pipeline aims to support researchers through the RNA-seq data analysis from host-pathogen interaction studies and drive future in vitro or in vivo experiments that are essential to understand the molecular mechanism of infections.

The protocol presented here describes a complete pipeline to analyze RNA-sequencing transcriptome data from raw reads to functional analysis, covering quality control and preprocessing steps as well as advanced statistical analytical approaches.

In Silico Identification and Characterization of circRNAs During Host-Pathogen Interactions
