A Bioinformatics Pipeline for Investigating Molecular Evolution and Gene Expression using RNA-seq

Aide Macias-Muñoz; Ali Mortazavi

doi:10.3791/61633

A subscription to JoVE is required to view this content. Sign in or start your free trial.

Summary

The purpose of this protocol is to investigate the evolution and expression of candidate genes using RNA sequencing data.

Abstract

Distilling and reporting large datasets, such as whole genome or transcriptome data, is often a daunting task. One way to break down results is to focus on one or more gene families that are significant to the organism and study. In this protocol, we outline bioinformatic steps to generate a phylogeny and to quantify the expression of genes of interest. Phylogenetic trees can give insight into how genes are evolving within and between species as well as reveal orthology. These results can be enhanced using RNA-seq data to compare the expression of these genes in different individuals or tissues. Studies of molecular evolution and expression can reveal modes of evolution and conservation of gene function between species. The characterization of a gene family can serve as a springboard for future studies and can highlight an important gene family in a new genome or transcriptome paper.

Introduction

Advances in sequencing technologies have facilitated the sequencing of genomes and transcriptomes of non-model organisms. In addition to the increased feasibility of sequencing DNA and RNA from many organisms, an abundance of data is publicly available to study genes of interest. The purpose of this protocol is to provide bioinformatic steps to investigate the molecular evolution and expression of genes that may play an important role in the organism of interest.

Investigating the evolution of a gene or gene family can provide insight into the evolution of biological systems. Members of a gene family are typically determined by identifying conserved motifs or homologous gene sequences. Gene family evolution was previously investigated using genomes from distantly related model organisms¹. A limitation to this approach is that it is not clear how these gene families evolve in closely related species and the role of different environmental selective pressures. In this protocol, we include a search for homologs in closely related species. By generating a phylogeny at a phylum level, we can note trends in gene family evolution such as that of conserved genes or lineage-specific duplications. At this level, we can also investigate whether genes are orthologs or paralogs. While many homologs likely function similarly to each other, that is not necessarily the case². Incorporating phylogenetic trees in these studies is important to resolve whether these homologous genes are orthologs or not. In eukaryotes, many orthologs retain similar functions within the cell as evidenced by the ability of mammalian proteins to restore the function of yeast orthologs³. However, there are instances where a non-orthologous gene carries out a characterized function⁴.

Phylogenetic trees begin to delineate relationships between genes and species, yet function cannot be assigned solely based on genetic relationships. Gene expression studies combined with functional annotations and enrichment analysis provide strong support for gene function. Cases where gene expression can be quantified and compared across individuals or tissue types can be more telling of potential function. The following protocol follows methods used in investigating opsin genes in Hydra vulgaris⁷, but they can be applied to any species and any gene family. The results of such studies provide a foundation for further investigation into gene function and gene networks in non-model organisms. As an example, the investigation of the phylogeny of opsins, which are proteins that initiate the phototransduction cascade, gives context to the evolution of eyes and light detection⁸^,⁹^,¹⁰^,¹¹. In this case, non-model organisms especially basal animal species such as cnidarians or ctenophores can elucidate conservation or changes in the phototransduction cascade and vision across clades¹²^,¹³^,¹⁴. Similarly, determining the phylogeny, expression, and networks of other gene families will inform us about the molecular mechanisms underlying adaptations.

Protocol

This protocol follows UC Irvine animal care guidelines.

1. RNA-seq library preparation

Isolate RNA using the following methods.
1. Collect samples. If RNA is to be extracted at a later time, flash freeze the sample or place in RNA storage solution¹⁵ (Table of Materials).
2. Euthanize and dissect the organism to separate tissues of interest.
3. Extract total RNA using an extraction kit and purify the RNA using an RNA purification kit (Table of Materials)
  NOTE: There are protocols and kits that may work better for different species and tissue types¹⁶^,¹⁷. We have extracted RNA from different body tissues of a butterfly¹⁸ and a gelatinous Hydra¹⁹ (see discussion).
4. Measure the concentration and quality of the RNA of each sample (Table of Materials). Use samples with RNA integrity numbers (RIN) higher than 8, ideally closer to 9²⁰ to construct cDNA libraries.
Construct cDNA library and sequence as follows.
1. Build cDNA libraries according to the library prep instruction manual (see discussion).
2. Determine cDNA concentration and quality (Table of Materials).
3. Multiplex the libraries and sequence them.

2. Access a computer cluster

NOTE: RNA-seq analysis requires manipulation of large files and is best done on a computer cluster (Table of Materials).

Login to the computer cluster account using the command ssh username@clusterlocation on a terminal (Mac) or PuTTY (Windows) application window.

3. Obtain RNA-seq reads

Obtain RNA-seq reads from the sequencing facility or, for data generated in a publication, from the data repository where it was deposited (3.2 or 3.3).
To download data from repositories such as ArrayExpress do the following:
1. Search the site using the accession number.
2. Find the link to download the data, then left-click and select Copy Link.
3. On the terminal window, type wget and select Paste link to copy the data to the directory for analysis.
To download NCBI Short Read Archive (SRA) data follow these alternative steps:
1. On the terminal download SRA Toolkit v. 2.8.1 using wget.
  NOTE: Downloading and installing programs to the computer cluster may require root access, contact your computer cluster administrator if installation fails.
2. Finish installing the program by typing tar -xvf $TARGZFILE.
3. Search NCBI for the SRA accession number for the samples you want to download, it should have the format SRRXXXXXX.
4. Obtain the RNA-seq data by typing [sratoolkit location]/bin/prefetch SRRXXXXXX in the terminal window.
5. For paired-end files type [sratoolkit location]/bin/fastq-dump --split-files SRRXXXXXX to get two fastq files (SRRXXXXXX_1.FASTQ and SRRXXXXXX_2.FASTQ).
  NOTE: To do a Trinity de novo assembly use the command [sratoolkit location]/bin/fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files SRRXXXXXX

4. Trim adapters and low-quality reads (optional)

Install or load Trimmomatic²¹ v. 0.35 on the computing cluster.
In the directory where the RNA-seq data files are located, type a command that includes the location of the trimmomatic jar file, the input FASTQ files, output FASTQ files, and optional parameters such as read length and quality.
NOTE: The command will vary by the raw and desired quality and length of the reads. For Illumina 43 bp reads with Nextera primers, we used: java -jar /data/apps/trimmomatic/0.35/trimmomatic-0.35.jar PE $READ1.FASTQ $READ2.FASTQ paired_READ1.FASTQ unpaired_READ1.FASTQ paired_READ2.FASTQ unpaired_READ2.FASTQ ILLUMINACLIP:adapters.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:17 MINLEN:30.

5. Obtain reference assembly

Search google, EnsemblGenomes, and NCBI Genomes and Nucleotide TSA (Transcriptome Shotgun Assembly) for a reference genome or assembled transcriptome for the species of interest (Figure 1).
NOTE: If a reference genome or transcriptome are not available or low-quality, proceed to STEP 6 to generate a de novo assembly.
If a reference genome or assembled transcriptome exists, download it as a fasta file to where the analysis will be performed following the steps below.
1. Find the link to download the genome, left-click and Copy Link.
2. On the terminal window type wget and paste the link address. If available, also copy the GTF file and protein FASTA file for the reference genome.

6. Generate a de novo assembly (Alternative to Step 5)

Combine the RNA-seq READ1 and READ2 fastq files for all samples by typing cat *READ1.FASTQ > $all_READ1.FASTQ and cat *READ2.FASTQ > all_READ2.FASTQ on the terminal window.
Install or load Trinity²² v.2.8.5 on the computing cluster.
Generate and assembly by typing on the terminal: Trinity --seqType fq --max_memory 20G --left $all_READ1.FASTQ --right $all_READ2.FASTQ.

7. Map reads to the genome (7.1) or de novo transcriptome (7.2)

Map reads to the reference genome using STAR²³ v. 2.6.0c and RSEM²⁴ v. 1.3.0.
1. Install or load STAR v. 2.6.0c. and RSEM v. 1.3.0 to the computing cluster.
2. Index the genome by typing rsem-prepare-reference --gtf $GENOME.GTF --star -p 16 $GENOME.FASTA $OUTPUT.
3. Map reads and calculate expression for each sample by typing rsem-calculate-expression -p 16 --star --paired-end $READ1.FASTQ $READ2.FASTQ $INDEX $OUTPUT.
4. Rename the results file to something descriptive using mv RSEM.genes.results $sample.genes.results.
5. Generate a matrix of all counts by typing rsem-generate-data-matrix *[genes/isoforms.results] > $OUTPUT.
Map RNA-seq to the Trinity de novo assembly using RSEM and bowtie.
1. Install or load Trinity²² v.2.8.5, Bowtie²⁵ v. 1.0.0, and RSEM v. 1.3.0.
2. Map reads and calculate expression for each sample by typing [trinity_location]/align_and_estimate_abundance.pl --prep-reference --transcripts $TRINITY.FASTA --seqType fq --left $READ1.FASTQ --right $READ2.FASTQ --est_method RSEM --aln_method bowtie --trinity_mode --output_dir $OUTPUT.
3. Rename the results file to something descriptive using mv RSEM.genes.results $sample.genes.results.
4. Generate a matrix of all counts by typing [trinity_location]/abundance_estimates_to_matrix.pl --est_method RSEM *[genes/isoforms].results

8. Identify genes of interest

NOTE: The following steps can be done with nucleotide or protein FASTA files but work best and are more straightforward with protein sequences. BLAST searches using protein to protein is more likely to give results when searching between different species.

For a reference genome, use the protein FASTA file from STEP 5.2.2 or see Supplemental Materials to generate a custom gene feature GTF.
For a de novo transcriptome, generate a protein FASTA using TransDecoder.
1. Install or load TransDecoder v. 5.5.0 on the computer cluser.
2. Find the longest open reading frame and predicted peptide sequence by typing [Transdecoder location]/TransDecoder.LongOrfs -t $TRINITY.FASTA.
Search NCBI Genbank for homologs in closely related species.
1. Open an internet browser window and go to https://www.ncbi.nlm.nih.gov/genbank/.
2. On the search bar type the name of the gene of interest and the name of closely related species which have been sequenced or genus or phylum. On the left of the search bar select protein then click search.
3. Extract sequences by clicking Send to and then select File. Under Format, select FASTA then click Create File.
4. Move FASTA file of homologs to the computer cluster by typing scp $FASTA username@clusterlocation:/$DIR on a local terminal window or use FileZilla to transfer files to and from computer and cluster.
Search for candidate genes using BLAST+²⁶.
1. Install or load BLAST+ v. 2.8.1 on the computer cluster.
2. On the computer cluster, make a BLAST database from the genome or transcriptome translated protein FASTA by typing [BLAST+ location]/makeblastdb -in $PEP.FASTA -dbtype prot -out $OUTPUT
3. BLAST the homologous gene sequences from NCBI to the database of the species of interest by typing [BLAST+ location]/blastp -db $DATABASE -query $FASTA -evalue 1e-10 -outfmt 6 -max_target_seqs 1 -out $OUTPUT.
4. View the output file using the command more. Copy unique gene IDs from the species of interest to a new text file.
5. Extract the sequences of candidate genes by typing perl -ne 'if(/^>(\S+)/){$c=$i{$1}}$c?print:chomp;$i{$_}=1 if @ARGV' $gene_id.txt $PEP.FASTA > $OUTPUT.
Confirm gene annotation using reciprocal BLAST.
1. On the internet browser go to https://blast.ncbi.nlm.nih.gov/Blast.cgi.
2. Select tblastn, then paste the candidate sequences, select the Non-redundant protein sequence database and click BLAST.
Identify additional genes by annotating all genes in the genome or transcriptome with gene ontology (GO) terms (see discussion).
1. Transfer the protein FASTA to the local computer.
2. Download and install Blast2GO²⁷^,²⁸^,²⁹ v. 5.2 to the local computer.
3. Open Blast2GO, click File, go to Load, go to Load Sequences, click Load Fasta File (fasta). Select the FASTA file and click Load.
4. Click on Blast, choose NCBI Blast, and click Next. Edit parameters or click Next, edit parameters and click Run to find the most similar gene description.
5. Click mapping then click Run to search Gene Ontology annotations for similar proteins.
6. Next click interpro, select EMBL-EBI InterPro, and click Next. Edit parameters or click Next, and click Run to search for signatures of known gene families and domains.
7. Export the annotations by clicking File, select Export, click Export Table. Click Browse, name the file, click Save, click Export.
8. Search the annotation table for GO terms of interest to identify additional candidate genes. Extract the sequences from the FASTA file (STEP 8.4.5)

9. Phylogenetic trees

Download and install MEGA³⁰ v. 7.0.26 to your local computer.
Open MEGA, click on Align, click Edit/Build Alignment, select Create a new alignment click OK, select Protein.
When the alignment window opens, click on Edit, click Insert sequences from file and select the FASTA with protein sequences of candidate genes and probable homologs.
Select all sequences. Find the arm symbol and hover over it. It should say Align sequences using MUSCLE³¹ algorithm. Click on the arm symbol and then click Align Protein to align the sequences. Edit parameters or click OK to align using default parameters.
Visually inspect and make any manual changes then Save and close the alignment window.
In the main MEGA window, click on Models, click Find Best DNA/Protein models (ML), select the alignment file and select corresponding parameters such as: Analysis: Model Selection (ML), Tree to use: Automatic (neighbor-joining tree), Statistical Method: Maximum Likelihood, Substitution Type: Amino Acid, Gap/missing data treatment: Use all sites, Branch site filter: None.
Once the best model for the data is determined, go to the main MEGA window. Click Phylogeny and click Contruct/Test Maximum Likelihood Tree and then select the alignment, if necessary. Select the appropriate parameters for the tree: Statistical method: Maximum Likelihood, Test of Phylogeny: Bootstrap method with 100 replicates, substitution type: amino acid, model: LG with Freqs. (+F), rates among sites: gamma distributed (G) with 5 discrete gamma categories, gap/missing data treatment: use all sites, ML heuristic method: Nearest-Neighbor-Interchange (NNI).

10. Visualize gene expression using TPM

For Trinity, on the computer cluster go to the directory where abundance_estimates_to_matrix.pl was run and one of the outputs should be matrix.TPM.not_cross_norm. Transfer this file to your local computer.
NOTE: See Supplemental Materials for cross sample normalization.
For TPMs from a genome analysis follow the steps below.
1. On the computer cluster, go to the RSEM installation location. Copy rsem-generate-data-matrix by typing scp rsem-generate-data-matrix rsem-generate-TPM-matrix. Use nano to edit the new file and change “my $offsite = 4” from 4 to 5 for TPM, it should now read “my $offsite = 5”.
Go to the directory where the RSEM output files .genes.results are and now use rsem-generate-TPM-matrix *[genes/isoforms.results] > $OUTPUT to generate a TPM matrix. Transfer results to a local computer.
Visualize the results in ggplot2.
1. Download R v. 4.0.0 and RStudio v. 1.2.1335 to a local computer.
2. Open RStudio on the right of the screen go to the Packages tab and click Install. Type ggplot2 and click install.
3. On the R script window read in the TPM table by typing data<-read.table("$tpm.txt",header = T)
4. For bar graphs similar to Figure 4 type something similar to: p<- ggplot() + geom_bar(aes(y=TPM, x=Symbol, fill=Tissue), data=data, stat="identity")
  fill<-c("#d7191c","#fdae61", "#ffffbf", "#abd9e9", "#2c7bb6")
  p<-p+scale_fill_manual(values=fill)
  p + theme(axis.text.x = element_text(angle = 90))

Results

The methods above are summarized in Figure 1 and were applied to a data set of Hydra vulgaris tissues. H. vulgaris is a fresh-water invertebrate that belongs to the phylum Cnidaria which also includes corals, jellyfish, and sea anemones. H. vulgaris can reproduce asexually by budding and they can regenerate their head and foot when bisected. In this study, we aimed to investigate the evolution and expression of opsin genes in Hydra

Discussion

The purpose of this protocol is to provide an outline of the steps for characterizing a gene family using RNA-seq data. These methods have been proven to work for a variety of species and datasets⁴^,³⁴^,³⁵. The pipeline established here has been simplified and should be easy enough to be followed by a novice in bioinformatics. The significance of the protocol is that it outlines all the steps and necessary programs to complete a p...

Disclosures

The authors have nothing to disclose.

Acknowledgements

We thank Adriana Briscoe, Gil Smith, Rabi Murad and Aline G. Rangel for advice and guidance in incorporating some of these steps into our workflow. We are also grateful to Katherine Williams, Elisabeth Rebboah, and Natasha Picciani for comments on the manuscript. This work was supported in part by a George E. Hewitt Foundation for Medical research fellowship to A.M.M.

Materials

Name	Company	Catalog Number	Comments
Bioanalyzer-DNA kit	Agilent	5067-4626	wet lab materials
Bioanalyzer-RNA kit	Agilent	5067-1513	wet lab materials
BLAST+ v. 2.8.1			On computer cluster* https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
Blast2GO (on your PC)			On local computer https://www.blast2go.com/b2g-register-basic
boost v. 1.57.0			On computer cluster
Bowtie v. 1.0.0			On computer cluster https://sourceforge.net/projects/bowtie-bio/files/bowtie/1.3.0/
Computing cluster (highly recommended)			NOTE: Analyses of genomic data are best done on a high-performance computing cluster because files are very large.
Cufflinks v. 2.2.1			On computer cluster
edgeR v. 3.26.8 (in R)			In Rstudio https://bioconductor.org/packages/release/bioc/html/edgeR.html
gcc v. 6.4.0			On computer cluster
Java v. 11.0.2			On computer cluster
MEGA7 (on your PC)			On local computer https://www.megasoftware.net
MEGAX v. 0.1			On local computer https://www.megasoftware.net
NucleoSpin RNA II kit	Macherey-Nagel	740955.5	wet lab materials
perl 5.30.3			On computer cluster
python			On computer cluster
Qubit 2.0 Fluorometer	ThermoFisher	Q32866	wet lab materials
R v.4.0.0			On computer cluster https://cran.r-project.org/src/base/R-4/
RNAlater	ThermoFisher	AM7021	wet lab materials
RNeasy kit	Qiagen	74104	wet lab materials
RSEM v. 1.3.0			Computer software https://deweylab.github.io/RSEM/
RStudio v. 1.2.1335			On local computer https://rstudio.com/products/rstudio/download/#download
Samtools v. 1.3			Computer software
SRA Toolkit v. 2.8.1			On computer cluster https://github.com/ncbi/sra-tools/wiki/01.-Downloading-SRA-Toolkit
STAR v. 2.6.0c			On computer cluster https://github.com/alexdobin/STAR
StringTie v. 1.3.4d			On computer cluster https://ccb.jhu.edu/software/stringtie/
Transdecoder v. 5.5.0			On computer cluster https://github.com/TransDecoder/TransDecoder/releases
Trimmomatic v. 0.35			On computer cluster http://www.usadellab.org/cms/?page=trimmomatic
Trinity v.2.8.5			On computer cluster https://github.com/trinityrnaseq/trinityrnaseq/releases
TRIzol	ThermoFisher	15596018	wet lab materials
TruSeq RNA Library Prep Kit v2	Illumina	RS-122-2001	wet lab materials
TURBO DNA-free Kit	ThermoFisher	AM1907	wet lab materials

*Downloads and installation on the computer cluster may require root access. Contact your network administrator.

References

Lespinet, O., Wolf, Y. I., Koonin, E. V., Aravind, L. The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Research. 12 (7), 1048-1059 (2002).
Gabaldón, T., Koonin, E. V. Functional and evolutionary implications of gene orthology. Nature Reviews Genetics. 14 (5), 360-366 (2013).
Dolinski, K., Botstein, D. Orthology and Functional Conservation in Eukaryotes. Annual Review of Genetics. 41 (1), (2007).
Macias-Muñoz, A., McCulloch, K. J., Briscoe, A. D. Copy number variation and expression analysis reveals a non-orthologous pinta gene family member involved in butterfly vision. Genome Biology and Evolution. 9 (12), 3398-3412 (2017).
Cannon, S. B., Mitra, A., Baumgarten, A., Young, N. D., May, G. The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC plant biology. 4, 10 (2004).
Eastman, S. D., Chen, T. H. P., Falk, M. M., Mendelson, T. C., Iovine, M. K. Phylogenetic analysis of three complete gap junction gene families reveals lineage-specific duplications and highly supported gene classes. Genomics. 87 (2), 265-274 (2006).
Macias-Munõz, A., Murad, R., Mortazavi, A. Molecular evolution and expression of opsin genes in Hydra vulgaris. BMC Genomics. 20 (1), 1-19 (2019).
Hisatomi, O., Tokunaga, F. Molecular evolution of proteins involved in vertebrate phototransduction. Comparative Biochemistry and Physiology - B Biochemistry and Molecular Biology. 133 (4), 509-522 (2002).
Arendt, D. Evolution of eyes and photoreceptor cell types. International Journal of Developmental Biology. 47, 563-571 (2003).
Shichida, Y., Matsuyama, T. Evolution of opsins and phototransduction. Philosophical Transactions of the Royal Society B: Biological Sciences. 364 (1531), 2881-2895 (2009).
Porter, M. L., et al. Shedding new light on opsin evolution. Proceedings of the Royal Society B: Biological Sciences. 279 (1726), 3-14 (2012).
Plachetzki, D. C., Degnan, B. M., Oakley, T. H. The origins of novel protein interactions during animal opsin evolution. PLoS ONE. 2 (10), 1054 (2007).
Ramirez, M. D., et al. The last common ancestor of most bilaterian animals possessed at least nine opsins. Genome Biology and Evolution. 8 (12), 3640-3652 (2016).
Schnitzler, C. E., et al. Genomic organization, evolution, and expression of photoprotein and opsin genes in Mnemiopsis leidyi: a new view of ctenophore photocytes. BMC Biology. 10, 107 (2012).
Pedersen, K. B., Williams, A., Watt, J., Ronis, M. J. Improved method for isolating high-quality RNA from mouse bone with RNAlater at room temperature. Bone Reports. 11, 100211 (2019).
Ridgeway, J. A., Timm, A. E., Fallon, A. Comparison of RNA isolation methods from insect larvae. Journal of Insect Science. 14 (1), 4-8 (2014).
Scholes, A. N., Lewis, J. A. Comparison of RNA isolation methods on RNA-Seq: Implications for differential expression and meta-Analyses. BMC Genomics. 21 (1), 1-9 (2020).
Briscoe, A. D., et al. Female behaviour drives expression and evolution of gustatory receptors in butterflies. PLoS genetics. 9 (7), 1003620 (2013).
Murad, R., Macias-Muñoz, A., Wong, A., Ma, X., Mortazavi, A. Integrative analysis of Hydra head regeneration reveals activation of distal enhancer-like elements. bioRxiv. , 544049 (2019).
Gallego Romero, I., Pai, A. A., Tung, J., Gilad, Y. Impact of RNA degradation on measurements of gene expression. BMC Biology. 12, 42 (2014).
Bolger, A. M., Lohse, M., Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 30 (15), 2114-2120 (2014).
Trinity. . RNA-Seq De novo Assembly Using Trinity. , 1-7 (2014).
Dobin, A., et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 29, 15-21 (2013).
Li, B., Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC bioinformatics. 12, 323 (2011).
Langmead, B., Trapnell, C., Pop, M., Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome biology. 10, 25 (2009).
Camacho, C., et al. BLAST+: architecture and applications. BMC Bioinformatics. 10, 421 (2009).
Conesa, A., Götz, S. Blast2GO: A comprehensive suite for functional analysis in plant genomics. International Journal of Plant Genomics. 619832, (2008).
Conesa, A., et al. Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 21 (18), 3674-3676 (2005).
Götz, S., et al. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Research. 36 (10), 3420-3435 (2008).
Kumar, S., Stecher, G., Tamura, K. MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Molecular biology and evolution. 33 (7), 1870-1874 (2016).
Edgar, R. C. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research. 32 (5), 1792-1797 (2004).
Taddei-Ferretti, C., Musio, C., Santillo, S., Cotugno, A. The photobiology of Hydra's periodic activity. Hydrobiologia. 530, 129-134 (2004).
Chapman, J. A., et al. The dynamic genome of Hydra. Nature. 464 (7288), 592-596 (2010).
Macias-Muñoz, A., Rangel Olguin, A. G., Briscoe, A. D. Evolution of phototransduction genes in Lepidoptera. Genome Biology and Evolution. 11 (8), 2107-2124 (2019).
Macias-Munõz, A., Murad, R., Mortazavi, A. Molecular evolution and expression of opsin genes in Hydra vulgaris. BMC Genomics. 20 (1), (2019).
Picelli, S., et al. Full-length RNA-seq from single cells using Smart-seq2. Nature Protocols. 9 (1), 171-181 (2014).
Tavares, L., Alves, P. M., Ferreira, R. B., Santos, C. N. Comparison of different methods for DNA-free RNA isolation from SK-N-MC neuroblastoma. BMC research notes. 4, 3 (2011).
Johnson, M. T. J., et al. Evaluating Methods for Isolating Total RNA and Predicting the Success of Sequencing Phylogenetically Diverse Plant Transcriptomes. PLoS ONE. 7 (11), (2012).
Zhao, S., Zhang, Y., Gamini, R., Zhang, B., Von Schack, D. Evaluation of two main RNA-seq approaches for gene quantification in clinical RNA sequencing: PolyA+ selection versus rRNA depletion. Scientific Reports. 8 (1), 1-12 (2018).
Zhao, S., et al. Comparison of stranded and non-stranded RNA-seq transcriptome profiling and investigation of gene overlap. BMC Genomics. 16 (1), 1-14 (2015).
Corley, S. M., MacKenzie, K. L., Beverdam, A., Roddam, L. F., Wilkins, M. R. Differentially expressed genes from RNA-Seq and functional enrichment results are affected by the choice of single-end versus paired-end reads and stranded versus non-stranded protocols. BMC Genomics. 18 (1), 1-13 (2017).
Haas, B. J., et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols. 8 (8), 1494-1512 (2013).
Pertea, M., et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature biotechnology. 33 (3), 290-295 (2015).
Bray, N. L., Pimentel, H., Melsted, P., Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology. 34 (5), 525-527 (2016).
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A., Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods. 14 (4), 417-419 (2017).
Araujo, F. A., Barh, D., Silva, A., Guimarães, L., Thiago, R. . OPEN GO FEAT a rapid web-based functional annotation tool for genomic and transcriptomic data. , 8-11 (2018).
Huerta-Cepas, J., et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Molecular Biology and Evolution. 34 (8), 2115-2122 (2017).
Huerta-Cepas, J., et al. EggNOG 5.0: A hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research. 47, 309-314 (2019).
Törönen, P., Medlar, A., Holm, L. PANNZER2: A rapid functional annotation web server. Nucleic Acids Research. 46, 84-88 (2018).
Robinson, M., Mccarthy, D., Chen, Y., Smyth, G. K. . edgeR differential expression analysis of digital gene expression data User's Guide. , (2013).
Huang, D. W., Sherman, B. T., Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols. 4 (1), 44-57 (2009).
Huang, D. W., Sherman, B. T., Lempicki, R. A. Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Research. 37 (1), 1-13 (2009).
Letunic, I., Bork, P. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic acids research. 44, 242-245 (2016).

Reprints and Permissions

Request permission to reuse the text or figures of this JoVE article

Request Permission

Explore More Articles

Bioinformatics Pipeline Molecular Evolution Gene Expression RNA seq Shell Scripts SRA Toolkit FASTQ Files Reference Genome GTF File Protein FASTA File Gene Annotation BLAST Database Homologous Genes NCBI GenBank Reciprocal BLAST MEGA Software

This article has been published

Video Coming Soon

Keep me updated: