Method Article
* These authors contributed equally
A method of constructing a phylogenetic tree based on sequence homology of SWEETs from eukaryotes and SemiSWEETs from prokaryotes is described. Phylogenetic analysis is a useful tool for explaining the evolutionary relatedness between homologous proteins or genes from different organism groups.
Phylogenetic analysis uses nucleotide or amino acid sequences or other parameters, such as domain sequences and three-dimensional structure, to construct a tree to show the evolutionary relationship among different taxa (classification units) at the molecular level. Phylogenetic analysis can also be used to investigate domain relationships within an individual taxon, particularly for organisms that have undergone substantial change in morphology and physiology, but for which researchers lack fossil evidence due to the organisms' long evolutionary history or scarcity of fossilization.
In this text, a detailed protocol is described for using the phylogenetic method, including amino acid sequence alignment using Clustal Omega, and subsequent phylogenetic tree construction using both the Maximum Likelihood (ML) method in Molecular Evolutionary Genetics Analysis (MEGA) and Bayesian Inference via MrBayes. To investigate the origin of eukaryotic Sugars Will Eventually be Exported Transporters (SWEET) genes, 228 SWEETs including 35 SWEET proteins from unicellular eukaryotes and 57 SemiSWEET proteins from prokaryotes were analyzed. Interestingly, SemiSWEETs were found in prokaryotes, whereas SWEETs were found in eukaryotes. Two phylogenetic trees constructed using theoretically distinct methods consistently suggested that the first eukaryotic SWEET gene might stem from the fusion of a bacterial SemiSWEET gene and an archaeal SemiSWEET gene. It is worth noting that one should be cautious about drawing conclusions based only on phylogenetic analysis, although it is useful for explaining underlying relationships between different taxa that are difficult or even impossible to discern through experimental means.
DNA or RNA sequences carry genetic information for underlying phenotypes that can be analyzed through physiological and biochemical methods or observed through morphological and fossil evidence. In a sense, genetic information is more reliable than external phenotypes because the former is the basis for the latter. In evolutionary study, fossil evidence is very direct and convincing. However, many organisms, such as microorganisms, have little chance of forming fossils over long geologic ages. Therefore, molecular information such as nucleotide sequences and amino acid sequences from related extant organisms is of value for exploring evolutionary relationships1. In the present study, a simple introduction to basic phylogenetic knowledge and an easy-to-learn protocol are provided for newcomers who need to construct a phylogenetic tree on their own.
Both DNA (nucleotide) and protein (amino acid) sequences can be used to infer phylogenetic relationships between homologous genes, organelles, or even organisms2. DNA sequences are more likely to be affected by changes during evolution. In contrast, amino acid sequences are much more stable given that synonymous mutations in nucleotide sequences do not cause mutations in amino acid sequences. As a result, DNA sequences are useful for comparison of homologous genes from closely related organisms, whereas amino acid sequences are appropriate for homologous genes from distantly related organisms3.
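The point about synonymous mutations can be illustrated with a small sketch. The two coding sequences below are hypothetical and were made up for this example; the codon table is deliberately partial and covers only the codons that appear in them.

```python
# Partial codon table: only the codons used in this illustration.
CODON_TABLE = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",  # leucine
    "AAA": "K", "AAG": "K",                          # lysine
    "ATG": "M",                                      # methionine
}


def translate(dna):
    """Translate a coding DNA string codon by codon using CODON_TABLE."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def identity(a, b):
    """Fraction of positions at which two equal-length strings match."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical sequences that differ only at third codon positions.
gene_a = "ATGGCTCTTAAA"
gene_b = "ATGGCCCTGAAG"

dna_identity = identity(gene_a, gene_b)
protein_identity = identity(translate(gene_a), translate(gene_b))

print(f"DNA identity:     {dna_identity:.2f}")
print(f"Protein identity: {protein_identity:.2f}")
```

In this toy case the nucleotide sequences differ at three of twelve sites while the translated peptides are identical, which is the reason amino acid sequences tend to retain signal over longer evolutionary distances.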
A phylogenetic analysis begins with the alignment of amino acid or nucleotide sequences4 retrieved from an annotated genome sequencing database5 and listed in FASTA format, i.e., putative or expressed protein sequences, RNA sequences, or DNA sequences. It is critical to collect high-quality sequences for the analysis, and only homologous sequences can be used to analyze phylogenetic relationships. Many different platforms, such as Clustal W, Clustal X, Muscle, T-coffee, and MAFFT, can be used for sequence alignment. The most widely used is Clustal Omega6,7 (http://www.ebi.ac.uk/Tools/msa/clustalo/), which can be used online or can be downloaded free of charge. The alignment tool has many parameters that the user can adjust before starting the alignment, but the default parameters work well in most cases. After the process is complete, the aligned sequences should be saved in the correct format for the next step. They should then be edited or trimmed using editing software, such as BioEdit, because phylogenetic tree construction by MEGA requires the sequences to be of equal length, counting both amino acid abbreviations and hyphens (in the aligned sequences, any position without an amino acid or nucleotide is represented by a hyphen, "-"). Generally, all of the protruding amino acids or nucleotides at either end of the alignment should be removed. In addition, poorly aligned columns in the alignment can be deleted because they convey little valuable information and can sometimes give confusing or false information3. Columns containing one or more hyphens can be deleted at this time or in the later tree construction stage. Alternatively, they can be retained and used for phylogenetic computation. When the sequence alignment and trimming are finished, the aligned sequences should be saved in FASTA format, or the desired format, for later use.
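The trimming step described above is performed interactively in BioEdit in this protocol. As a rough illustration only, the following sketch shows what deleting every column that contains a hyphen amounts to. The function names and the three short aligned sequences are hypothetical, and removal of poorly aligned but gap-free columns is not attempted here.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping each name to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def drop_gapped_columns(records, gap="-"):
    """Remove every alignment column in which any sequence has a gap."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    seqs = list(records.values())
    keep = [i for i in range(lengths.pop()) if all(s[i] != gap for s in seqs)]
    return {name: "".join(seq[i] for i in keep) for name, seq in records.items()}


def to_fasta(records):
    """Serialize a dict of sequences back to FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


# Hypothetical aligned input with protruding ends and one internal gap.
example = """>seq1
--MKTLLA-G
>seq2
A-MKSLLAVG
>seq3
--MRTLLAVG
"""

trimmed = drop_gapped_columns(read_fasta(example))
print(to_fasta(trimmed), end="")
```

After trimming, all sequences have the same length and contain no hyphens, which satisfies the equal-length requirement mentioned above. Deleting all gapped columns is the strictest of the options the text lists and can discard informative sites.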
Many software platforms provide tree construction functions using different methods or algorithms. In general, the methods can be classified as either distance matrix methods or discrete data methods. Distance matrix methods are simple and fast to compute, while discrete data methods are complicated and time-consuming. For very closely related taxa with a high degree of sharing of amino acid or nucleotide sequence identity, a distance matrix method (Neighbor Joining: NJ; Unweighted Pair Group Method with Arithmetic mean: UPGMA) is appropriate; for distantly related taxa, a discrete data method (Maximum Likelihood: ML; Maximum Parsimony: MP; Bayesian Inference) is optimal3,8. In this study, the ML methods in MEGA (6.0.6) and Bayesian Inference (MrBayes 3.2) were applied to construct phylogenetic trees9. Ideally, when the proper model and parameters are used, the results derived from different methods may be consistent, and they are thus more reliable and convincing.
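To make the idea of a distance matrix concrete, the sketch below computes pairwise p-distances (the proportion of differing sites) for a toy alignment and reports the closest pair, which is the pair a clustering method such as UPGMA would join first. The taxon names and sequences are invented for illustration, and the p-distance is only one of several distance measures that tree-building programs offer.

```python
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Proportion of differing sites, skipping columns where either has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Build a symmetric nested-dict matrix of pairwise p-distances."""
    dist = {name: {name: 0.0} for name in aligned}
    for a, b in combinations(aligned, 2):
        d = p_distance(aligned[a], aligned[b])
        dist[a][b] = d
        dist[b][a] = d
    return dist


# Hypothetical aligned amino acid sequences.
aligned = {
    "taxonA": "MKTLLAGV",
    "taxonB": "MKTLLAGI",
    "taxonC": "MRSLIAGV",
}

matrix = distance_matrix(aligned)
closest = min(combinations(aligned, 2), key=lambda p: matrix[p[0]][p[1]])

for a, b in combinations(aligned, 2):
    print(f"{a} vs {b}: {matrix[a][b]:.3f}")
print("closest pair:", closest)
```

A distance method reduces each pair of sequences to a single number in this way, which is why it is fast; discrete data methods such as ML and Bayesian Inference instead evaluate every alignment column under a substitution model.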
For an ML phylogenetic tree constructed using MEGA10, the aligned sequence file in FASTA format must be uploaded into the program. The first step then is to select the optimal substitution model for the uploaded data. All available substitution models are compared based on the uploaded sequences, and their final scores will be shown in a results table. Select the model with the smallest Bayesian Information Criterion (BIC) score (listed first in the table), set ML parameters according to the recommended model, and start the computation. The computation time varies from several minutes to several days, depending on the complexity of the loaded data (length of the sequences and number of taxa) and the performance of the computer on which the programs are run. When the computation is finished, a phylogenetic tree will be shown in a new window. Save the file as "FileName.mat". After setting parameters to specify the appearance of the tree, save once more. Using this method, MEGA can generate publication-grade phylogenetic tree figures.
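MEGA performs the model comparison internally. The sketch below only illustrates the selection rule, using the standard definition BIC = k ln(n) - 2 ln L, where k is the number of free parameters, n is the sample size, and ln L is the maximized log-likelihood. The model names, log-likelihoods, and parameter counts are hypothetical placeholders, and treating the number of alignment sites as n is an assumption made for this example.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k * ln(n) - 2 * lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


# Hypothetical candidates: name -> (maximized log-likelihood, free parameters).
candidates = {
    "model_A": (-12450.0, 60),
    "model_B": (-12390.0, 61),
    "model_C": (-12385.0, 80),
}
n_sites = 250  # assumed sample size: number of columns in the alignment

scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
ranked = sorted(scores, key=scores.get)
best = ranked[0]

for name in ranked:
    print(f"{name}: BIC = {scores[name]:.2f}")
print("selected:", best)
```

Here the model with the highest likelihood (model_C) is not selected, because its extra parameters are penalized more heavily than the improvement in fit is rewarded.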
For tree construction with MrBayes11, the first step is to transform the aligned sequences, which are usually listed in FASTA format, into nexus format (.nex as the file type). This conversion can be done in MEGA. Next, the aligned sequences in nexus format can be uploaded into MrBayes. When the file is successfully uploaded, specify detailed parameters for the tree computation. These parameters include details such as amino acid substitution model, variation rates, chain number for Markov chain Monte Carlo (MCMC) coupling, ngen number, average standard deviation of split frequencies, and so on. After these parameters have been specified, start the computation. In the end, two tree figures in ASCII code, one showing clade credibility and the other showing branch lengths, will be displayed on the screen.
The tree result will be saved automatically as "FileName.nex.con". This tree file can be opened and edited by FigTree, and the figure displayed in FigTree can be modified further to make it more suitable for publication.
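In this protocol the FASTA-to-nexus conversion is done in MEGA and the parameters are entered in MrBayes directly. As a hedged sketch of what the resulting input might look like, the function below writes a minimal nexus data block followed by a MrBayes block. The helper name `to_nexus`, the toy sequences, and all parameter values (`aamodelpr=mixed`, `rates=gamma`, `ngen`, `samplefreq`, `nchains`) are illustrative assumptions, not the settings used by the authors.

```python
def to_nexus(aligned, ngen=100000, samplefreq=100, nchains=4):
    """Render aligned protein sequences as nexus text with a MrBayes block.

    Taxon names must not contain whitespace; all run settings are placeholders.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    if any(any(ch.isspace() for ch in name) for name in aligned):
        raise ValueError("taxon names must not contain whitespace")
    nchar = lengths.pop()
    width = max(len(name) for name in aligned)
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(aligned)} nchar={nchar};",
        "  format datatype=protein gap=- missing=?;",
        "  matrix",
    ]
    lines += [f"  {name.ljust(width)}  {seq}" for name, seq in aligned.items()]
    lines += [
        "  ;",
        "end;",
        "",
        "begin mrbayes;",
        "  prset aamodelpr=mixed;",  # placeholder amino acid model prior
        "  lset rates=gamma;",       # placeholder among-site rate variation
        f"  mcmc ngen={ngen} samplefreq={samplefreq} nchains={nchains};",
        "  sumt;",                   # summarize sampled trees
        "end;",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical trimmed alignment.
aligned = {"seq1": "MKTLLAG", "seq2": "MKSLLAG", "seq3": "MRTLLAG"}
nexus = to_nexus(aligned)
print(nexus, end="")
```

Text produced this way could be saved with a .nex extension and loaded in MrBayes with its `execute` command. Whether a run has converged should still be judged from the average standard deviation of split frequencies reported during the analysis, as noted above.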
In this study, 228 SWEET proteins, including 35 SWEETs from unicellular eukaryotes and 57 SemiSWEETs from prokaryotes, were analyzed as an example. Both the SWEETs and SemiSWEETs were characterized as glucose, fructose, or sucrose transporters across membranes12,13. Phylogenetic analysis suggests that SWEETs, which contain two MtN3/saliva domains, might be derived from an evolutionary fusion of a bacterial SemiSWEET and an archaeal SemiSWEET14.
1. Sequence Alignment
2. Computation of the Phylogenetic Tree
3. Presentation of the Phylogenetic Tree
NOTE: A phylogenetic ML tree will be presented when the computation using MEGA is finished (Figure 10).
4. Analysis of the Relationship of SWEETs and SemiSWEETs Using Sequence Alignment
NOTE: This step may not be needed in ordinary sequence analysis.
5. Phylogenetic Tree Construction with MrBayes
Phylogenetic trees show that all of the first MtN3/saliva domains of the 35 SWEET sequences clustered as one clade and the second MtN3/saliva domains of the SWEET sequences clustered as another clade. In addition, alignment results of the SWEETs and SemiSWEETs show that some SemiSWEETs from α-Proteobacteria aligned with the first MtN3/saliva domain of the SWEET sequences, whereas SemiSWEETs from Methanobacteria (archaea) aligned with the second MtN3/saliva domain of the SWEET sequenc...
It is becoming increasingly popular in biological research to make a phylogenetic tree based on nucleotide or amino acid sequences8. Generally, there are three critical stages of the practice including sequence alignment, evaluation of the aligned sequences with the proper method or algorithm, and visualization of the computational result as a phylogenetic tree. In the presented study, three rounds of sequence alignment were conducted: first, the SWEET protein sequences, including the first a...
The authors have nothing to disclose.
This work was supported by the National Natural Science Foundation of China (31371596), the Bio-technology Research Center, China Three Gorges University (2016KBC04), and the Natural Science Foundation of Jiangsu Province, China (BK20151424).
Name | Company | Catalog Number | Comments |
Adobe Illustrator | a graphical tool developed by Adobe Systems Software Ireland Ltd. Copyright © 2017 | ||
BioEdit | a biological sequence alignment editor written for Windows 95/98/NT/2000/XP/7. Copyright © Tom Hall | ||
Clustal Omega | a package for making multiple sequence alignments of amino acid or nucleotide sequences. http://www.clustal.org/ | ||
CorelDRAW | a graphic design software. Copyright © 2017 Corel Corporation | ||
FigTree | a graphical viewer of phylogenetic trees designed by the University of Edinburgh | ||
MEGA | Molecular Evolutionary Genetics Analysis version 6.0 http://www.megasoftware.net/home | ||
MrBayes | a Bayesian phylogenetic inference tool | ||
NVIDIA | a company that designs graphics processing units (GPUs) for the gaming and professional markets. Copyright © 2017 NVIDIA Corporation | ||
PAUP | Phylogenetic Analysis Using Parsimony. David Swofford's program implements the maximum likelihood method under a number of nucleotide models. | ||
Photoshop | a raster graphics editor developed and published by Adobe Systems Software Ireland Ltd. Copyright © 2017 | ||
RHYTHM | a knowledge-based prediction of helix contacts. Charité Berlin – Protein Formatics Group - Copyright 2007-2009 | ||
TMHMM | a tool for prediction of transmembrane helices in proteins. http://www.cbs.dtu.dk/services/TMHMM/ | ||
Computer | 4 GB memory, Core 2 or above CPU. Windows 7, Windows 10 | ||
Copyright © 2025 MyJoVE Corporation. All rights reserved