This method can help answer key questions in the phylogenetics field, such as how to collect sequences and how to select the appropriate computational model. The main advantage of this technique is that it is very easy to learn. Those who are new to this method often struggle simply because they do not know how to handle the data, owing to a lack of basic knowledge of the principles and methods of phylogeny.
This method was first developed when we analyzed the alignment of eukaryotic SWEET genes. Visual demonstration of this method is critical because there is a multitude of choices and settings for different aspects of the alignment. Begin this procedure by selecting 35 candidate SWEET proteins from unicellular eukaryotic organisms as described in the text protocol.
Align the 35 SWEET sequences in Clustal Omega. Copy and paste the protein sequences in FASTA format into the input box, or upload a sequence file in FASTA format. In the STEP 1 section, specify that the input consists of amino acid sequences by choosing Protein from the drop-down menu.
If necessary, specify the output format and other parameters in the STEP 2 section. For this study, set the output format to Clustal without numbers and leave the other parameters at their default settings. In most cases, the default parameters work well.
Submit the job and run the alignment in the STEP 3 section. The alignment may take anywhere from several seconds to several minutes to finish. In the Results Summary panel, right-click the link under Alignment in CLUSTAL format and save the aligned sequences as 35.clustal.
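Before pasting the 35 sequences into Clustal Omega, it can help to check the FASTA input locally. The following Python sketch is not part of the original protocol, and `parse_fasta` and `suspicious_records` are hypothetical helper names. It parses FASTA text and flags records that either contain characters outside the amino acid alphabet or consist only of nucleotide letters, which would suggest that a DNA sequence was pasted by mistake.

```python
from typing import Dict, List

# One-letter amino acid codes, plus ambiguity codes B, Z, X and the stop symbol *.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYBZX*")
NUCLEOTIDES = set("ACGTUN")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into an insertion-ordered {name: sequence} dict."""
    records: Dict[str, str] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("empty '>' header line")
            name = header[0]  # keep the ID up to the first whitespace
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name] += line.upper()
    return records


def suspicious_records(records: Dict[str, str]) -> List[str]:
    """Names of records with non-protein characters or nucleotide-only content."""
    flagged = []
    for name, seq in records.items():
        letters = set(seq)
        if letters - AMINO_ACIDS or letters <= NUCLEOTIDES:
            flagged.append(name)
    return flagged
```

For example, `suspicious_records(parse_fasta(text))` returns an empty list when every record looks like a protein sequence.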
Open the alignment result file in BioEdit. On the main panel of BioEdit, open the Sequence pull-down menu, select Edit Mode, and then click Edit Residues in the submenu. Use the cursor to select the protruding residues on the left side of the alignment, and click the Delete icon under the Edit menu to remove the selection.
Select and delete the residues to the right of the first MtN3/saliva domain, and save the trimmed first MtN3/saliva domain sequences as 35-1.fas. Likewise, delete the residues to the left and right of the second MtN3/saliva domain and save the result as 35-2.fas. The positions of the first and second MtN3/saliva domains can be predicted in advance with rhythm or TMHMM.
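BioEdit trims the domains interactively. If the domain boundaries are already known as alignment column numbers (for example, from a TMHMM prediction mapped onto the alignment), the same trimming can be sketched in Python. The helper names and the column ranges below are hypothetical placeholders, not the boundaries used in this study.

```python
from typing import Dict


def slice_columns(aln: Dict[str, str], start: int, end: int) -> Dict[str, str]:
    """Keep alignment columns start..end (1-based, inclusive) for every sequence."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return {name: seq[start - 1:end] for name, seq in aln.items()}


def to_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Format records as FASTA text, wrapping sequence lines at `width` characters."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Toy two-sequence alignment: columns 1-5 stand in for the first domain and
# columns 6-9 for the second (hypothetical boundaries).
aln = {"s1": "MK-LVAGTW", "s2": "MKQLV-GSW"}
domain1 = slice_columns(aln, 1, 5)
domain2 = slice_columns(aln, 6, 9)
```

Writing `to_fasta(domain1)` and `to_fasta(domain2)` to files would give the equivalents of 35-1.fas and 35-2.fas for this toy input.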
Open the file 35-1.fas in MEGA and click Align when prompted. Under the Edit menu, click Select All.
Then click Select Sequences. The names and sequences of all taxa will be highlighted in black. Choose Copy from the Edit menu to copy the sequences to the clipboard, and then paste them into a doc file.
In the doc file, replace every number sign (#) with a greater-than sign (>), and then delete any unrelated characters to convert the sequences to FASTA format. Add 1 to the end of each taxon name to mark it as a first MtN3/saliva domain sequence. Process the second MtN3/saliva domain sequences in the same way, adding 2 to the end of each taxon name.
Next, combine the first and second MtN3/saliva domain sequences in FASTA format in a single doc file. Load the combined sequences into Clustal Omega, align them as before, and save the result as 35realigned.clustal.
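The find-and-replace step can be sketched in Python. This assumes that the text copied from MEGA uses `#name` header lines (with the sequence either on the same line or on the following lines), and that the "unrelated characters" are a `#mega` marker line and `!`-prefixed command lines such as `!Title ...;`. Both assumptions should be checked against the actual clipboard contents. The function name is hypothetical.

```python
def mega_text_to_fasta(text: str, suffix: str) -> str:
    """Convert MEGA-style '#name' text to FASTA, appending `suffix` to each name."""
    out = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!") or line.lower() == "#mega":
            continue  # skip blanks, MEGA command lines and the file marker (assumed)
        if line.startswith("#"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("empty '#' header line")
            out.append(">" + parts[0] + suffix)  # '#' becomes '>', then add 1 or 2
            if len(parts) > 1:
                out.append("".join(parts[1:]))  # sequence on the same line as the name
        else:
            out.append(line.replace(" ", ""))  # drop spaces inside sequence lines
    return "\n".join(out) + "\n"
```

Call it with `suffix="1"` for the first-domain text and `suffix="2"` for the second-domain text.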
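Merging the two FASTA sets is a simple concatenation, but a duplicated taxon name (for example, a forgotten 1 or 2 suffix) would confuse the downstream alignment and tree. This hypothetical Python sketch combines FASTA texts and refuses duplicate names.

```python
from typing import Dict


def read_fasta(text: str) -> Dict[str, str]:
    """Minimal FASTA reader returning {name: sequence}; names end at the first space."""
    records: Dict[str, str] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("empty '>' header line")
            name = parts[0]
            if name in records:
                raise ValueError(f"duplicate name {name!r}")
            records[name] = ""
        elif line and name is not None:
            records[name] += line
    return records


def combine_fasta(*texts: str) -> str:
    """Concatenate several FASTA texts into one, refusing duplicate taxon names."""
    combined: Dict[str, str] = {}
    for text in texts:
        for name, seq in read_fasta(text).items():
            if name in combined:
                raise ValueError(f"taxon {name!r} appears twice; check the 1/2 suffixes")
            combined[name] = seq
    lines = []
    for name, seq in combined.items():
        lines += [">" + name, seq]
    return "\n".join(lines) + "\n"
```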
Open the 35realigned.clustal file in BioEdit. Delete the uneven amino acid residues at either end of the aligned sequences, and then save the sequences as 35realigned.fas.
Click Yes when warned that some nonstandard characters cannot be saved. Open 35realigned.fas in MEGA.
Click the Data menu and choose Export Alignment. Then save the alignment in PAUP (NEXUS) format as 35.nex for later use in MrBayes.
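In BioEdit the ragged ends are removed by eye. One reasonable rule, assumed here rather than taken from the protocol, is to cut everything outside the first and last columns that have a residue in every sequence. The following hypothetical Python helper implements that rule and keeps interior gaps untouched.

```python
from typing import Dict


def trim_ragged_ends(aln: Dict[str, str], gap: str = "-") -> Dict[str, str]:
    """Trim the alignment to the span between the first and last gap-free columns."""
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    n = lengths.pop()
    gap_free = [i for i in range(n) if all(seq[i] != gap for seq in aln.values())]
    if not gap_free:
        raise ValueError("no column is gap-free in every sequence")
    first, last = gap_free[0], gap_free[-1]
    # Interior gaps between `first` and `last` are kept; only the ragged ends go.
    return {name: seq[first:last + 1] for name, seq in aln.items()}
```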
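MrBayes reads alignments in NEXUS format. For readers who want to see what such a file contains, this Python sketch writes a minimal NEXUS data block for an aligned protein dataset. It is an illustration, not MEGA's exporter, and the function name is hypothetical. It assumes taxon names contain only letters, digits and underscores, so that no NEXUS quoting is needed.

```python
from typing import Dict


def to_nexus(aln: Dict[str, str]) -> str:
    """Write an aligned protein dataset as a minimal NEXUS data block."""
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    nchar = lengths.pop()
    for name in aln:
        # Plain identifiers avoid NEXUS quoting rules (an assumption of this sketch).
        if not name.replace("_", "").isalnum():
            raise ValueError(f"rename taxon {name!r} to letters, digits and '_'")
    pad = max(len(name) for name in aln)
    lines = [
        "#NEXUS",
        "",
        "begin data;",
        f"  dimensions ntax={len(aln)} nchar={nchar};",
        "  format datatype=protein missing=? gap=-;",
        "  matrix",
    ]
    for name, seq in aln.items():
        lines.append(f"    {name.ljust(pad)}  {seq}")
    lines += ["  ;", "end;"]
    return "\n".join(lines) + "\n"
```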
Meanwhile, click the Models icon on the main panel of MEGA. Choose Find Best DNA/Protein Models (ML) and click OK in the pop-up window. Click Compute to begin the model search.
A new progress panel will open. Depending on the complexity of the loaded sequences and the computer's performance, the search takes anywhere from several minutes to several days. A table of results will open when the model search is finished.
The model with the smallest Bayesian information criterion (BIC) score is listed first, followed by the other models in order of increasing BIC score. Here, the first model, LG+G+F, has the smallest BIC score and is therefore the recommended model for building the ML tree from the 35realigned.fas file.
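The ranking follows the standard definition BIC = k ln(n) - 2 ln(L), where L is the maximized likelihood of the model, k is its number of free parameters and n is the sample size (taken here to be the number of alignment sites). The smaller the BIC, the better the balance between fit and model complexity. The sketch below shows only this textbook formula and the ranking rule. MEGA's exact parameter counting may differ, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def bic(log_likelihood: float, num_params: int, num_sites: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return num_params * math.log(num_sites) - 2.0 * log_likelihood


def rank_models(fits: Dict[str, Tuple[float, int]], num_sites: int) -> List[Tuple[str, float]]:
    """Sort models by BIC, smallest (preferred) first, as in MEGA's results table.

    `fits` maps a model name to (log-likelihood, number of free parameters).
    """
    scored = [(name, bic(lnl, k, num_sites)) for name, (lnl, k) in fits.items()]
    return sorted(scored, key=lambda item: item[1])
```

A model with a higher likelihood can still rank lower if it needs many more parameters, because each extra parameter adds ln(n) to its score.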
Next, click the Phylogeny icon on the main panel of MEGA. Click Construct/Test Maximum Likelihood Tree and then click Yes in the pop-up panel. A new window will open, listing the parameters that need to be specified.
First, in the Test of Phylogeny box, choose the bootstrap method and set the number of bootstrap replications; 500 or 1,000 is adequate in most cases. Under Substitution Model, choose Amino acid as the substitution type.
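Each bootstrap replication builds a pseudo-alignment of the same length by sampling alignment columns with replacement, infers a tree from it, and records which clades reappear. The bootstrap value of a clade is the percentage of replications that contain it. This Python sketch shows only the resampling step. The function name is hypothetical, and tree inference is left to MEGA.

```python
import random
from typing import Dict


def bootstrap_replicate(aln: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Build one pseudo-alignment by sampling alignment columns with replacement."""
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    n = lengths.pop()
    # Draw n column indices with replacement. Every taxon uses the same indices,
    # so the residues of each sampled site stay together across taxa.
    columns = [rng.randrange(n) for _ in range(n)]
    return {name: "".join(seq[c] for c in columns) for name, seq in aln.items()}
```

With 500 or 1,000 replications, this step would simply be repeated that many times, each replicate feeding one tree search.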
The purpose of choosing a substitution model is to estimate the true evolutionary difference between sequences from their present-day states. In the Model/Method box, select LG with Freqs. (+F) model. In the Rates and Patterns box, select Gamma Distributed (G) to describe rate variation across sites, for example by giving more weight to changes at slowly evolving sites.
In the Data Subset box, select Complete Deletion to remove every column that contains a gap (hyphen). Keep all other parameters at their default settings. After specifying these parameters, click Compute to start the calculation.
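Complete deletion removes every alignment column in which any sequence has a gap or missing data, so all taxa are compared over the same set of sites. A minimal Python sketch of that filter follows. The function name is hypothetical, and treating `-` and `?` as the gap and missing-data symbols is an assumption.

```python
from typing import Dict


def complete_deletion(aln: Dict[str, str], missing: str = "-?") -> Dict[str, str]:
    """Drop every column in which any sequence has a gap or missing-data symbol."""
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    n = lengths.pop()
    keep = [i for i in range(n) if all(seq[i] not in missing for seq in aln.values())]
    return {name: "".join(seq[i] for i in keep) for name, seq in aln.items()}
```

The trade-off is lost data: with many divergent sequences, complete deletion can discard a large fraction of sites, which is why the alignment ends are trimmed beforehand.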
When MEGA finishes the computation, the ML phylogenetic tree will be displayed. To save the result, choose Save Current Session from the File pull-down menu on the Tree panel. In the present study, the result was saved as 35.mts.
On the Tree panel, many display settings, including branch length, tree style, tree topology, and the font, size and color of taxon names, can be adjusted. To save the final tree, click the Image icon and either save the figure in one of several formats or copy the image for further editing in image-editing software. Then proceed with the analysis of the relationship between SWEETs and semiSWEETs, and with sequence alignment and phylogenetic tree construction in MrBayes, as detailed in the text protocol.
The phylogenetic trees show that the first MtN3/saliva domains of all 35 SWEET sequences clustered in one clade, whereas the second MtN3/saliva domains clustered in another. In addition, alignments of SWEETs and semiSWEETs show that some semiSWEETs from Alphaproteobacteria aligned with the first MtN3/saliva domain of the SWEET sequences, whereas semiSWEETs from Methanobacteria aligned with the second. Together, these results suggest that SWEETs, which contain two MtN3/saliva domains, might have arisen from an evolutionary fusion of a bacterial semiSWEET and an archaeal semiSWEET.
Once mastered, this technique can be completed in one hour to several days, depending on the complexity of the data. When attempting this protocol, it is important to collect high-quality sequences in advance. This protocol paves the way for evolutionary biologists to explore the relationships between homologous genes.
After watching this video, you should have a good understanding of how to construct a phylogenetic tree.