16:41 min
•
November 3rd, 2011
The aim of this procedure is to computationally predict the three-dimensional structures and biological functions of protein molecules starting from their amino acid sequences. This is accomplished by first predicting the secondary structure of the proteins by machine learning. The sequences and the predicted secondary structures are then matched with the solved structures in the PDB library to identify the best possible structure templates.
This procedure is called threading. Following the threading procedure, the I-TASSER program splits the templates into fragments based on the sequence-template alignments and then reassembles the fragments into full-length models. In the third step, full atomic models are constructed by atomic-level refinements to optimize hydrogen-bonding networks and remove steric overlaps.
The last step of the procedure is to identify the biological function of the proteins by matching the predicted structures with proteins of known function in the function library. The main advantage of I-TASSER over existing structure modeling methods is its inherent structure fragment assembly approach, which can consistently drive the threading alignments closer to the native state. These high-quality structure models also form the basis of accurate structure-based functional annotations.
To promote the usage of I-TASSER in the scientific community, our lab has made available a website where protein sequences can be submitted to I-TASSER. This website acts as a nexus through which users worldwide may register and interface with the computer cluster that manages and runs I-TASSER simulations. An I-TASSER simulation job consists of over a dozen smaller sub-simulations.
These simulations, when run on a single computer with a single processor core, can take over a hundred hours. The Zhang lab computer cluster takes and distributes these sub-simulations across hundreds of computers and is capable of running over 2000 simulations concurrently. With our computer cluster, we are capable of completing hundreds of I-TASSER simulations every day.
Even with this capacity, much work has to be done to optimize the system and minimize waiting time for our online I-TASSER users. To start the structure and function modeling experiment, log onto the I-TASSER webpage. The URL addresses for all relevant webpages discussed here can be found in the written protocol.
Copy and paste the amino acid sequence into the provided form, or directly upload the sequence by clicking the Browse button. Provide an email address and a name for the job. Users can optionally specify external inter-residue contact or distance restraints, add an additional template, or exclude some template proteins during the structure modeling process.
To submit the sequence, click on the Run I-TASSER button. Check the status of the submitted job by visiting the I-TASSER queue page.
Click on the search tab and use the job ID number or the query sequence to search for the submitted job. After the structure and function modeling is finished, a notification email containing an image of the predicted structures and a web link will be sent to the provided email address. Click on this link to view and download the results.
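Before pasting a sequence into the submission form, it can help to check it locally. The following is a minimal sketch and not part of I-TASSER: the function names are hypothetical, and the length limits are placeholder assumptions that should be replaced with whatever the submission page states.

```python
# Hypothetical pre-submission check; nothing here calls the I-TASSER server.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta_sequence(text):
    """Return the residues of the first record in FASTA-formatted text."""
    residues = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header or residues:
                break  # a second record starts here; keep only the first
            seen_header = True
            continue
        residues.append(line.upper())
    return "".join(residues)


def check_sequence(sequence, min_len=10, max_len=1500):
    """Return a list of problems; an empty list means no problem was found.

    min_len and max_len are assumed limits, not values taken from the video.
    """
    problems = []
    unknown = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if unknown:
        problems.append("non-standard residue codes: " + ", ".join(unknown))
    if not min_len <= len(sequence) <= max_len:
        problems.append(f"length {len(sequence)} outside {min_len}-{max_len}")
    return problems
```

Calling `check_sequence(read_fasta_sequence(text))` on the contents of a FASTA file returns an empty list when the sequence uses only the 20 standard one-letter codes and falls inside the assumed length range.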
Begin structure analysis by examining the secondary structure prediction, which is displayed as H for alpha helix, S for beta strand, or C for coil. Also, consider the confidence score of the prediction for each residue. Look for regions with long stretches of regular secondary structure predictions to estimate the core region in the protein.
The structural class of the protein can also be analyzed based on the distribution of secondary structure elements. View the predicted solvent accessibility to ascertain buried and solvent-exposed regions in the query. Values of predicted solvent accessibility range from a score of zero for a buried residue to a score of nine for an exposed residue.
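The visual inspection described above can also be done on the two per-residue strings copied from the results page. This is a rough sketch under stated assumptions: the secondary structure is a string of H, S, and C, the solvent accessibility is a string of digits 0 to 9 of the same length, and the minimum run length of 5 is an arbitrary illustrative choice, not a value from the protocol.

```python
from collections import Counter
from itertools import groupby


def regular_runs(ss, min_len=5):
    """Return (start, end, state) for runs of H or S at least min_len long.

    Positions are 0-based and end is exclusive. min_len is an assumed cutoff.
    """
    runs = []
    pos = 0
    for state, group in groupby(ss):
        length = len(list(group))
        if state in ("H", "S") and length >= min_len:
            runs.append((pos, pos + length, state))
        pos += length
    return runs


def mean_accessibility(acc, start, end):
    """Average the 0 (buried) to 9 (exposed) scores over residues start..end."""
    values = [int(ch) for ch in acc[start:end]]
    return sum(values) / len(values)


def composition(ss):
    """Fraction of residues predicted as helix, strand, and coil."""
    counts = Counter(ss)
    return {state: counts.get(state, 0) / len(ss) for state in "HSC"}
```

Runs with a low mean accessibility are candidates for the core region, while `composition` gives the distribution of secondary structure elements used to judge the structural class.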
Regions containing mostly buried residues can be used to delineate the core region in the protein, while regions with solvent-exposed and hydrophilic residues are potential hydration or functional sites. To view the predicted tertiary structures of the query protein, scroll down to the displayed interactive Jmol applet. Left-click on the applet to change the appearance of the displayed structure, zoom into a specific region, select specific residue types in the predicted model, or calculate inter-residue distances.
Analyze the confidence scores of the structural modeling to estimate the quality of the predicted structures. C-score values are typically in the range of negative five to two, wherein a higher score reflects a model of better quality.
The estimated TM-score and RMSD of the first model are shown as the estimated accuracy of model one. Click on the "more about C-score" link to analyze the C-score, cluster size, and cluster density of all the models. Analyze the top 10 threading templates of the query protein, as identified by the LOMETS threading programs, by scrolling down the results page.
View the normalized Z-score to analyze the quality of the threading alignments. Alignments with a normalized Z-score greater than one reflect a confident alignment and are most likely to have the same fold as the query protein. Examine the sequence identity in the threading-aligned region and for the whole chain to assess the homology between the query and the template proteins.
High sequence identity is an indicator of evolutionary relatedness between the query and template proteins. View the threading-aligned residues shown in color to visually identify conserved residues or motifs in the query and the template proteins. A higher sequence identity in the threading-aligned region as compared to the whole-chain alignment also indicates the presence of conserved structural motifs or domains in the query. Assess the coverage of the threading alignment by inspecting the alignment.
If the coverage of the top alignment is low and confined to only a small region of the query protein, or absent for a long segment of the query sequence, it indicates that the query protein contains more than one domain. In this case, it is recommended to split the sequence and model the domains individually. View the next table of the results page to determine the top 10 structural analogs of the first predicted model, as identified by the structural alignment program TM-align.
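The identity and coverage checks on a threading alignment can be sketched as follows. The input format is an assumption: two equal-length strings for the query and the template, with `-` marking a gap. The denominators used for the two identity values and the thresholds in `needs_domain_check` are illustrative choices, since the video gives no numeric cutoffs for "low" coverage or a "long" unaligned segment.

```python
def alignment_stats(query_aln, template_aln):
    """Summarize a pairwise alignment given as two gapped strings."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned strings must have equal length")
    query_len = aligned = identical = 0
    longest_gap = current_gap = 0
    for q, t in zip(query_aln, template_aln):
        if q == "-":
            continue  # template-only column, no query residue here
        query_len += 1
        if t == "-":
            current_gap += 1  # query residue with no template partner
            longest_gap = max(longest_gap, current_gap)
        else:
            current_gap = 0
            aligned += 1
            if q == t:
                identical += 1
    if query_len == 0:
        raise ValueError("alignment contains no query residues")
    return {
        "coverage": aligned / query_len,
        "identity_aligned": identical / aligned if aligned else 0.0,
        "identity_whole": identical / query_len,
        "longest_unaligned": longest_gap,
    }


def is_confident(normalized_z):
    """Normalized Z-score above one reflects a confident alignment."""
    return normalized_z > 1.0


def needs_domain_check(stats, min_coverage=0.5, max_unaligned=50):
    """Flag alignments that may point to a multi-domain query.

    Both thresholds are assumed placeholders, not values from the protocol.
    """
    return (stats["coverage"] < min_coverage
            or stats["longest_unaligned"] >= max_unaligned)
```

A flagged alignment is only a prompt to inspect the alignment by eye and decide whether to split the sequence and model the domains individually.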
A TM-score greater than 0.5 indicates that the detected analog and the model have a similar topology, so the analog can be used to determine the structural class or protein family of the query protein. Those with a TM-score less than 0.3 signify a random structural similarity. Analyze the sequence identity and RMSD in the structurally aligned region to assess the conservation of spatial motifs in the model and the structural analog.
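For readers who want to see how the two measures relate to the aligned distances, the sketch below evaluates the standard TM-score formula and the RMSD for one fixed superposition. It assumes the distances between aligned residue pairs (in angstroms) have already been computed; TM-align itself also searches for the superposition that maximizes the score, which is not reproduced here, and the "intermediate" label is an addition for scores between the two cutoffs given in the video.

```python
import math


def d0(length):
    """Length-dependent distance scale used in the TM-score formula."""
    if length < 19:
        # Shorter chains need special handling that this sketch omits.
        raise ValueError("this sketch only handles chains of 19+ residues")
    return 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, length):
    """TM-score for one superposition, normalized by the given chain length."""
    scale = d0(length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / length


def rmsd(distances):
    """Root-mean-square deviation over the aligned residue pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def interpret_tm_score(score):
    """Apply the 0.5 and 0.3 cutoffs quoted in the video."""
    if score > 0.5:
        return "similar topology"
    if score < 0.3:
        return "random similarity"
    return "intermediate"
```

Because each aligned pair contributes at most 1 and unaligned residues contribute nothing, the score stays between 0 and 1, whereas the RMSD depends only on the pairs that were aligned.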
Visually inspect the colored and aligned residue pairs in the alignment to identify these structurally conserved residues and motifs. Look at the predicted EC numbers table to view the top five potential enzyme homologs of the query protein. The confidence level of the EC number prediction using these templates is shown as the EC-score.
Based on benchmarking analysis, functional similarity between the query and template protein can be reliably interpreted using an EC-score greater than 1.1. Next, look for consensus of function among the templates that have a similar fold to the query protein. If multiple templates have the same EC number and the EC-score is greater than 1.1, the confidence level of the prediction is very high.
However, if the EC-score is high but there is a lack of consensus among the identified hits, the prediction becomes less reliable, and users are recommended to consult the gene ontology (GO) term predictions. View the predicted gene ontology terms table to identify the top 10 homologs of the query protein in the PDB library annotated with gene ontology terms. Each protein is usually associated with multiple gene ontology terms describing its molecular functions, biological processes, and cellular location. Click on each term to visit the AmiGO website and analyze its definition and lineage.
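Returning to the EC number table, the consensus rule described above can be written down as a small decision function. This is one possible reading of the spoken rule rather than the server's own logic: the hit list format, the function name, and the label for the case with no hit above the cutoff are assumptions, and the fold similarity of each template is not modeled.

```python
from collections import Counter

EC_SCORE_CUTOFF = 1.1  # cutoff quoted in the video


def ec_confidence(hits, cutoff=EC_SCORE_CUTOFF):
    """Classify an EC prediction from (ec_number, ec_score) pairs.

    Returns (label, ec_number). The labels are illustrative wording.
    """
    confident = [ec for ec, score in hits if score > cutoff]
    if not confident:
        return ("no confident hit", None)
    ec_number, count = Counter(confident).most_common(1)[0]
    if count >= 2:
        # Several confident templates agree on one EC number.
        return ("very high", ec_number)
    # High score but no agreement: consult the GO term predictions.
    return ("less reliable", ec_number)
```

With made-up hits such as `[("3.4.21.4", 1.5), ("3.4.21.4", 1.2), ("2.7.1.1", 0.8)]`, two templates above the cutoff share an EC number, so the function reports the consensus case.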
Analyze the functional homology score column to assess the functional similarity between the query and template proteins. The confidence level of transferring functional annotation from these proteins can also be estimated. View the consensus prediction of gene ontology terms table to analyze the concurrence of function between the templates.
These common functions are used for predicting the gene ontology terms of the query protein and for assessing the confidence level of the GO term predictions. Finally, scroll down to the bottom of the page to view the top 10 ligand binding site predictions for the query protein. Predicted binding sites are ranked based on the number of predicted ligand conformations that share a common binding pocket. The best identified binding site is already displayed in the Jmol applet.
Click on the radio buttons to analyze other predictions and visualize the ligand-interacting residues. The BS-score reveals local similarity between the model's and the template's binding sites. A BS-score greater than 1.1 indicates high sequence and structural similarity near the predicted binding site in the model as compared to the known binding site in the template.
The I-TASSER main webpage contains links to other useful features. The forum feature allows the user to create an online account and seek help from other I-TASSER users regarding structure modeling or interpreting the results. The download feature allows users to download I-TASSER and related packages and install them on their own computer.
This helps reduce the time required to perform the modeling experiments. The queue feature allows the status of all submitted jobs to be seen on the I-TASSER queue page. Users can also visually inspect the image of modeled structures for finished jobs.
Also shown on this page are the C-score, expected TM-score, and expected RMSD of the first model, and the submission date. Shown here is an excerpt of the I-TASSER results page showing the FASTA-formatted query sequence, the predicted secondary structure with the associated confidence scores, and the predicted solvent accessibility of the residues. The analyzed core region and the potential hydration site in the query are highlighted in cyan and red rectangles, respectively. Here the tertiary structure predictions for the query protein are shown.
The predicted models are displayed in an interactive Jmol applet, allowing the user to change the display of the molecule. The models can also be downloaded by clicking on the download links. The confidence score used to estimate the quality of the model is reported as the C-score. An example of the I-TASSER results page showing the top 10 threading templates and alignments identified by the LOMETS threading programs is presented.
The quality of the threading alignments is evaluated based on the normalized Z-score, where a value greater than one reflects a confident alignment. Aligned residues in the template that are identical to the corresponding query residues are highlighted in color to indicate the presence of a conserved residue or motif. Conversely, a lack of alignment in most of the top templates indicates the presence of multiple domains in the query protein and the unaligned residues correspond to domain linker regions.
This table displays the top 10 structural analogs and structural alignments identified by the TM-align structural alignment program. The ranking of the analogs is based on the TM-score of the structural alignment. A TM-score greater than 0.5 indicates that the two compared structures have a similar topology, while a TM-score less than 0.3 indicates a similarity no better than that between two random structures.
Structurally aligned residue pairs are highlighted in color based on their amino acid property, while the unaligned regions are indicated by a dash. Here is an example of the I-TASSER results page showing identified enzyme homologs of the query protein in the PDB library.
The confidence level of the EC number prediction is analyzed based on the EC-score, where an EC-score greater than 1.1 indicates functional similarity between the query and template protein. The gene ontology term prediction table for the query protein includes functional homologs of the query protein in the gene ontology template library, ranked by their functional homology score. Common functional features from these top-scoring hits are derived to generate the final gene ontology term predictions for the query protein.
The quality of the predicted gene ontology terms is estimated based on the GO-score, where a GO-score greater than 0.5 indicates a reliable prediction. Shown here is an example of the I-TASSER results page showing the top 10 protein-ligand binding site predictions using the COFACTOR algorithm. The ranking of the predicted binding sites is based on the number of predicted ligand conformations that share a common binding pocket in the query. The BS-score is a measure of the local sequence and structure similarity between the predicted and template binding sites and is useful for analyzing the conservation of binding site pockets.
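The binding site ranking and the BS-score cutoff can be illustrated with a few lines of code. This is a sketch under assumptions: the `BindingSite` record and its field names are hypothetical and do not correspond to any I-TASSER or COFACTOR output format, and the values would have to be copied by hand from the results table.

```python
from dataclasses import dataclass


@dataclass
class BindingSite:
    """Hypothetical record for one row of the binding site table."""
    name: str
    n_conformations: int  # ligand conformations sharing this pocket
    bs_score: float


def rank_sites(sites):
    """Order sites by the number of ligand conformations, largest first."""
    return sorted(sites, key=lambda site: site.n_conformations, reverse=True)


def is_conserved(site, cutoff=1.1):
    """BS-score above 1.1 indicates high local sequence and structure similarity."""
    return site.bs_score > cutoff
```

Sorting and thresholding are kept separate on purpose: the rank reflects how many predicted ligand conformations agree on a pocket, while the BS-score describes how closely that pocket resembles the known binding site in the template.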
Although I-TASSER is one of the most efficient algorithms for protein structure and function prediction, it is important to remember that its output is just a prediction from computer algorithms. Any experimental data or functional insights, for example residue contacts or binding information, will be extremely useful for increasing the accuracy of predictions. The I-TASSER server has a portal to include this information during the modeling procedure.
To accommodate increasing interest in I-TASSER, the Zhang lab has released the I-TASSER software free for non-commercial research. We are actively developing and improving I-TASSER in the hope that its availability will lead to large-scale application outside of the Zhang lab and will benefit and spur further research in the scientific community.
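A guideline for computer-based structural and functional characterization of proteins using the I-TASSER pipeline is described. Starting from the query protein sequence, 3D models are generated using multiple threading alignments and iterative structural assembly simulations. Functional inferences are then drawn based on matches to proteins of known structure and function.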
0:05 Title
2:21 Running the I-TASSER Server
3:37 Structure Analysis
5:58 LOMETS Target Template Alignment
7:30 Structural Analogs in PDB and Enzyme Commission Number Prediction
9:20 Gene Ontology (GO) Term and Protein-ligand Binding Site Predictions
12:05 Representative I-TASSER Results
15:43 Conclusion
Copyright © 2023 MyJoVE Corporation. All rights reserved