The overall goal of this procedure is to identify co-evolving residues in protein alignments that imply interpositional dependencies, or IPDs. This is accomplished by first loading the alignment into StickWRLD, a visual analytics tool that creates an interactive 3D representation of a protein alignment and clearly displays co-varying residues. In StickWRLD, each position in the alignment is represented as a column composed of a stack of spheres, one sphere for each of the 20 possible amino acids at that position, sized in a frequency-dependent manner. The columns representing each position are wrapped around a cylinder to represent IPDs.
Lines are drawn between residues that co-occur more or less often than would be expected if the residues present in the positions were independent. Within the program, the residual threshold is tuned until there is a manageable number of edges, and then the edges of interest are identified. Ultimately, StickWRLD is used to identify residues that are co-evolving with one another.
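The idea of a residual can be illustrated with a small sketch. It assumes the residual for a pair of residues is simply the observed joint frequency minus the frequency expected if the two positions were independent. The exact statistic StickWRLD computes may differ, and the function name and toy alignment here are hypothetical.

```python
from collections import Counter

def pair_residuals(seqs, i, j):
    """Observed minus expected joint frequency for residue pairs at columns i and j.

    Assumption: expected frequency is the product of the two single-column
    frequencies (independence). This is an illustration, not StickWRLD's code.
    """
    n = len(seqs)
    freq_i = Counter(s[i] for s in seqs)            # residue counts in column i
    freq_j = Counter(s[j] for s in seqs)            # residue counts in column j
    freq_ij = Counter((s[i], s[j]) for s in seqs)   # joint counts
    residuals = {}
    for a in freq_i:
        for b in freq_j:
            observed = freq_ij[(a, b)] / n
            expected = (freq_i[a] / n) * (freq_j[b] / n)
            residuals[(a, b)] = observed - expected
    return residuals

# Toy two-column alignment (made-up data): G always pairs with A, P with T.
toy_alignment = ["GA", "GA", "PT", "PT"]
toy_residuals = pair_residuals(toy_alignment, 0, 1)
```

In this toy case the pair G with A has a positive residual (it co-occurs more often than independence predicts), and G with T has a negative one.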
Such functionally required co-varying residues have been identified in adenylate kinase. One of the reasons people struggle with this process is its novelty. It's almost as much of an art form as it is a science.
Demonstrating StickWRLD visually is important because it's a visual analytics tool and requires user interaction. Use a computer that has an Intel i5 or better processor with at least 4 GB of RAM, is running Mac OS X or Linux, and is equipped with the Python libraries listed in the text protocol. Download StickWRLD as a zip archive containing all of the relevant Python scripts.
Also, download the fasta2stick script for converting standard FASTA DNA or protein sequence alignments to the StickWRLD format. Extract the archive and put the resulting StickWRLD folder and the fasta2stick script on the desktop. Then create an alignment of the protein sequences using any standard alignment software.
Save the alignment on the desktop in FASTA format. Open the terminal application on the Mac or Linux computer and navigate to the desktop by typing cd ~/Desktop and pressing return in the terminal.
Type the command to make the fasta2stick script executable and then type the command to execute the script. Follow the onscreen instructions provided by the script to specify the input file name and the desired output name. Save the output file on the desktop.
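Before running the conversion, it can help to confirm that the saved file really is an aligned FASTA file, meaning every sequence has the same length once gaps are included. The following is a minimal sketch of such a check. It is not part of StickWRLD or the fasta2stick script, and the function names and example records are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping each header to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            name = line[1:]               # header line starts a new record
            records[name] = ""
        elif name is not None:
            records[name] += line         # sequence may span several lines
    return records

def is_aligned(records):
    """True if there is at least one record and all sequences share one length."""
    return len({len(seq) for seq in records.values()}) == 1

# Made-up example records, with a gap character in the first sequence.
example = ">seq1\nMK-LV\n>seq2\nMKALV\n"
example_records = read_fasta(example)
example_ok = is_aligned(example_records)
```

If the check fails, the file is probably an unaligned set of sequences and should be re-exported from the alignment software.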
Navigate into the StickWRLD executables folder using the terminal application of the Mac or Linux computer. Launch StickWRLD by typing python-32 stickwrld_demo.py in the terminal. Verify that the StickWRLD data loader panel is visible on the screen. Then load the converted protein sequence alignment by pressing the load protein button.
Select the file created and press open. StickWRLD will open several new windows, including StickWRLD Control and StickWRLD OpenGL. Select the StickWRLD OpenGL window. Choose reset view from the OpenGL menu to display the default StickWRLD visualization, a top-down view through the cylinder representing the data, in the resizable OpenGL window.
Several view options exist in StickWRLD. Select the boxes for column labels and ball labels in the StickWRLD Control pane to display values for columns and balls. Deselect the box for column edges in the StickWRLD Control pane to hide the column edge lines.
Set the column thickness to 0.1 in the StickWRLD Control pane to draw a thin line through the columns, making it easier to navigate the 3D view. Press return to accept the change. Reset the view in the StickWRLD OpenGL window.
Then press the full screen button to maximize the view. To navigate within the program, rotate the 3D StickWRLD display by holding down the left mouse button while moving the mouse in any direction. Zoom the 3D StickWRLD display by holding down the right mouse button while moving the mouse up or down.
Browse the view by panning and zooming. Co-evolving residues exceeding the threshold requirements of both p and residual are connected via edge lines. If there are too many or too few edges connecting residues, change the residual threshold to show fewer or more edges. Increase the residual threshold on the StickWRLD Control pane until no IPD edge lines are shown, then slowly ramp it down until relationships appear. Continue decreasing the residual until there are a sufficient number of relationships to examine.
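The tuning loop described here can be sketched in code. The sketch starts at the highest residual and lowers the threshold until showing more edges would exceed a manageable count. It assumes an edge is visible when the absolute value of its residual meets the threshold and it ignores the p threshold. The edge labels, residual values, and function names are made up for illustration and are not StickWRLD output.

```python
def visible_edges(residuals, threshold):
    """Edges whose absolute residual meets the threshold (assumed rule)."""
    return [edge for edge, r in residuals.items() if abs(r) >= threshold]

def tune_threshold(residuals, max_edges):
    """Lower the threshold from the top until the next step shows too many edges.

    Returns the lowest candidate threshold that shows at most max_edges edges,
    or None if even the highest candidate shows too many.
    """
    # Candidate thresholds: the distinct absolute residuals, highest first.
    candidates = sorted({abs(r) for r in residuals.values()}, reverse=True)
    chosen = None
    for t in candidates:
        if len(visible_edges(residuals, t)) > max_edges:
            break
        chosen = t
    return chosen

# Hypothetical residuals for four edges; negative means a negative association.
demo_residuals = {"edge_a": 0.30, "edge_b": 0.25, "edge_c": -0.20, "edge_d": 0.05}
demo_threshold = tune_threshold(demo_residuals, max_edges=3)
demo_visible = visible_edges(demo_residuals, demo_threshold)
```

With these made-up values and a limit of three edges, the loop stops at a threshold of 0.2 and hides the weakest edge. In the program itself the same judgment is made by eye while adjusting the residual field.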
Identify relationships that involve either residues of known interest or residues that are distal to one another within the alignment. Using command plus left click, select any edges of interest. The StickWRLD Control pane will indicate the columns and the specific residues connected. Solid lines represent positive associations.
Dashed lines represent negative associations. Press the output edges button on the StickWRLD Control pane to save a plain text formatted file of all of the visible edges in the proper directory, including the joined residues and their actual residual values. A large cluster of interpositional dependencies, or IPDs, including a three-node association between a glycine at position 132, a tyrosine at position 135, and a proline at position 141, is visible in the foreground.
Here, the view has been skewed to position the user slightly above the cylinder, revealing an IPD between a histidine at position 136 and a methionine at position 29, 107 residues distant. Conversely, a Pfam HMM-derived motif of the same domain does not detect these as specifically co-occurring motif variants and also defines the overall groupings in a biologically unsupported scheme. When performing this procedure, it's important to remember to try it two different ways.
Start with a high residual and work your way down, or start with a low residual and work your way up. That way you can explore the space from both directions.