The goal of this protocol is to reveal the structural dynamics of one-dimensional diffusion of a protein along DNA, using a plant transcription factor WRKY domain protein as an exemplary system. All-atom simulations combined with Markov state model (MSM) construction reveal 1-bp stepping motions of the protein along DNA in atomic detail, while coarse-grained simulations focus on sampling processive diffusion of the protein over tens of base pairs along DNA.
To begin, use a 10 microsecond all-atom MD trajectory to evenly extract 10,000 frames from a forward one-base-pair stepping path. Prepare the transition path with 10,000 frames in VMD by clicking File and Save Coordinates. Then, type protein or nucleic in the Selected Atoms box.
Choose the frames in the Frames box, and click Save to get the frames needed. Align the long axis of the reference DNA from the crystal structure to the x-axis, and set the initial center of mass of the full 34-base-pair DNA at the origin of the coordinate space by clicking Extensions and then selecting TkConsole in VMD. Afterward, type the command in the TkConsole command window.
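The alignment described here can also be illustrated outside VMD. The following is a minimal Python sketch, not the TkConsole command used in the protocol. It assumes equal atomic masses, so the center of mass is taken as the plain centroid, and the function names are hypothetical.

```python
import math

def centroid(points):
    """Mean position of the points (center of mass under the equal-mass assumption)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def principal_axis(centered, iterations=200):
    """Long axis of a centered point cloud, by power iteration on its 3x3 covariance."""
    cov = [[sum(p[i] * p[j] for p in centered) for j in range(3)] for i in range(3)]
    v = (1.0, 0.7, 0.3)  # arbitrary start vector, assumed not orthogonal to the long axis
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = tuple(x / norm for x in w)
    return v

def align_long_axis_to_x(points):
    """Shift the centroid to the origin and rotate the long axis onto the x-axis."""
    c = centroid(points)
    centered = [tuple(p[i] - c[i] for i in range(3)) for p in points]
    a = principal_axis(centered)
    if a[0] < 0:  # the axis direction is arbitrary, so pick the one closer to +x
        a = tuple(-x for x in a)
    k = (0.0, a[2], -a[1])  # rotation axis: a cross (1, 0, 0)
    sin_t = math.sqrt(dot(k, k))
    cos_t = a[0]
    if sin_t < 1e-12:  # already aligned with x
        return centered
    k = tuple(x / sin_t for x in k)
    rotated = []
    for v in centered:  # Rodrigues rotation formula
        kxv = cross(k, v)
        kdv = dot(k, v)
        rotated.append(tuple(v[i] * cos_t + kxv[i] * sin_t + k[i] * kdv * (1.0 - cos_t)
                             for i in range(3)))
    return rotated
```

In practice the input would be the DNA atom coordinates of the reference structure, and the same shift and rotation would then be applied to every frame.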
Then, calculate the root-mean-square distance (RMSD) of the protein backbone: in VMD, go to Extensions, click Analysis, and select the RMSD Trajectory Tool. In the atom selection box, type nucleic and residue 14 to 23 and 46 to 55. Click ALIGN and then RMSD.
To calculate the rotational angle of the protein around the DNA, theta(t), with the initial angular position defined as theta(0), on the XY plane, execute the command in MATLAB. Then enter the instructions in MATLAB to classify the 10,000 structures into 25 clusters using the K-means method. Once done, gather the structures of the 25 cluster centers for further MD simulations.
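The protocol performs this clustering in MATLAB. As an illustration only, the sketch below shows plain K-means (Lloyd's algorithm) in Python and how a representative frame can be picked for each cluster center. The feature vectors are hypothetical, since the features used in the protocol are not stated here, and the function names are assumptions.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(features, k, n_iter=100, seed=0):
    """Lloyd's algorithm: returns (centers, labels) for a list of feature tuples."""
    rng = random.Random(seed)
    centers = [tuple(f) for f in rng.sample(features, k)]  # random initial centers
    labels = [0] * len(features)
    for _ in range(n_iter):
        # Assignment step: each frame goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(f, centers[c])) for f in features]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:
                dim = len(members[0])
                new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                         for d in range(dim)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center in place
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, labels

def representative_frames(features, centers):
    """Index of the frame closest to each cluster center."""
    return [min(range(len(features)), key=lambda i: sq_dist(features[i], c))
            for c in centers]
```

For the step above, k would be 25 and the list would hold one feature vector per extracted frame. The returned frame indices identify the structures to carry into the next round of simulations.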
To conduct the first round of MD simulations, build an atomistic system for each of the 25 structures by using GROMACS and the build system sh file. Conduct 60 nanosecond MD simulations for the 25 systems under the NPT ensemble with a time step of two femtoseconds by running the command in the shell. To cluster the first round MD trajectories, remove the first 10 nanoseconds of each simulation trajectory and collect conformations from the 25 times 50 nanosecond trajectories.
For the time-structure independent component analysis (tICA), enter the script in GROMACS, and then choose distance pairs between protein and DNA as the input parameters for the projection. Copy the atom indices from the index.ndx file to a new text file, index.dat.
To get the pair information between these atoms, use the Python script. Calculate the 415 distance pairs from every trajectory in the MSMBuilder command window. Next, conduct a time-structure independent component analysis (tICA) to reduce the dimensionality of the data onto the first two time-independent components, or vectors, by executing the command.
By processing the instruction in MSMBuilder, cluster the projected data sets into 100 clusters using the K-center method and select the center structure of each cluster. For the second round of MD simulations, conduct 60 nanosecond MD simulations starting from the 100 initial structures, imposing random initial velocities on all the atoms by turning on velocity generation in the MDP file.
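The center-based clustering in this step is run through MSMBuilder. To show the idea behind K-center clustering, here is a minimal Python sketch of the greedy farthest-point version of the algorithm. It is a generic illustration rather than the MSMBuilder implementation, and the function name and the choice of the first center are assumptions.

```python
import math

def k_centers(points, k, first=0):
    """Greedy K-center clustering.

    Each new center is the point farthest from all centers chosen so far.
    Returns (center_indices, assignments), where assignments[i] is the
    position in center_indices of the center nearest to point i.
    """
    centers = [first]
    # Distance from every point to its nearest chosen center so far.
    dist = [math.dist(p, points[first]) for p in points]
    assign = [0] * len(points)
    while len(centers) < k:
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < dist[i]:  # the new center is closer, so reassign
                dist[i] = d
                assign[i] = len(centers) - 1
    return centers, assign
```

Because every center is an actual data point, the returned indices directly identify the center structures from which new simulations can be started.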
Remove the first 10 nanoseconds of each simulation as described previously, and evenly collect 2,500,000 snapshots from the 100 times 50 nanosecond trajectories to construct the MSM. To cluster the second round MD trajectories, conduct the time-structure independent component analysis (tICA) for the second round trajectories in MSMBuilder as shown.
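The numbers in this step imply a fixed spacing between the collected snapshots. The short worked calculation below uses only the values stated above and assumes the snapshots are spread equally over all trajectories. The variable names are illustrative.

```python
N_TRAJ = 100             # second round trajectories
RUN_NS = 60              # length of each simulation in nanoseconds
DISCARD_NS = 10          # initial part removed from each trajectory
N_SNAPSHOTS = 2_500_000  # snapshots collected for the MSM

kept_ns_per_traj = RUN_NS - DISCARD_NS         # 50 ns kept per trajectory
total_ns = N_TRAJ * kept_ns_per_traj           # 5,000 ns, i.e. 5 microseconds in total
snapshots_per_traj = N_SNAPSHOTS // N_TRAJ     # 25,000 snapshots per trajectory
stride_ps = kept_ns_per_traj * 1000 / snapshots_per_traj  # 2 ps between snapshots

print(total_ns, snapshots_per_traj, stride_ps)
```

In other words, the aggregated data set covers 5 microseconds of simulation sampled every 2 picoseconds.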
Calculate the implied timescales to validate the parameters by executing the Python script. Then, vary the lag time tau and the number of micro-states by changing the parameters. Classify the conformations into 500 clusters by executing the command.
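An implied timescale is obtained from an eigenvalue lambda of the transition matrix estimated at lag time tau, as t = -tau / ln(lambda). The Python sketch below illustrates this on a two-state toy model with a naive count-based estimator. It is not the script used in the protocol, and the function names are hypothetical. For a 2x2 row-stochastic matrix the eigenvalues are 1 and T00 + T11 - 1, so no eigenvalue solver is needed.

```python
import math

def transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition counts at the given lag (in frames)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i in range(len(dtraj) - lag):
        counts[dtraj[i]][dtraj[i + lag]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def implied_timescale_two_state(T, lag_time):
    """Implied timescale of a two-state model, in the units of lag_time."""
    lam = T[0][0] + T[1][1] - 1.0  # second eigenvalue of a 2x2 stochastic matrix
    if not 0.0 < lam < 1.0:
        raise ValueError("second eigenvalue must lie in (0, 1)")
    return -lag_time / math.log(lam)
```

Repeating the estimate over a range of lag times and checking where the timescales level off is the usual way to choose the lag time for the final model.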
For MSM construction, lump the 500 micro-states into three to six macro-states, and find the most suitable number of macro-states according to the PCCA+ algorithm in MSMBuilder by using the Python script. Map the high-dimensional conformations to the longitudinal position X and the rotational angle of the protein along the DNA for each micro-state.
To calculate the mean first passage times (MFPTs), conduct five 10 millisecond Monte Carlo trajectories based on the transition probability matrix of the 500 micro-state MSM, with the lag time of 10 nanoseconds set as the Monte Carlo time step. Calculate the MFPTs between each pair of macro-states with the Python script, and the average and standard error of the MFPTs using the Bash file. In the CafeMol 3.0 software, run the coarse-grained simulation by executing the command on the terminal.
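With a 10 nanosecond time step, each 10 millisecond Monte Carlo trajectory corresponds to 1,000,000 steps. The MFPT estimate described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the protocol's script: the function names are hypothetical, the toy matrix has two states instead of 500, and a passage is timed from the first frame in the source macro-state until the first arrival in the target macro-state.

```python
import math
import random
import statistics

def next_state(row, rng):
    """Draw the next micro-state from one row of the transition matrix."""
    r = rng.random()
    acc = 0.0
    for j, p in enumerate(row):
        acc += p
        if r < acc:
            return j
    return len(row) - 1  # guard against rounding in the row sum

def simulate(T, start, n_steps, rng):
    """Monte Carlo trajectory on the MSM; one step equals one lag time."""
    traj = [start]
    for _ in range(n_steps):
        traj.append(next_state(T[traj[-1]], rng))
    return traj

def mean_first_passage_time(traj, macro_of, source, target, step_time):
    """Average time from entering the source macro-state to first reaching the target."""
    times = []
    t_start = None
    for i, s in enumerate(traj):
        m = macro_of[s]
        if m == source and t_start is None:
            t_start = i  # start the clock
        elif m == target and t_start is not None:
            times.append((i - t_start) * step_time)
            t_start = None  # wait for the next entry into the source
    return sum(times) / len(times) if times else float("nan")

def standard_error(values):
    """Standard error of the mean over independent trajectories."""
    return statistics.stdev(values) / math.sqrt(len(values))
```

Running the estimate on each of the five trajectories separately and passing the five values to the last function gives the mean and standard error mentioned above.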
After specifying the blocks in the input file, set the filenames block and the job_cntl block with the individual commands. Next, set the unit_and_state block, followed by the energy_function block and the md_information block. All protein conformations on the DNA were mapped to the longitudinal movement X and the rotational angle of the protein along DNA, and can be further clustered into three macro-states.
The S1 state is less favorable, as its hydrogen bonds are similar to those of the modeled structure, whereas S3 refers to a metastable state in which all the hydrogen bonds have shifted after one base-pair stepping and appear stable, with the highest population of 63%. The intermediate state S2 connects S1 and S3 with a medium-high population of 30%. The transition from S2 to S3 allows collective breaking and reforming of the hydrogen bonds in approximately seven microseconds, while the S1 to S2 transition can occur in approximately 0.06 microseconds. The contact numbers between protein and DNA were calculated and four states were identified. In states 1 and 3, the zinc finger region binds toward the Y direction.
In states 2 and 3, by contrast, the zinc finger region binds toward the Y direction. The stepping size for each conserved residue on different DNA sequences was measured, which revealed that the stepping sizes of these residues are more synchronized on polyA DNA than on polyAT or random DNA sequences. The important steps in Markov state model construction are choosing distance pairs between protein and DNA that describe the 1-bp stepping motions, and selecting a suitable number of micro-states and macro-states.