To begin, prepare the protein-peptide interface for sequence diversification. Open the PDB file in Chimera and ensure that the structure of the target subunits is intact, with no missing atoms or bonds. To remove all non-essential molecules from the structure, click on Select, then Residue, and then select all molecules other than the standard amino acids.
Then click on Actions, followed by Atoms/Bonds, and then Delete. Next, click on Favorites and Sequence, and then click on the chain considered as the ligand. Crop the ligand chain to the identified interacting segment by deleting all the residues except those between the selected positions.
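The cleanup and cropping described above can also be sketched in a script. The following is a minimal illustration, not part of the original protocol: it assumes a standard fixed-column PDB file, and the function name `crop_pdb` and its parameters are hypothetical. It keeps only ATOM records of the 20 standard amino acids and, for the ligand chain, only residues within the chosen range.

```python
# Three-letter codes of the 20 standard amino acids.
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def crop_pdb(lines, ligand_chain, first_residue, last_residue):
    """Return the ATOM lines to keep (hypothetical helper, not a Rosetta tool).

    - HETATM records (waters, ions, ligands) and non-standard residues are dropped.
    - In the ligand chain, only residues first_residue..last_residue are kept.
    """
    kept = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue  # skips HETATM, TER, REMARK, and other records
        res_name = line[17:20]        # PDB columns 18-20
        chain_id = line[21]           # PDB column 22
        res_num = int(line[22:26])    # PDB columns 23-26
        if res_name not in STANDARD_RESIDUES:
            continue
        if chain_id == ligand_chain and not (first_residue <= res_num <= last_residue):
            continue
        kept.append(line)
    return kept
```

A call such as `crop_pdb(open("input.pdb").read().splitlines(), "B", 10, 20)` would return the retained lines, which can then be written to a new file; the file name, chain ID, and residue range here are placeholders.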
Click on File and Save PDB to save the edited structure to a different PDB file. Copy this file to a Linux location accessible by the Rosetta applications. Use Rosetta's fixbb application to repack all the amino acid side chains of the base structure before sequence diversification by running this command.
Then, rename the repacked PDB file with the _repack suffix using the following command. Next, run pepspec in design mode to perform sequence diversification using this command. Then, generate a position weight matrix (PWM) using the gen_pepspec_pwm.py script included in the Rosetta suite.
To run this script, use the following command. To create a sequence logo, open the file containing the peptide sequences generated in the previous step in your preferred text editor and copy all the sequences.
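To make the idea of a position weight matrix concrete, the sketch below computes per-position amino acid frequencies from a list of equal-length peptide sequences. It is an illustration only: it is not the gen_pepspec_pwm.py script, it does not reproduce that script's input or output format, and the function name `position_frequency_matrix` is hypothetical.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequency_matrix(sequences):
    """Return a list with one dict per position mapping residue -> frequency."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")

    matrix = []
    for pos in range(length):
        # Count how often each residue occurs at this position.
        counts = Counter(seq[pos] for seq in sequences)
        matrix.append({aa: counts.get(aa, 0) / len(sequences) for aa in AMINO_ACIDS})
    return matrix
```

For the made-up input `["LEIS", "LDIS", "IEIS", "LEVS"]`, the first position has a leucine frequency of 0.75 and the last position a serine frequency of 1.0. A sequence logo is a graphical rendering of this kind of per-position distribution.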
Navigate to the WebLogo server and paste the sequences into the multiple sequence alignment text box. Choose the desired format and size of the logo according to the input length, and click on Create Logo. Using this protocol, the amino acid preferences were predicted for the conserved pLxIS motif in the IRF5 binding surface.
The position weight matrix and sequence logo generated upon sequence diversification showed a preference for glutamate at position 432 and for leucine and isoleucine at positions 433 and 435. Positions 427, 429, and 436, typically occupied by serine, showed a higher preference for aspartate and glutamate, highlighting the role of phosphorylation in IRF5 dimerization. Position 425 showed a high preference for serine, suggesting its involvement in protein-protein interaction in its unphosphorylated form.