OpenProt is the first database that allows a polycistronic annotation of eukaryotic genomes, recognizes the coding potential of proto-genes, and enables the discovery of previously undetectable proteins. We have designed this protocol so that it is accessible to all users without requiring extensive bioinformatic skills. It essentially places proteomic discoveries within anyone's grasp. To comprehend and tackle today's challenges in medicine, we need to fully understand the proteomic landscape and all of the actors and their dynamics.
OpenProt provides that possibility. Although OpenProt is used here for the analysis of proteomic experiments, it can also be used with other systems, as it simply provides a clearer definition of the proteome. Begin by opening the OpenProt website and using the link from the top page menu to open the downloads page.
Click on the species of interest based on the analyzed experimental data and the desired protein type. Click AltProts, Isoforms and RefProts to generate files containing all of the known and novel protein types present in the OpenProt database and, if available, click on the annotation from which the protein sequences are drawn. Click on the level of supporting evidence necessary for the protein consideration according to the research objective.
Then click All predicted to generate files containing all of the OpenProt predictions, and click the desired file format to download. For proteomic analyses, select the FASTA protein file. The readme file will contain all of the necessary information on the file format.
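As a quick sanity check on the downloaded FASTA file, the entries can be counted by protein type. The sketch below is a minimal illustration, not part of the protocol: the accession prefixes it relies on (IP_ for alternative proteins, II_ for novel isoforms, anything else treated as a reference protein) and the helper names are assumptions that should be checked against the readme file, and the small in-memory example stands in for the real download.

```python
import io
from collections import Counter


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def classify(header):
    """Guess the protein type from the accession prefix (assumed convention)."""
    accession = header.split("|")[0].split()[0]
    if accession.startswith("IP_"):
        return "altprot"
    if accession.startswith("II_"):
        return "isoform"
    return "refprot"


def summarize(handle):
    """Count FASTA entries per assumed protein type."""
    return Counter(classify(header) for header, _ in read_fasta(handle))


# Hypothetical entries, used here only to exercise the functions.
example = io.StringIO(
    ">IP_000001|example\nMKT\nAA\n>II_000002\nMM\n>ENSP0001 description\nMA\n"
)
counts = summarize(example)
print(dict(counts))
```

To run this on the real download, replace the in-memory example with an opened file handle for the FASTA file.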
For database handling, log in to an appropriate proteomics tool instance and create a new history. To import the downloaded OpenProt database, click Upload. To input the database handling workflow, go to the workflow page, click Upload again, and then click Run the workflow.
Then select the imported OpenProt database as input and rename the obtained FASTA file to something meaningful. The database is now ready to be used for proteomics analyses. For mass spectrometry file preparation, open the freely available MSConvert tool from the ProteoWizard suite and upload the data file to be analyzed.
Select the directory for the output and set the desired file format to mzML. Then select a peak picking filter, using the wavelet-based algorithm on mass spectrometry levels one and two, and start the conversion. For protein quantification, create a new history and drag and drop the previously created database into the new history. Click Upload to import the transformed mzML data file.
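For users who prefer scripting the conversion rather than using the graphical tool, the same settings can be passed to the msconvert command line program that ships with ProteoWizard. The sketch below only builds and prints the command. It assumes that msconvert is installed, that the "cwt" picker type corresponds to the wavelet-based algorithm, and that the file and directory names (hypothetical here) are replaced with real ones.

```python
import shlex


def build_msconvert_command(raw_file, output_dir, ms_levels="1-2"):
    """Assemble an msconvert call: mzML output with wavelet-based peak picking."""
    return [
        "msconvert",
        str(raw_file),
        "--mzML",  # write mzML output
        "--filter",
        f"peakPicking cwt msLevel={ms_levels}",  # wavelet picker on the given MS levels
        "-o",
        str(output_dir),  # output directory
    ]


# Hypothetical input file and output directory.
command = build_msconvert_command("sample.raw", "converted")
print(shlex.join(command))
```

The printed command can be pasted into a terminal, or the list can be handed to subprocess.run once the paths are correct.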
Open the workflow page, click Upload again to import the desired workflow, and select Run the workflow to review the different parameters. Select the imported mzML data file and input the previously created database as the database FASTA file.
Since the workflow uses the X! Tandem search engine, click Upload to import the X! Tandem default configuration file. To account for the substantial increase in size when using the whole OpenProt database, use a stringent false discovery rate.
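To make the effect of a stringent false discovery rate concrete, the sketch below estimates a score cutoff with a simple target-decoy count. This is an illustration under stated assumptions, not the workflow's own implementation: it assumes a target-decoy search was run, that higher scores are better, and that the false discovery rate is estimated as decoys divided by targets. The function name and the scores are made up for the example.

```python
def score_threshold_for_fdr(psms, max_fdr):
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    psms is a list of (score, is_decoy) pairs; higher scores are assumed better.
    Returns None when no cutoff satisfies the requested rate.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    threshold = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything scoring at least this well.
        if targets and decoys / targets <= max_fdr:
            threshold = score
    return threshold


# Made-up scores: True marks a decoy match.
psms = [
    (9.0, False), (8.5, False), (8.0, False), (7.0, True),
    (6.5, False), (6.0, False), (5.0, True), (4.0, False),
]
lenient = score_threshold_for_fdr(psms, 0.25)
strict = score_threshold_for_fdr(psms, 0.0)
print(lenient, strict)
```

With these made-up scores, the stricter rate yields a higher cutoff and therefore keeps fewer matches, which is the trade-off accepted when searching against the larger database.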
For quality control, run the FileInfo tool on the IDFilter output to provide common metrics of performance, such as the number of peptide-spectrum matches or the number of identified peptides and proteins. For OpenProt database mining, return to the OpenProt website and open the search page. Click on the species of interest for which the protein was identified and enter the protein accession number in the protein query box.
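The quality control metrics mentioned above are simple counts, which the following sketch makes explicit. It assumes the filtered identifications have already been exported as (spectrum, peptide, protein) records. The record layout, function name, and accessions are hypothetical and do not reflect the output format of any particular tool.

```python
def summarize_identifications(psms):
    """Count peptide-spectrum matches, unique peptides, and unique proteins.

    psms is an iterable of (spectrum_id, peptide_sequence, protein_accession).
    """
    psms = list(psms)
    return {
        "psms": len(psms),
        "peptides": len({peptide for _, peptide, _ in psms}),
        "proteins": len({protein for _, _, protein in psms}),
    }


# Hypothetical filtered identifications.
identified = [
    ("scan_1", "LGEHNIDVLEGNEQFINAAK", "IP_000001"),
    ("scan_2", "LGEHNIDVLEGNEQFINAAK", "IP_000001"),
    ("scan_3", "VATVSLPR", "IP_000001"),
    ("scan_4", "SSGSSYPSLLQCLK", "II_000002"),
]
metrics = summarize_identifications(identified)
print(metrics)
```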
Click search, and a table containing basic information for the queried protein will appear. Next, click the details link. The newly opened page will contain a genome browser that is centered on the queried protein, as well as other information.
To obtain protein or DNA sequences, click the protein or DNA links from the info tabs, respectively. Then click the tabs to browse the detailed information about the mass spectrometry evidence, ribosome profiling detection, conservation, and identified protein domains. In this representative analysis, most of the proteins identified in the original paper were also identified using either the OpenProt 2_pep or the OpenProt_all database, showing that OpenProt databases are able to produce protein identification and quantification comparable to that of current procedures based on the UniProtKB databases. Eleven well-supported proteins not yet annotated in current databases were identified across all of the datasets with confident peptides using the OpenProt 2_pep database.
Twenty-nine novel proteins were discovered across all of the datasets with confident peptides using the OpenProt_all database. The recommended stringent false discovery rate did not affect the most confident protein identifications, although it did decrease the total number of identified proteins. One novel protein was discovered as an interactor of the Raf-1 protein.
This protein had not been previously detected by mass spectrometry or ribosome profiling and demonstrated a good quality spectrum. Remember to ensure that the selected parameters are appropriate for the experimental design, and always verify the quality of supporting evidence when reporting novel protein discoveries. This protocol is adaptable to all top-down proteomics experiments, notably when used with functional proteomics.
This will allow much deeper screening and understanding of protein interactions and cellular pathways. OpenProt highlights the substantial underestimation of the proteomic landscape relayed by current genome annotations and emphasizes the polycistronic nature of eukaryotic genes. It is a whole new avenue for research.
The beauty of this protocol is that it doesn't require extensive bioinformatic skills and that, since it uses remote servers, it can be run on any computer.