Bioinformatics is about using computers to solve questions in biology. Glycoinformatics is about using computers to solve questions in glycobiology. With glycoinformatics, we develop databases that store glycomics or glycoproteomics data that can be browsed or searched, and we also develop tools to visualize and compare these data.
The role of glycans is increasingly recognized as being important in health and disease, and glycoinformatics is trying to bring that forward as well. Catherine Hayes is trained in glycobiology and works as a data scientist. Julien Mariethoz is trained in computer science and coordinates the development of databases and tools.
Go to the website glycoproteome.expasy.org/glycomics-expasy and in the leftmost menu, check the glycoproteins box. The bubble chart on the right will zoom in on the bubble matching that category. Then click on the GlyConnect bubble to open the GlyConnect homepage in a new tab.
Select the protein button and in the protein view page, type prostate in the search window. Click on 790 corresponding to the common isoform of prostate-specific antigen or PSA. Next, on the top multicolored bar, click on the source button in green to display the sample types from which the published data were processed.
Click on the disease button to check the health-related content of the database. Then click on the structure button to view the full list of 135 structures associated with PSA from glycomics data. Click on the composition button for the associated 78 compositions determined by glycoproteomics experiments.
Click on any structure or composition to obtain further details. To reduce the ambiguity of compositions, click on suggested struct below a selected composition. A suggestion is made each time the monosaccharide count coincides with that of a listed structure.
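The matching rule just described, where a structure is suggested when its monosaccharide count coincides with that of the composition, can be sketched in a few lines of Python. This is only an illustration, not GlyConnect's implementation: the one-letter codes (H for Hex, N for HexNAc, F for Fuc, S for NeuAc) are an assumed reading of compact labels such as H6N4F1S1, and the structure IDs and contents below are hypothetical.

```python
import re

# Assumed one-letter codes for compact composition labels (an assumption,
# not taken from the GlyConnect documentation).
CODES = {"H": "Hex", "N": "HexNAc", "F": "Fuc", "S": "NeuAc"}


def parse_composition(label):
    """Turn a label such as 'H6N4F1S1' into a dict of monosaccharide counts."""
    tokens = re.findall(r"([A-Z])(\d+)", label)
    if "".join(letter + number for letter, number in tokens) != label:
        raise ValueError(f"unrecognized composition label: {label!r}")
    counts = {}
    for letter, number in tokens:
        if letter not in CODES:
            raise ValueError(f"unknown monosaccharide code: {letter!r}")
        if int(number) > 0:  # zero counts are dropped so comparisons stay simple
            counts[CODES[letter]] = counts.get(CODES[letter], 0) + int(number)
    return counts


def count_residues(residues):
    """Count the monosaccharides in a structure given as a list of residue names."""
    counts = {}
    for name in residues:
        counts[name] = counts.get(name, 0) + 1
    return counts


def suggest_structures(label, structures):
    """Return the IDs of structures whose residue counts equal the composition."""
    target = parse_composition(label)
    return sorted(sid for sid, residues in structures.items()
                  if count_residues(residues) == target)


# Hypothetical structures, described only by their residues (topology ignored).
toy_structures = {
    "toy-1": ["Hex"] * 5 + ["HexNAc"] * 4,
    "toy-2": ["Hex"] * 5 + ["HexNAc"] * 4 + ["Fuc"],
    "toy-3": ["HexNAc"] * 4 + ["Hex"] * 5,
}

print(suggest_structures("H5N4", toy_structures))
```

Because a composition ignores linkage and branching, several structures can match the same label, which is why the suggestion reduces ambiguity rather than removing it.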
To fully explore the protein page, view further details on the right side of the page. Go to the Octopus homepage to confirm the presence of common structural traits in the diversity of glycans attached to PSA. Keep the N-Linked tab selected by default, move to the cores subtab and click on the hybrid icon. Then move to the properties subtab, select the sialylated icon and click on the green search button.
In the displayed graph of relationships, hover over H6N4F1S1 to highlight links to seven proteins in three structures. Contrast this by hovering over H6N4F2S1, which singles out the two isoforms of PSA. Hover over the structure ID to show its SNFG representation and click on it to open the corresponding page.
Change the center nodes to tissues and then place the cursor on urine or seminal fluid in the middle of the graph to view different associations. Change the center nodes to disease to display 13 options, one of which is prostate cancer. The only protein associated is PSA.
Next, click on the clear button to refresh the search. Move to the properties subtab and click on the bi-antennary icon. Then move to the determinants subtab, select the 3-sialyl-LN type two icon and click on the green search button.
Check the Octopus-retrieved associations with bi-antennary glycans containing a terminal 3-sialyl-LN type two motif. Change the center nodes to tissues for easier reading and hover over KLK3_human to directly connect seminal fluid with PSA common isoform and seven structures. Go back to the protein page, in this case that of PSA, to scan for potential relationships among the compositions in its list.
On the right side of the PSA entry page, click on the Compozitor link. Ensure that the Compozitor search fields are pre-filled with the details of the ID 790 entry in the protein tab. Click on the add to selection button to retrieve data from the database.
Deselect the include virtual nodes option and then click on the compute graph button. This displays a graph showing a well-connected set of 78 compositions representing the PSA N-glycome, along with a bar plot showing the main characteristics of the glycans. Remain in the main protein tab and select prostate-specific antigen high pI isoform in the protein field. Click on the add to selection button to retrieve the corresponding data from the database, which amount to 57 compositions.
Click on the compute graph button to generate the superimposed graphs of both isoforms and assess the differences in the glycomes of the two PSA isoforms. Go to the website www.unilectin.eu and click on the UniLectin3D button.
Click on the glycan search button, then click on the purple diamond representing a sialic acid. This prompts the display of all glycan-binding motifs ending with a sialic acid stored in the database. Click on the 3-sialyl-LN type two motif to prompt the display of all lectins for which a 3D structure confirming the interaction with 3-sialyl-LN type two is known. Next, use the search by field option.
In the species field, type Homo sapiens. Click on the explore x-ray structures button to filter the original list. Only one entry remains: human galectin-8.
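To give a feel for what a "well-connected" set of compositions means, here is a minimal Python sketch of a composition graph. It assumes that two compositions are linked when they differ by exactly one monosaccharide, which is an assumption about the linking rule rather than a description of Compozitor's code. Each composition is written as a tuple of counts in the assumed order (Hex, HexNAc, Fuc, NeuAc), and the toy labels below are hypothetical, not the PSA data.

```python
from itertools import combinations


def distance(a, b):
    """Number of monosaccharides by which two count tuples differ."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_edges(compositions):
    """Link every pair of labels whose compositions differ by one monosaccharide."""
    return [(x, y) for x, y in combinations(sorted(compositions), 2)
            if distance(compositions[x], compositions[y]) == 1]


def connected_components(labels, edges):
    """Group labels into connected components with a simple depth-first search."""
    neighbors = {label: set() for label in labels}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in sorted(labels):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        components.append(sorted(group))
    return components


# Hypothetical compositions as (Hex, HexNAc, Fuc, NeuAc) counts.
toy = {
    "H3N2": (3, 2, 0, 0),
    "H5N4": (5, 4, 0, 0),
    "H5N4F1": (5, 4, 1, 0),
    "H5N4F1S1": (5, 4, 1, 1),
}

edges = build_edges(toy)
print(edges)
print(connected_components(toy, edges))
```

Under this reading, a glycome whose compositions fall into a single component is well connected, while isolated nodes point to compositions with no close neighbor in the selection.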
Click on the view the 3D structure and information button to display detailed information of human galectin-8 interacting with 3-sialyl-LN type two. Access the structural information on human galectin-8 displayed on the page with two different viewers. Hold the mouse to turn the molecule around and bring the ligand to the fore with the LiteMol software.
Mouse over the listed interactions on the left to update the view on the right and locate where that particular interaction acts in the structure with the PLIP software. Browse the HGI dataset from the GlyConnect homepage by going directly to the referenced article on this page. Click on the Compozitor link on the right side of the reference entry page to assess the consistency of the dataset.
The search field will already be filled with the reference set equal to the DOI number in the advanced tab of the tool. Type glycan_type=O-linked after the DOI number to narrow down the search to O-linked glycans. Then click on the add to selection button to retrieve the data from the database.
Keep the include virtual nodes option selected and click on the compute graph button to display the graph of connected compositions. Go to the protein tab of GlyConnect Compozitor and from the protein list, select inter-alpha trypsin inhibitor heavy chain H4. Ensure that the species selection is Homo sapiens by default. Deselect N-Linked in the glycan type.
Select only THR 725 in the site list and click on the add to selection button. Then click on the compute graph button to display the graph of connected compositions. To make sense of the virtual nodes, click on the export button below the graph.
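One way to picture a virtual node is as a composition that is absent from the selection but would bridge two observed compositions. The Python sketch below works under that assumption: it proposes a bridge whenever two observed compositions differ by two monosaccharides and no observed composition already sits between them. This is an illustrative reading, not Compozitor's actual rule, and the tuples use an assumed (Hex, HexNAc, Fuc, NeuAc) order with hypothetical values.

```python
from itertools import combinations


def virtual_candidates(observed):
    """Propose unobserved compositions that bridge pairs two monosaccharides apart."""
    observed = set(observed)
    virtual = set()
    for a, b in combinations(sorted(observed), 2):
        diff = [y - x for x, y in zip(a, b)]
        if sum(abs(d) for d in diff) != 2:
            continue
        # Every intermediate is one single-monosaccharide step from a toward b.
        intermediates = set()
        for i, d in enumerate(diff):
            if d != 0:
                step = list(a)
                step[i] += 1 if d > 0 else -1
                intermediates.add(tuple(step))
        # If an observed composition already bridges the pair, nothing is added.
        if intermediates & observed:
            continue
        virtual |= intermediates
    return sorted(virtual)


# Hypothetical selection with a gap between the two compositions.
gap = [(5, 4, 0, 0), (5, 4, 1, 1)]
print(virtual_candidates(gap))

# The same selection once the fucosylated intermediate is observed.
bridged = gap + [(5, 4, 1, 0)]
print(virtual_candidates(bridged))
```

Exporting the virtual nodes and querying them against the database, as done in the following steps, is then a way of checking whether such bridging compositions have been reported elsewhere.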
Select only virtual and click on the clipboard icon to copy the selection of eight compositions. Paste the selection in the query window of Compozitor's custom tab. Set the selection label in the compositions field, select O-Linked in the glycan type field and click on the add to selection button.
Finally, click on the compute graph button. The tissue-dependent associations between proteins and glycans are shown in this output of GlyConnect Octopus. All human proteins carrying hybrid and sialylated glycan structures with the tissues in which they are expressed are displayed in this output.
The associations with urine are highlighted showing two proteins, choriogonadotropin or GLHA_human and PSA common isoform or KLK3_human, connected to the scattered glycan structures. Similarly, the associations with seminal fluid are highlighted showing two protein isoforms of PSA connected to the grouped glycan structures. The superimposed N-glycomes of the two isoforms of PSA are shown in the output of GlyConnect Compozitor.
Blue nodes represent the glycans associated with the common isoform and those of the high pI isoform are represented as red nodes. The overlap between glycomes is shown as magenta nodes. Numbers inside the nodes represent the number of glycan structures matching the labeled composition according to the content of the GlyConnect database regarding PSA.
The PSA glycome displayed in GlyConnect was shown to correlate with galectin-8 displayed in UniLectin3D via the 3-sialyl-LN type two terminal epitope. This provides a likely but not guaranteed scenario for protein-protein interactions mediated by glycans. A high-quality set of O-glycan compositions associated with human serum was examined and compared to the GlyConnect database content, thereby offering the option of customizing a glycan composition file for the refined identification of the glycopeptides.
This file could rely on the minimal set of 20 compositions available from one dataset, or be enhanced with 23 to 26 items rationally collected in GlyConnect to strengthen the consistency of the set. From this protocol, it is important to remember that a glycome cannot be limited to a list of items, and that glycoinformatics tools are precisely what let you show the dependencies between these items, which will ultimately explain their function.
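The color scheme of the superimposed graph amounts to a comparison of two sets of compositions. A minimal Python sketch of that logic follows; the function name and the toy labels are hypothetical and do not reproduce the 78 and 57 compositions retrieved in the protocol.

```python
def color_nodes(common_isoform, high_pi_isoform):
    """Assign blue, red or magenta to each composition label across two glycomes."""
    common_isoform = set(common_isoform)
    high_pi_isoform = set(high_pi_isoform)
    colors = {}
    for label in sorted(common_isoform | high_pi_isoform):
        if label in common_isoform and label in high_pi_isoform:
            colors[label] = "magenta"  # found in both isoforms
        elif label in common_isoform:
            colors[label] = "blue"     # common isoform only
        else:
            colors[label] = "red"      # high pI isoform only
    return colors


# Hypothetical composition labels for each isoform.
toy_common = ["H5N4", "H5N4F1", "H5N4F1S1"]
toy_high_pi = ["H5N4F1", "H5N4F1S1", "H6N4F2S1"]

for label, color in color_nodes(toy_common, toy_high_pi).items():
    print(label, color)
```

Counting the labels of each color gives a quick numerical summary of how much the two glycomes share.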