Here, we provide a methodology that uses different molecular representations to display and analyze the chemical space of natural compound data sets, with a focus on applications related to drug discovery.
Chemical space is a multidimensional descriptor space that encompasses all possible molecules; at least 1 × 10^60 organic substances with a molecular weight below 500 Da are thought to be potentially relevant for drug discovery. Natural products have been the primary source of the new pharmacological entities marketed during the past forty years and remain one of the most productive sources of innovative medications. Chemoinformatics-based computational tools accelerate the drug development process for natural products, with applications that include estimating bioactivities, safety profiles, and ADME properties, as well as measuring natural product likeness. Here, we review recent developments in chemoinformatic tools designed to visualize, characterize, and expand the chemical space of natural compound data sets using various molecular representations, create visual representations of such spaces, and investigate structure-property relationships within them. With an emphasis on drug discovery applications, we evaluate the open-source databases BIOFACQUIM and PeruNPDB as a proof of concept.
Natural products (NPs), chemical compounds produced by living organisms, have been used as traditional remedies for centuries. In the modern era, individual NPs have been developed into medications and successfully exploited as lead compounds in drug discovery1. The category of bioactive compounds includes marine, fungal, bacterial, and plant substances, endogenous substances produced by humans and animals, and the venoms and poisons produced by various animals2. Over the past forty years, NP-derived medications have represented a significant source of new pharmacological substances3, underscoring the crucial role of NPs in the development of new medications, particularly for the treatment of cancer and infectious diseases, as well as for other therapeutic conditions such as multiple sclerosis and cardiovascular disease4. Furthermore, 64.9% of the 185 small molecules approved to treat cancer between 1981 and 2019 were unmodified NPs or synthetic drugs with an NP pharmacophore3.
Chemoinformatics, a well-established interdiscipline built on the concept of chemical space, has been used to analyze and visualize the chemical space of NPs in terms of physicochemical properties linked to drug-like traits5. Chemoinformatics has had a substantial impact on NP-based drug design and discovery6. The chemical space of a group of compounds is not unique: it depends on the set of descriptors used to define it. Consequently, studying the chemical space of NPs, as with any other set of compounds, presents particular challenges related to molecular representation7. This endeavor can be approached using a variety of molecular descriptors and data visualization techniques; among the most frequently used are principal component analysis (PCA), scaffold trees, self-organizing maps, generative topographic mapping (GTM), and a novel visualization technique called tree maps (TMAPs)8. Another application of chemoinformatics in NP research is the collection, evaluation, and dissemination of NP chemical information in compound databases, which is especially pertinent with the advent of big data9.
Here, the open-source NP databases BIOFACQUIM10 and PeruNPDB11 are used to illustrate a protocol for characterizing and visualizing the chemical space of natural compound data sets using various molecular representations and for investigating structure-property relationships within that space, with an emphasis on drug discovery applications.
1. Software download and installation
2. Construction and curation of a compound database
NOTE: Identify compounds and sources that provide the necessary data. For each compound, it is advisable to record the following details in a spreadsheet.
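Because the list of required details is not reproduced in this section, the following is only a minimal sketch of a first curation pass, assuming the spreadsheet is exported as CSV with hypothetical columns `ID`, `Name`, `SMILES`, and `Source`. It drops entries without an identifier or structure and removes exact duplicate SMILES strings. Real curation would also standardize and canonicalize structures with a chemoinformatics toolkit (for example, RDKit nodes in KNIME), which plain string comparison cannot do.

```python
import csv
import io

# Hypothetical column names (assumption); adapt them to the actual spreadsheet export.
REQUIRED_COLUMNS = ("ID", "SMILES")


def curate(csv_text):
    """Return (kept_rows, report) after a basic clean-up of a compound table.

    Rows missing any required value are discarded, and only the first row for
    each exact SMILES string is kept. Structures are NOT canonicalized, so
    equivalent molecules written as different SMILES remain separate entries.
    Assumes every row has the same number of fields as the header.
    """
    kept, seen = [], set()
    report = {"missing": 0, "duplicates": 0}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Strip surrounding whitespace; empty or absent cells become "".
        row = {key: (value or "").strip() for key, value in row.items()}
        if any(not row.get(col) for col in REQUIRED_COLUMNS):
            report["missing"] += 1
            continue
        if row["SMILES"] in seen:
            report["duplicates"] += 1
            continue
        seen.add(row["SMILES"])
        kept.append(row)
    return kept, report


if __name__ == "__main__":
    # Toy table: NP002 lacks a structure and NP003 repeats NP001's SMILES.
    table = (
        "ID,Name,SMILES,Source\n"
        "NP001,compound A,CCO,plant\n"
        "NP002,compound B,,fungus\n"
        "NP003,compound A (repeat),CCO,plant\n"
        "NP004,compound C,Oc1ccccc1,marine\n"
    )
    rows, report = curate(table)
    print([r["ID"] for r in rows], report)
    # prints ['NP001', 'NP004'] {'missing': 1, 'duplicates': 1}
```

Keeping a per-reason count makes it easy to document how many entries were removed at each curation step.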
3. Molecular descriptors and diversity analysis
NOTE: Molecular descriptors (such as physicochemical properties), molecular fingerprints, and chemical scaffolds are the most common ways to represent molecules in chemoinformatic applications. This analysis can be performed at http://132.248.103.152:3838/PUMA/; all steps described below are detailed on the PUMA website.
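Fingerprint-based diversity is commonly summarized from pairwise Tanimoto (Jaccard) similarities. The sketch below is an illustration of that idea, not a description of how PUMA computes it: it assumes each fingerprint is already available as a set of "on" bit positions, and the bit sets in the example are made up. A lower mean or median pairwise similarity indicates a more diverse data set.

```python
from itertools import combinations
from statistics import median


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    # Convention used here: two all-zero fingerprints count as identical.
    return 1.0 if union == 0 else len(a & b) / union


def diversity_summary(fingerprints):
    """Mean and median of all unique pairwise Tanimoto similarities.

    Lower values indicate a more diverse data set.
    """
    if len(fingerprints) < 2:
        raise ValueError("need at least two fingerprints")
    sims = [tanimoto(a, b) for a, b in combinations(fingerprints, 2)]
    return {"mean": sum(sims) / len(sims), "median": median(sims)}


if __name__ == "__main__":
    # Toy fingerprints (sets of on-bit indices), not real compounds.
    fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}]
    print(diversity_summary(fps))  # mean about 0.2, median 0.0
```

Storing fingerprints as sets keeps the Tanimoto formula, |A ∩ B| / |A ∪ B|, directly visible; for large libraries a bit-vector implementation from a dedicated toolkit would be much faster.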
4. Visualization of the chemical space
NOTE: PCA and other dimensionality reduction techniques condense most of the relevant information into a small number of variables, which makes it possible to visualize the chemical space in two or three dimensions.
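To make the PCA step concrete, the following sketch (an illustration of standard PCA, not of the exact procedure used by PUMA) autoscales a small descriptor matrix, builds its covariance (here, correlation) matrix, extracts the first two principal components by power iteration with deflation, and projects each compound onto them. The descriptor values are invented; in practice a linear-algebra library such as NumPy or scikit-learn would replace this hand-written routine.

```python
import math
import random


def autoscale(X):
    """Center each column and divide it by its sample standard deviation."""
    n, p = len(X), len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        for i in range(n):
            # Constant columns carry no information and are set to zero.
            Z[i][j] = (col[i] - mean) / sd if sd > 0 else 0.0
    return Z


def covariance(Z):
    """Sample covariance of centered data (a correlation matrix if autoscaled)."""
    n, p = len(Z), len(Z[0])
    return [[sum(Z[k][i] * Z[k][j] for k in range(n)) / (n - 1) for j in range(p)]
            for i in range(p)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def power_iteration(C, iters=1000, tol=1e-12, seed=0):
    """Leading eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)  # a random start avoids being orthogonal by accident
    v = [rng.uniform(-1.0, 1.0) for _ in C]
    s = math.sqrt(sum(x * x for x in v))
    v = [x / s for x in v]
    for _ in range(iters):
        w = matvec(C, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    eigenvalue = sum(a * b for a, b in zip(v, matvec(C, v)))  # Rayleigh quotient
    return eigenvalue, v


def pca_2d(X):
    """Scores on the first two principal components and their explained variance."""
    Z = autoscale(X)
    C = covariance(Z)
    p = len(C)
    lam1, v1 = power_iteration(C)
    # Deflation: remove the first component so that the second one dominates.
    C2 = [[C[i][j] - lam1 * v1[i] * v1[j] for j in range(p)] for i in range(p)]
    lam2, v2 = power_iteration(C2)
    total = sum(C[i][i] for i in range(p))  # total variance = trace
    scores = [(sum(z * a for z, a in zip(row, v1)), sum(z * b for z, b in zip(row, v2)))
              for row in Z]
    return scores, (lam1 / total, lam2 / total)


if __name__ == "__main__":
    # Rows are compounds; columns are three made-up descriptor values.
    X = [
        [180.2, 1.2, 63.6],
        [300.4, 3.1, 40.5],
        [250.3, 2.0, 75.2],
        [410.5, 4.3, 30.1],
        [150.1, 0.5, 90.3],
    ]
    scores, explained = pca_2d(X)
    print("explained variance:", explained)
    for s in scores:
        print("PC1 = %.3f, PC2 = %.3f" % s)
```

Autoscaling matters because descriptors such as molecular weight and counts live on very different numeric ranges; without it, the largest-valued descriptor would dominate the first component.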
5. Consensus diversity plots
NOTE: Visual representations have been developed to summarize several characteristics that can be used to quantify diversity. The consensus diversity plot (CDP)12 analysis can be performed at http://132.248.103.152:3838/CDPlots/.
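A consensus diversity plot places each data set as a single point by combining several diversity criteria. The sketch below assumes, for illustration, that one axis is a scaffold-based measure (the area under the scaffold recovery curve, where 0.5 means all scaffolds are equally populated and values closer to 1 mean a few scaffolds dominate) and the other is a fingerprint-based measure (median pairwise Tanimoto similarity), with marker size encoding the number of compounds. It is not the CDPlots implementation; scaffold labels and fingerprints are toy inputs, and extracting real scaffolds requires a chemoinformatics toolkit.

```python
from collections import Counter
from itertools import combinations
from statistics import median


def scaffold_recovery_auc(scaffolds):
    """Area under the scaffold recovery curve (trapezoidal rule).

    x: fraction of scaffolds, most populated first; y: fraction of compounds
    recovered. Equally populated scaffolds give 0.5; values closer to 1 mean
    a few scaffolds cover most compounds (lower scaffold diversity).
    """
    counts = sorted(Counter(scaffolds).values(), reverse=True)
    total, s = sum(counts), len(counts)
    ys = [0.0]
    for c in counts:
        ys.append(ys[-1] + c / total)
    # Each of the s intervals on the x axis has width 1/s.
    return sum((ys[i] + ys[i + 1]) / 2 for i in range(s)) / s


def median_tanimoto(fingerprints):
    """Median pairwise Tanimoto similarity of on-bit sets (higher = less diverse)."""
    if len(fingerprints) < 2:
        raise ValueError("need at least two fingerprints")
    sims = []
    for a, b in combinations(fingerprints, 2):
        union = len(a | b)
        sims.append(1.0 if union == 0 else len(a & b) / union)
    return median(sims)


def cdp_point(scaffolds, fingerprints):
    """(x, y, size) coordinates of one data set in a consensus diversity plot."""
    if len(scaffolds) != len(fingerprints):
        raise ValueError("expected one scaffold and one fingerprint per compound")
    return scaffold_recovery_auc(scaffolds), median_tanimoto(fingerprints), len(scaffolds)


if __name__ == "__main__":
    # Toy data set: one scaffold label and one on-bit set per compound.
    scaffolds = ["benzene", "benzene", "benzene", "indole", "pyridine"]
    fps = [{1, 2, 3}, {1, 2, 4}, {1, 2, 3}, {7, 8}, {9}]
    print(cdp_point(scaffolds, fps))
```

Computing one point per data set and plotting them together is what allows several libraries to be compared at a glance; the plotting itself is not shown.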
Molecular properties and visualization of the chemical space
Six physicochemical properties were calculated for all compounds in the BIOFACQUIM10, PeruNPDB11, and FDA13 datasets. These properties were then displayed as violin plots, which show how each property is distributed across the three studied datasets (Figure 1). The distribution profiles of the six physicochemical parameters o...
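A violin plot is essentially a mirrored kernel density estimate drawn together with quartile markers. To show what is computed for each property and data set, the sketch below evaluates a Gaussian kernel density estimate using the normal-reference bandwidth rule h = 1.06 · σ · n^(-1/5) (an assumption; plotting tools may use other bandwidths) and reports the quartiles. The values are made-up numbers for a single property, and the drawing step is left to a graphing tool.

```python
import math
import statistics


def gaussian_kde(values, grid):
    """Gaussian kernel density estimate of `values`, evaluated at each grid point."""
    n = len(values)
    # Normal-reference rule of thumb for the bandwidth: h = 1.06 * sd * n^(-1/5).
    h = 1.06 * statistics.stdev(values) * n ** (-1 / 5)
    if h == 0:
        raise ValueError("all values are identical; the bandwidth would be zero")
    scale = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return [scale * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
            for x in grid]


def violin_summary(values, points=50):
    """Density over the observed range plus quartiles: the ingredients of one violin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (points - 1)
    grid = [lo + i * step for i in range(points)]
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {"grid": grid, "density": gaussian_kde(values, grid), "quartiles": (q1, q2, q3)}


if __name__ == "__main__":
    # Made-up values of one property for one data set (not data from this study).
    values = [2.1, 3.4, 1.8, 4.0, 2.9, 3.3, 5.2, 0.9, 2.5, 3.8]
    summary = violin_summary(values)
    print("quartiles:", summary["quartiles"])
    print("peak density at x = %.2f" % max(zip(summary["density"], summary["grid"]))[1])
```

Repeating this per property and per data set, and mirroring each density curve around a vertical axis, yields side-by-side violins that can be compared across data sets.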
Because of its many potential uses, such as compound classification, compound selection, exploration of structure-activity relationships, and navigation of structure-property relationships, the concept of chemical space is now widely employed in the drug discovery and development process14. In addition, building NP databases is a fundamental step for performing various computational studies, including the design of chemical libraries, characterization and comparison of the chemical space, the study of SA...
The authors declare that they do not have any conflict of interest.
HLBC and MACH thank the funding of Universidad Catolica de Santa Maria (grants 27499-R-2020, 27574-R-2020, 7309-CU-2020, and 28048-R-2021). JLMF thanks the funding of DGAPA, UNAM, Programa de Apoyo a Proyectos de Investigación e Innovación Tecnológica (PAPIIT), grant No. IN201321.
Name | Company | Catalog Number | Comments |
GraphPad Prism | GraphPad Prism | https://www.graphpad.com/ | |
KNIME platform | KNIME | https://www.knime.com | |
Osiris DataWarrior (OSIRIS) software | openmolecules.org | https://openmolecules.org/datawarrior/ | |
PUMA | PUMA: Platform for Unified Molecular Analysis | http://132.248.103.152:3838/PUMA/ | |
Copyright © 2025 MyJoVE Corporation. All rights reserved