The exploration of chemical space is a fundamental aspect of chemoinformatics, particularly when one explores a large compound data set to relate chemical structures with molecular properties. In this study, we extend our previous work on chemical space visualization at the pharmacophoric level. Instead of using conventional binary classification of affinity (active vs inactive), we introduce a refined approach that categorizes compounds into four distinct classes based on their activity levels: super active, very active, active, and inactive.
This paper presents a novel approach called Pharmacophore Activity Delta for extracting outstanding pharmacophores from a chemogenomic dataset, with a specific focus on a kinase target known as BCR-ABL. The method first constructs a Hasse diagram, referred to as the pharmacophore network, using the subgraph partial order, which leads to the identification of pharmacophores for further evaluation. A pharmacophore is classified as a 'Pharmacophore Activity Delta' if its capability to discriminate between active and inactive molecules deviates significantly (by at least δ standard deviations) from the mean capability of its related pharmacophores.
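The δ criterion described in this abstract can be illustrated with a minimal sketch. It assumes that each pharmacophore already has a numeric discrimination score and that the scores of its related pharmacophores (its neighbours in the network) are available as a list. The function name `is_activity_delta`, the default `delta=2.0`, and the use of the population standard deviation are illustrative assumptions, not details taken from the paper.

```python
from statistics import mean, pstdev


def is_activity_delta(score, related_scores, delta=2.0):
    """Flag a pharmacophore whose score deviates from the mean score of its
    related pharmacophores by at least `delta` standard deviations.

    `score` and `related_scores` are hypothetical discrimination scores;
    how they are computed is outside the scope of this sketch.
    """
    if len(related_scores) < 2:
        return False  # too few neighbours to estimate a spread
    mu = mean(related_scores)
    sigma = pstdev(related_scores)  # population std dev (an assumption)
    if sigma == 0:
        return False  # all neighbours identical: deviation is undefined here
    return abs(score - mu) / sigma >= delta


# Hypothetical neighbour scores clustered around 0.5
neighbours = [0.50, 0.52, 0.48, 0.51, 0.49]
print(is_activity_delta(0.90, neighbours))  # far from the neighbour mean
print(is_activity_delta(0.51, neighbours))  # within the neighbour spread
```

The absolute value treats unusually good and unusually poor discriminators alike; a one-sided test would be an equally plausible reading of the abstract.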
Maximum common substructures (MCS) have received a lot of attention in the chemoinformatics community. They are typically used as a similarity measure between molecules, showing high predictive performance when used in classification tasks, while being easily explainable substructures. In the present work, we applied the Pairwise Maximum Common Subgraph Feature Generation (PMCSFG) algorithm to automatically detect toxicophores (structural alerts) and to compute fingerprints based on MCS.
In this work, we propose to analyze the potential of a new type of pharmacophoric descriptors coupled to a novel feature transformation technique, called Weight-Matrix Learning (WML, based on a feed-forward neural network). The application concerns virtual screening on a tyrosine kinase named BCR-ABL. First, the compounds were described using three different families of descriptors: our new pharmacophoric descriptors, and two circular fingerprints, ECFP4 and FCFP4.
This paper introduces a general method that can be used to create groups of pharmacophores to support their further in-depth analysis. A BCR-ABL molecular dataset was used to calculate graph edit distances between pharmacophores and led to their organization into a novel pharmacophore network. The application of a graph layout algorithm allowed us to discriminate between the pharmacophores associated with active compounds and those associated with inactive compounds.
Historically, structure-activity relationship (SAR) analysis has focused on small sets of molecules, but in recent years, there have been increasing efforts to analyze the growing amount of data stored in public databases like ChEMBL. The pharmacophore network introduced herein is dedicated to the organization of a set of pharmacophores automatically discovered from a large data set of molecules. Navigating the network supports essential tasks of a drug discovery process, including the study of the relations between different chemical series, the analysis of the influence of additional chemical features on the compounds' activity, and the identification of diverse binding modes.
This article introduces a new type of structural fragment called a geometrical pattern. Such geometrical patterns are defined as molecular graphs that include a labelling of atoms together with constraints on interatomic distances. The discovery of geometrical patterns in a chemical dataset relies on the induction of multiple decision trees combined in random forests.
Biomarker development in metabolomics aims at discriminating diseased from normal subjects and at creating a predictive model that can be used to diagnose new subjects. From a case study on human hepatocellular carcinoma (HCC), we studied for the first time the potential usefulness of the emerging patterns (EPs) that come from the data mining domain. When applied to a metabolomics data set labeled with two classes (e.
This study is dedicated to the introduction of a novel method that automatically extracts potential structural alerts from a data set of molecules. These triggering structures can be further used for knowledge discovery and classification purposes. Computation of the structural alerts results from an implementation of a sophisticated workflow that integrates a graph mining tool guided by growth rate and stability.
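As a rough illustration of the growth-rate criterion mentioned in this abstract, the sketch below uses the usual emerging-pattern definition: the ratio of a fragment's support in the positive class (for example, toxic molecules) to its support in the negative class. The function name, argument names, and example counts are hypothetical, and the stability measure is not covered because the abstract does not define it.

```python
def growth_rate(n_pos_with, n_pos, n_neg_with, n_neg):
    """Growth rate of a fragment from the negative to the positive class.

    n_pos_with / n_neg_with: molecules containing the fragment in each class.
    n_pos / n_neg: total molecules in each class (both must be > 0).
    """
    supp_pos = n_pos_with / n_pos  # support in the positive class
    supp_neg = n_neg_with / n_neg  # support in the negative class
    if supp_neg == 0:
        # Fragment never seen in the negative class: infinite growth rate
        # if it occurs in the positive class, otherwise 0 by convention.
        return float("inf") if supp_pos > 0 else 0.0
    return supp_pos / supp_neg


# Hypothetical counts: fragment found in 40 of 100 positives, 10 of 100 negatives
print(growth_rate(40, 100, 10, 100))
```

A fragment with a growth rate well above 1 occurs proportionally more often in the positive class, which is what makes it a candidate structural alert under this definition.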
Starting from a random set of structures taken from the European Chemical Bureau (ECB) Web site, an estimation of the classification by acute category in ecotoxicology was carried out. This estimation was based on two approaches. One approach consists in starting with global quantitative structure-activity relationship (QSAR) equations, analyzing the results and defining an interpretation in terms of overall results and mode of action.