Quantifying intermolecular interactions with quantum chemistry (QC) is useful for many chemical problems, including understanding the nature of protein-ligand interactions. Unfortunately, QC computations on protein-ligand systems are too computationally expensive for most use cases. The flourishing field of machine-learned (ML) potentials is a promising solution, but it is limited by an inability to easily capture long-range, non-local interactions.
The message passing neural network (MPNN) framework is a promising tool for modeling atomic properties but was, until recently, incompatible with directional properties, such as Cartesian tensors. We propose a modified Cartesian MPNN (CMPNN) suitable for predicting atom-centered multipoles, an essential component of ab initio force fields. The efficacy of this model is demonstrated on a newly developed dataset consisting of 46 623 chemical structures and corresponding high-quality atomic multipoles, which was deposited into the publicly available Molecular Sciences Software Institute QCArchive server.
Intermolecular interactions are critical to many chemical phenomena, but their accurate computation using ab initio methods is often limited by computational cost. The recent emergence of machine learning (ML) potentials may be a promising alternative. Useful ML models should not only estimate accurate interaction energies but also predict smooth and asymptotically correct potential energy surfaces.
Accurate prediction of intermolecular interaction energies is a fundamental challenge in electronic structure theory due to their subtle character and small magnitudes relative to total molecular energies. Symmetry adapted perturbation theory (SAPT) provides rigorous quantum mechanical means for computing such quantities directly and accurately, but for a computational cost of at least O(N^5), where N is the number of atoms. Here, we report machine learned models of SAPT components with a computational cost that scales asymptotically linearly, O(N).
Matched molecular pair analysis (MMPA) has emerged as a powerful approach to mine and extract tacit knowledge from measured databases of small molecules. Extracted knowledge from past experimentation can assist future lead optimization as an idea generation tool and, hence, reduce the number of design-synthesis-test cycles. While attractive and intuitive, MMPA still presents several limitations.
Background: In recent years, research in artificial neural networks has resurged, now under the deep-learning (DL) umbrella, and grown extremely popular. Recently reported success of DL techniques in crowd-sourced QSAR and predictive toxicology competitions has showcased these methods as powerful tools in drug-discovery and toxicology research. The aim of this work was twofold: first, a large number of hyper-parameter configurations were explored to investigate how they affect the performance of deep neural networks (DNNs) and which could act as starting points when tuning DNNs; second, the performance of DNNs was compared to that of popular methods widely employed in the field of cheminformatics, namely Naïve Bayes, k-nearest neighbor, random forest and support vector machines.
Prediction of compound toxicity is essential because covering the vast chemical space requiring safety assessment using traditional experimentally-based, resource-intensive techniques is impossible. However, such prediction is nontrivial due to the complex causal relationship between compound structure and harm. Protein target annotations and experimental outcomes encode relevant bioactivity information complementary to chemicals' structures.
The increase of publicly available bioactivity data has led to the extensive development and usage of in silico bioactivity prediction algorithms. A particularly popular approach for such analyses is the multiclass Naïve Bayes, whose output is commonly processed by applying empirically-derived likelihood score thresholds. In this work, we describe a systematic way for deriving score cut-offs on a per-protein target basis and compare their performance with global thresholds on a large scale using both 5-fold cross-validation (ChEMBL 14, 189k ligand-protein pairs over 477 protein targets) and external validation (WOMBAT, 63k pairs, 421 targets).
Background: An in silico mechanism-of-action analysis protocol was developed, comprising molecule bioactivity profiling, annotation of predicted targets with pathways and calculation of enrichment factors to highlight targets and pathways more likely to be implicated in the studied phenotype.
Results: The method was applied to a cytotoxicity phenotypic endpoint, with enriched targets/pathways found to be statistically significant when compared with 100 random datasets. Application to a smaller apoptotic set (10 molecules) did not yield statistically relevant results, suggesting that the protocol requires modification, such as analysis of the most frequently predicted targets/annotated pathways.
Chemical diversity is a widely applied approach to select structurally diverse subsets of molecules, often with the objective of maximizing the number of hits in biological screening. While many methods exist in the area, few systematic comparisons using current descriptors, in particular with the objective of assessing diversity in bioactivity space, have been published, and the current study aims to address this shortage. In this work, 13 widely used molecular descriptors were compared, including fingerprint-based descriptors (ECFP4, FCFP4, MACCS keys), pharmacophore-based descriptors (TAT, TAD, TGT, TGD, GpiDAPH3), shape-based descriptors (rapid overlay of chemical structures (ROCS) and principal moments of inertia (PMI)), a connectivity-matrix-based descriptor (BCUT), physicochemical-property-based descriptors (prop2D), and a more recently introduced molecular descriptor type (namely, "Bayes Affinity Fingerprints").
In this article, we discuss our recent work in elucidating the mode-of-action of compounds used in traditional medicine, including Ayurvedic medicine. Using a computational ('in silico') approach, we predict potential targets for Ayurvedic anti-cancer compounds obtained from the Indian Plant Anticancer Database, given their chemical structures. In our analysis, we observed that: (i) the targets predicted can be connected to cancer pathogenesis i.
Objective: This study exemplifies computer-aided (in silico) approaches in assessing the risks of new psychoactive substances emerging in the European Union. In this work, we (i) consider the potential of Ostarine exhibiting psychoactivity and (ii) anticipate potential activities and toxicities of 4-methylamphetamine.
Method: The approach, termed in silico target prediction, suggests potential protein targets modulated by compounds given their chemical structure.
In this study, two probabilistic machine-learning algorithms were compared for in silico target prediction of bioactive molecules, namely the well-established Laplacian-modified Naïve Bayes classifier (NB) and the more recently introduced (to Cheminformatics) Parzen-Rosenblatt Window. Both classifiers were trained in conjunction with circular fingerprints on a large data set of bioactive compounds extracted from ChEMBL, covering 894 human protein targets with more than 155,000 ligand-protein pairs. This data set is also provided as a benchmark data set for future target prediction methods due to its size as well as the number of bioactivity classes it contains.
Diversity selection is a frequently applied strategy for assembling high-throughput screening libraries, making the assumption that a diverse compound set increases chances of finding bioactive molecules. Based on previous work on experimental 'affinity fingerprints', in this study, a novel diversity selection method is benchmarked that utilizes predicted bioactivity profiles as descriptors. Compounds were selected based on their predicted activity against half of the targets (training set), and diversity was assessed based on coverage of the remaining (test set) targets.
Traditional Chinese medicine (TCM) and Ayurveda have been used in humans for thousands of years. While the link to a particular indication has been established in man, the mode-of-action (MOA) of the formulations often remains unknown. In this study, we aim to understand the MOA of formulations used in traditional medicine using an in silico target prediction algorithm, which aims to predict protein targets (and hence MOAs), given the chemical structure of a compound.
Cancer remains a fundamental burden to public health despite substantial efforts aimed at developing effective chemotherapeutics and significant advances in chemotherapeutic regimens. The major challenge in anti-cancer drug design is to selectively target cancer cells with high specificity. Research into treating malignancies by targeting altered metabolism in cancer cells is supported by computational approaches, which can take a leading role in identifying candidate targets for anti-cancer therapy as well as assist in the discovery and optimisation of anti-cancer agents.
Given the tremendous growth of bioactivity databases, the use of computational tools to predict protein targets of small molecules has been gaining importance in recent years. Applications span a wide range, from the 'designed polypharmacology' of compounds to mode-of-action analysis. In this review, we firstly survey databases that can be used for ligand-based target prediction and which have grown tremendously in size in the past.