REINVENT 4 is a modern open-source generative AI framework for the design of small molecules. The software uses recurrent neural networks and transformer architectures to drive molecule generation. These generators are embedded within general machine learning optimization algorithms: transfer learning, reinforcement learning and curriculum learning.
In image-based profiling, software extracts thousands of morphological features of cells from multi-channel fluorescence microscopy images, yielding single-cell profiles that can be used for basic research and drug discovery. Powerful applications have been proven, including clustering chemical and genetic perturbations on the basis of their similar morphological impact, identifying disease phenotypes by observing differences in profiles between healthy and diseased cells and predicting assay outcomes by using machine learning, among many others. Here, we provide an updated protocol for the most popular assay for image-based profiling, Cell Painting.
Uncontrolled angiogenesis is a common denominator underlying many deadly and debilitating diseases such as myocardial infarction, chronic wounds, cancer, and age-related macular degeneration. As the current range of FDA-approved angiogenesis-based medicines is far from meeting clinical demands, the vast reserve of natural products from traditional Chinese medicine (TCM) offers an alternative source for developing pro-angiogenic or anti-angiogenic modulators. Here, we investigated 100 TCM-derived individual metabolites which had reported gene expression in MCF7 cell lines in the Gene Expression Omnibus (GSE85871).
Measurements of protein-ligand interactions have reproducibility limits due to experimental errors. Any model based on such assays will consequently have such unavoidable errors influencing its performance, which should ideally be factored into modelling and output predictions, such as the actual standard deviation of experimental measurements (σ) or the associated comparability of activity values between the aggregated heterogeneous activity units (i.e.
The understanding of the mechanism-of-action (MoA) of compounds and the prediction of potential drug targets play an important role in small-molecule drug discovery. The aim of this work was to compare chemical and cell morphology information for bioactivity prediction. The comparison was performed using bioactivity data from the ExCAPE database, image data (in the form of CellProfiler features) from the Cell Painting data set (the largest publicly available data set of cell images with ∼30,000 compound perturbations), and extended connectivity fingerprints (ECFPs) using the multitask Bayesian matrix factorization (BMF) approach Macau.
Dichapetalum madagascariense Poir (Dichapetalaceae) is traditionally used to treat bacterial infections, jaundice, urethritis and viral hepatitis in Africa. Its root contains a broad spectrum of biologically active dichapetalins. To evaluate the plant's effect on human MCF-7 cells and its antibacterial and antiparasitic potentials, we isolated and identified the known dichapetalins A and M from the roots.
Machine learning and artificial intelligence are increasingly being applied to the drug-design process as a result of the development of novel algorithms, growing access, the falling cost of computation and the development of novel technologies for generating chemically and biologically relevant data. There has been recent progress in fields such as molecular de novo generation, synthetic route prediction and, to some extent, property predictions. Despite this, most research in these fields has focused on improving the accuracy of the technologies, rather than on quantifying the uncertainty in the predictions.
In the context of bioactivity prediction, the question of how to calibrate a score produced by a machine learning method into a probability of binding to a protein target is not yet satisfactorily addressed. In this study, we compared the performance of three such methods, namely, Platt scaling (PS), isotonic regression (IR), and Venn-ABERS predictors (VA), in calibrating prediction scores obtained from ligand-target prediction comprising the Naïve Bayes, support vector machines, and random forest (RF) algorithms. Calibration quality was assessed on bioactivity data available at AstraZeneca for 40 million data points (compound-target pairs) across 2112 targets and performance was assessed using stratified shuffle split (SSS) and leave 20% of scaffolds out (L20SO) validation.
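As an illustration of one of the calibration methods named above, the following is a minimal sketch of isotonic regression using the generic pool-adjacent-violators procedure, together with a Brier score to compare raw and calibrated scores. It is not the implementation used in the study: the function names, the toy scores and the toy labels are all assumptions made for illustration, and the sketch assumes distinct scores and evaluates in-sample rather than on held-out splits such as SSS or L20SO.

```python
def isotonic_fit(scores, labels):
    """Pool-adjacent-violators: map raw scores to monotone probabilities.

    Returns the scores in ascending order and the calibrated value for each.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block is [sum of labels, number of points]
    for i in order:
        blocks.append([float(labels[i]), 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    xs = [scores[i] for i in order]
    ys = []
    for s, n in blocks:
        ys.extend([s / n] * n)
    return xs, ys


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


# Hypothetical toy data: raw model scores and binary activity labels.
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
activity = [0, 1, 0, 0, 1, 1]

sorted_scores, calibrated = isotonic_fit(raw_scores, activity)
print(calibrated)
print(brier_score(raw_scores, activity), brier_score(calibrated, activity))
```

Platt scaling would instead fit a sigmoid to the scores, and Venn-ABERS predictors build on two isotonic fits per test object; neither is shown here.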
Functional magnetic resonance imaging (fMRI) is an extensively used method for the investigation of normal and pathological brain function. In particular, fMRI has been used to characterize spatiotemporal hemodynamic response to pharmacological challenges as a non-invasive readout of neuronal activity. However, the mechanisms underlying regional signal changes are yet unclear.
Despite the increasing knowledge in both the chemical and biological domains, the assimilation and exploration of heterogeneous datasets, encoding information about the chemical, bioactivity and phenotypic properties of compounds, remains a challenge due to the requirement for overlap between chemicals assayed across the spaces. Here, we have constructed a novel dataset, larger than we have used in prior work, comprising 579 acute oral toxic compounds and 1427 non-toxic compounds derived from regulatory GHS information, along with their corresponding molecular and protein target descriptors and qHTS in vitro assay readouts from the Tox21 project. We found no clear association between the results of a FAFDrugs4 toxicophore screen and the acute oral toxicity classifications for our compound set; and a screen using a subset of the ToxAlerts toxicophores was also of limited utility, with only slight enrichment toward the toxic set (odds ratio of 1.
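For readers unfamiliar with the enrichment measure mentioned above, this is a minimal sketch of how an odds ratio is computed from a 2x2 table of toxicophore hits against toxicity class. The class sizes (579 toxic, 1427 non-toxic) come from the abstract; the hit counts are hypothetical placeholders, not results from the study, and the function name is an assumption.

```python
def odds_ratio(hits_toxic, n_toxic, hits_nontoxic, n_nontoxic):
    """Odds of a toxicophore hit in the toxic set divided by the odds in the non-toxic set."""
    a = hits_toxic                 # toxic compounds flagged by the screen
    b = n_toxic - hits_toxic       # toxic compounds not flagged
    c = hits_nontoxic              # non-toxic compounds flagged
    d = n_nontoxic - hits_nontoxic # non-toxic compounds not flagged
    return (a * d) / (b * c)


# Class sizes from the abstract; hit counts below are made up for illustration only.
example = odds_ratio(hits_toxic=100, n_toxic=579, hits_nontoxic=200, n_nontoxic=1427)
print(round(example, 2))
```

An odds ratio close to 1 means the screen flags toxic and non-toxic compounds at similar rates, which is what "only slight enrichment" describes.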
Neuropsychiatric disorders are the third leading cause of global disease burden. Current pharmacological treatment for these disorders is inadequate, with often insufficient efficacy and undesirable side effects. One reason for this is that the links between molecular drug action and neurobehavioral drug effects are elusive.
Protein target deconvolution is frequently used for mechanism-of-action investigations; however, existing protocols usually do not predict compound functional effects, such as activation or inhibition, upon binding to their protein counterparts. This study is hence concerned with including functional effects in target prediction. To this end, we assimilated a bioactivity training set for 332 targets, comprising 817,239 active data points with unknown functional effect (binding data) and 20,761,260 inactive compounds, along with 226,045 activating and 1,032,439 inhibiting data points from functional screens.
Motivation: In silico approaches often fail to utilize bioactivity data available for orthologous targets due to insufficient evidence highlighting the benefit for such an approach. Deeper investigation into orthologue chemical space and its influence toward expanding compound and target coverage is necessary to improve the confidence in this practice.
Results: Here we present analysis of the orthologue chemical space in ChEMBL and PubChem and its impact on target prediction.
One important, yet poorly understood, concept of Traditional Chinese Medicine (TCM) is that of the hot, cold, and neutral nature of its bioactive principles. To advance the field, in this study, we analyzed compound-nature pairs from TCM on a large scale (>23 000 structures) via chemical space visualizations to understand its physicochemical domain and in silico target prediction to understand differences related to their modes-of-action (MoA) against proteins. We found that overall TCM natures spread into different subclusters with specific molecular patterns, as opposed to forming coherent global groups.
The epidermal growth factor receptor (EGFR) is a validated therapeutic target for triple-negative breast cancer (TNBC). In the present study, we synthesize novel adamantanyl-based thiadiazolyl pyrazoles by introducing the adamantane ring to thiazolopyrazoline. On the basis of loss of cell viability in TNBC cells, 4-(adamantan-1-yl)-2-(3-(2,4-dichlorophenyl)-5-phenyl-4,5-dihydro-1H-pyrazol-1-yl)thiazole (APP) was identified as a lead compound.
Cancer cell line panels have proved useful disease models to, among others, identify genomic markers of drug sensitivity and to develop new anticancer drugs. The increasing availability of in vitro sensitivity and cell line profiling data sets raises the question of whether this information could be used, and to what extent, to predict the activity of drugs in cancer cell lines and, ultimately, in patients' tumors. Drug sensitivity prediction embraces those approaches aiming at predicting in vitro drug activity on cancer cell lines by integrating genomic and/or chemical information using machine learning models.
While mechanisms of cytotoxicity and cytostaticity have been studied extensively from the biological side, relatively little is currently understood regarding areas of chemical space leading to cytotoxicity and cytostasis in large compound collections. Predicting and rationalizing potential adverse mechanisms-of-action (MoAs) of small molecules is, however, crucial for screening library design, given the link between even low-level cytotoxicity and adverse events observed in man. In this study, we analyzed results from a cell-based cytotoxicity screening cascade, comprising 296 970 nontoxic, 5784 cytotoxic and cytostatic, and 2327 cytostatic-only compounds evaluated on the THP-1 cell-line.
Background: In silico analyses are increasingly being used to support mode-of-action investigations; however, many such approaches do not utilise the large amounts of inactive data held in chemogenomic repositories. The objective of this work is to integrate such bioactivity data into the target prediction of orphan compounds to produce the probability of activity and inactivity for a range of targets. To this end, a novel human bioactivity data set was constructed through the assimilation of over 195 million bioactivity data points deposited in the ChEMBL and PubChem repositories, and the subsequent application of a sphere-exclusion selection algorithm to oversample presumed inactive compounds.