The CACHE challenges are a series of prospective benchmarking exercises to evaluate progress in the field of computational hit-finding. Here we report the results of the inaugural CACHE challenge in which 23 computational teams each selected up to 100 commercially available compounds that they predicted would bind to the WDR domain of the Parkinson's disease target LRRK2, a domain with no known ligand and only an apo structure in the PDB. The lack of known binding data and presumably low druggability of the target is a challenge to computational hit finding methods.
Efficient prioritization of bioactive compounds from high throughput screening campaigns is a fundamental challenge for accelerating drug development efforts. In this study, we present the first data-driven approach to simultaneously detect assay interferents and prioritize true bioactive compounds. By analyzing the learning dynamics during training of a gradient boosting model on noisy high throughput screening data using a novel formulation of sample influence, we are able to distinguish between compounds exhibiting the desired biological response and those producing assay artifacts.
The widespread proliferation of artificial intelligence (AI) and machine learning (ML) methods has a profound effect on the drug discovery process. However, many scientists are reluctant to utilize these powerful tools due to the steep learning curve typically associated with them. AIDDISON offers a convenient, secure, web-based platform for drug discovery that addresses this reluctance.
In this study, we demonstrate the feasibility of yeast surface display (YSD) and next-generation sequencing (NGS) in combination with artificial intelligence and machine learning methods (AI/ML) for the identification of de novo humanized single domain antibodies (sdAbs) with favorable early developability profiles. The display library was derived from a novel approach, in which VHH-based CDR3 regions obtained from a llama (Lama glama), immunized against NKp46, were grafted onto a humanized VHH backbone library that was diversified in CDR1 and CDR2. Following NGS analysis of sequence pools from two rounds of fluorescence-activated cell sorting, we focused on four sequence clusters based on NGS frequency and enrichment analysis as well as in silico developability assessment.
Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security.
Decision tree ensembles are among the most robust, high-performing and computationally efficient machine learning approaches for quantitative structure-activity relationship (QSAR) modeling. Among them, gradient boosting has recently garnered particular attention, for its performance in data science competitions, virtual screening campaigns, and bioactivity prediction. However, different variants of gradient boosting exist, the most popular being XGBoost, LightGBM and CatBoost.
Praziquantel (PZQ) is an essential anthelmintic drug recently established to be an activator of a Transient Receptor Potential Melastatin (TRPM) ion channel in trematode worms. Bioinformatic, mutagenesis and drug metabolism work indicate that the cyclohexyl ring of PZQ is a key pharmacophore for activation of trematode TRPM, as well as serving as the primary site of oxidative metabolism which results in PZQ being a short-lived drug. Based on our recent findings, the hydrophobic cleft in schistosome TRPM defined by three hydrophobic residues surrounding the cyclohexyl ring has little tolerance for polarity.
While recent years have seen a dramatic increase in the number of available bioassay datasets, many of them suffer from an extremely imbalanced distribution between active and inactive compounds. Thus, there is an urgent need for novel approaches to tackle class imbalance in drug discovery. Inspired by recent advances in computer vision, we investigated a panel of alternative loss functions for imbalanced classification in the context of Gradient Boosting and benchmarked them on six datasets from public and proprietary sources, for a total of 42 tasks and 2 million compounds.
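To make the idea of an imbalance-aware loss concrete, the following is a minimal sketch of a class-weighted logistic loss expressed as the per-sample gradient and Hessian that gradient boosting frameworks typically expect from a custom objective. This is an illustrative assumption only: the abstract does not name the loss functions that were benchmarked, and the function name `weighted_logloss_grad_hess` and the parameter `pos_weight` are hypothetical.

```python
import math


def sigmoid(z):
    """Map a raw model score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def weighted_logloss_grad_hess(raw_scores, labels, pos_weight):
    """Gradient and Hessian of a class-weighted logistic loss.

    Loss per sample: -w * [y*log(p) + (1-y)*log(1-p)], with p = sigmoid(z),
    w = pos_weight for actives (y == 1) and w = 1 for inactives.
    Derivatives are taken with respect to the raw score z.
    """
    grads, hessians = [], []
    for z, y in zip(raw_scores, labels):
        p = sigmoid(z)
        w = pos_weight if y == 1 else 1.0
        grads.append(w * (p - y))            # first derivative
        hessians.append(w * p * (1.0 - p))   # second derivative
    return grads, hessians
```

Upweighting the rare active class makes each misclassified active contribute more to the tree-fitting step than a misclassified inactive; whether this helps on a given dataset would need to be checked empirically.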
Praziquantel (PZQ) is an essential medicine for treating parasitic flatworm infections such as schistosomiasis, which afflicts over 250 million people. However, PZQ is not universally effective, lacking activity against liver flukes of the genus. The reason for this insensitivity is unclear, as the mechanism of PZQ action is unknown.
The repertoire of natural products offers tremendous opportunities for chemical biology and drug discovery. Natural product-inspired synthetic molecules represent an ecologically and economically sustainable alternative to the direct utilization of natural products. De novo design with machine intelligence bridges the gap between the worlds of bioactive natural products and synthetic molecules.
Molecular shape and pharmacological function are interconnected. To capture shape, the fractal dimensionality concept was employed, providing a natural similarity measure for the virtual screening of de novo generated small molecules mimicking the structurally complex natural product (-)-englerin A. Two of the top-ranking designs were synthesized and tested for their ability to modulate transient receptor potential (TRP) cation channels which are cellular targets of (-)-englerin A.
A virtual screening protocol based on machine learning models was used to identify mimetics of the natural product (-)-galantamine. This fully automated approach identified eight compounds with bioactivities on at least one of the macromolecular targets of (-)-galantamine, with different polypharmacological profiles. Two of the computer-generated hits possess an expanded spectrum of bioactivity on targets relevant to the treatment of Alzheimer's disease and are suitable for hit-to-lead expansion.
The bile acid activated transcription factor farnesoid X receptor (FXR) has revealed therapeutic potential as a molecular drug target for the treatment of hepatic and metabolic disorders. Despite strong efforts in FXR ligand development, the structural diversity among the known FXR modulators is limited. Only four molecular frameworks account for more than 50 % of the FXR modulators annotated in ChEMBL.
Invited for this month's cover picture is the group of Prof. Dr. Gisbert Schneider from the Swiss Federal Institute of Technology (ETH) Zurich (Switzerland).
The lack of potent subtype-selective modulators of retinoid X receptors (RXRs) has hindered their full exploitation as promising drug targets. Using computational similarity searching, target prediction and automated design, we identified novel RXR ligands exhibiting innovative molecular frameworks, pronounced receptor-subtype preference and suitable properties for hit-to-lead expansion.
Natural products (NPs) are progressively recognized as an invaluable source of pharmacological tools and lead structures. To enable NP-inspired retinoid X receptor (RXR) modulator design, three novel RXR-targeting NPs were computationally identified. Among them, valerenic acid was found to be selective for RXRβ, rendering it a unique pharmacological tool compound.
Generative artificial intelligence offers a fresh view on molecular design. We present the first-time prospective application of a deep learning model for designing new druglike compounds with desired activities. For this purpose, we trained a recurrent neural network to capture the constitution of a large set of known bioactive compounds represented as SMILES strings.
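As a rough illustration of how SMILES strings can be turned into input for a sequence model, the sketch below builds a character-level vocabulary and encodes a molecule as a list of integer token indices. The character-level scheme, the `<s>` and `</s>` boundary tokens, and the function names are assumptions made for illustration; the abstract does not describe the tokenization that was actually used, and a naive character split separates two-letter atoms such as Cl and Br.

```python
START, END = "<s>", "</s>"  # hypothetical sequence boundary tokens


def build_vocab(smiles_list):
    """Assign an integer index to each boundary token and each character seen."""
    chars = sorted({ch for smi in smiles_list for ch in smi})
    return {tok: idx for idx, tok in enumerate([START, END] + chars)}


def encode(smiles, vocab):
    """Encode one SMILES string as token indices, wrapped in start/end tokens."""
    return [vocab[START]] + [vocab[ch] for ch in smiles] + [vocab[END]]


vocab = build_vocab(["CCO", "C=O"])
encoded = encode("CCO", vocab)
```

A recurrent network trained on such sequences learns to predict the next token given the preceding ones, which is what allows it to generate new strings one character at a time.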
Molecular descriptors capture diverse structural information of molecules and are a prerequisite for ligand-based similarity searching. In this study, we introduce topological matrix-based descriptors to virtual screening for hit discovery. We evaluated the usefulness of matrix-based descriptors in a retrospective setting and compared them with topological pharmacophore descriptors.
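Ligand-based similarity searching, in its simplest form, ranks a compound library by descriptor similarity to a query molecule. The sketch below uses the Tanimoto coefficient on sets of feature identifiers as a stand-in; it does not implement the matrix-based or pharmacophore descriptors from the study, and the feature sets and compound names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient between two sets of feature identifiers."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(query, library):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query, feats)) for name, feats in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical feature sets standing in for real molecular descriptors.
library = {"cmpd_a": {1, 2, 3}, "cmpd_b": {2, 3, 4}, "cmpd_c": {7}}
ranking = rank_by_similarity({1, 2, 3}, library)
```

In a retrospective evaluation, the top of such a ranking would be checked for enrichment of known actives relative to random selection.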
We present the computational de novo design of synthetically accessible chemical entities that mimic the complex sesquiterpene natural product (-)-Englerin A. We synthesized lead-like probes from commercially available building blocks and profiled them for activity against a computationally predicted panel of macromolecular targets. Both the design template (-)-Englerin A and its low-molecular weight mimetics presented nanomolar binding affinities and antagonized the transient receptor potential calcium channel TRPM8 in a cell-based assay, without showing target promiscuity or frequent-hitter properties.