Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security.
In this study, we describe the rapid identification of potent binders for the WD40 repeat domain (WDR) of DCAF1. This was achieved by two rounds of iterative focused screening of a small set of compounds selected on the basis of internal WDR domain knowledge followed by hit expansion. Subsequent structure-based design led to nanomolar potency binders with a clear exit vector enabling DCAF1-based bifunctional degrader exploration.
With the increase in applications of machine learning methods in drug design and related fields, the challenge of designing sound test sets becomes more and more prominent. The goal is a realistic split of chemical structures (compounds) between training, validation and test sets, such that performance on the test set is a meaningful indicator of performance in a prospective application. This challenge is interesting and relevant in its own right, but becomes even more complex in a federated machine learning approach, where multiple partners jointly train a model under privacy-preserving conditions and chemical structures must not be shared between the participating parties.
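One way to picture a split that needs no structure sharing is a keyed hash over a structure-derived group key. The following is a minimal sketch, not the method of the article: it assumes every partner computes the same scaffold key locally and holds the same shared secret. The names `assign_fold`, `split_by_scaffold`, the five-fold default and the example keys are all hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict


def assign_fold(scaffold_key: str, secret: bytes, n_folds: int = 5) -> int:
    """Map a scaffold key to a fold index with a keyed hash (HMAC-SHA256).

    Assumption: all partners use the same secret and the same key format,
    so identical scaffolds land in the same fold without being exchanged.
    """
    digest = hmac.new(secret, scaffold_key.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % n_folds


def split_by_scaffold(compounds, secret: bytes, n_folds: int = 5) -> dict:
    """Group (compound_id, scaffold_key) pairs into folds by scaffold."""
    folds = defaultdict(list)
    for compound_id, scaffold_key in compounds:
        folds[assign_fold(scaffold_key, secret, n_folds)].append(compound_id)
    return dict(folds)


# Hypothetical local data of one partner: (compound id, scaffold key).
compounds = [
    ("cpd-1", "c1ccccc1"),
    ("cpd-2", "c1ccccc1"),
    ("cpd-3", "c1ccncc1"),
]
folds = split_by_scaffold(compounds, secret=b"shared-secret")
```

Because the fold depends only on the scaffold key and the secret, compounds sharing a scaffold never straddle the training and test sets, which is the property a scaffold-grouped split is meant to provide.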
MALT1 plays a central role in immune cell activation by transducing NF-κB signaling, and its proteolytic activity represents a key node for therapeutic intervention. Two cycles of scaffold morphing of a high-throughput biochemical screening hit resulted in the discovery of MLT-231, which enabled the successful pharmacological validation of MALT1 allosteric inhibition in preclinical models of humoral immune responses and B-cell lymphomas. Herein, we report the structure-activity relationships (SARs) and analysis of the physicochemical properties of a pyrazolopyrimidine-derived compound series.
This article summarizes the evolution of the screening deck at the Novartis Institutes for BioMedical Research (NIBR). Historically, the screening deck was an assembly of all available compounds. In 2015, we designed a first deck to facilitate access to diverse subsets with optimized properties.
Chemogenetic libraries, collections of well-defined chemical probes, provide tremendous value to biomedical research but require substantial effort to ensure diversity as well as quality of the contents. We have assembled a chemogenetic library by data mining and crowdsourcing institutional expertise. We are sharing our approach, lessons learned, and disclosing our current collection of 4,185 compounds with their primary annotated gene targets (https://github.
Multiplexed gene-signature-based phenotypic assays are increasingly used for the identification and profiling of small-molecule tool compounds and drugs. Here we introduce a method (provided as an R package) for the quantification of the dose-response potency of a gene signature as EC50 and IC50 values. Two signaling pathways were used as models to validate our methods: beta-adrenergic agonistic activity on cAMP generation (dedicated dataset generated for this study) and EGFR inhibitory effect on cancer cell viability.
Starting from a weak screening hit, potent and selective inhibitors of the MALT1 protease function were elaborated. Advanced compounds displayed high potency in biochemical and cellular assays. Compounds showed activity in a mechanistic Jurkat T cell activation assay as well as in the B-cell lymphoma line OCI-Ly3, which suggests potential use of MALT1 inhibitors in the treatment of autoimmune diseases as well as B-cell lymphomas with a dysregulated NF-κB pathway.
The intramembrane protease signal peptide peptidase-like 2a (SPPL2a) is a potential drug target for the treatment of autoimmune diseases due to an essential role in B cells and dendritic cells. To screen a library of 1.4 million compounds for inhibitors of SPPL2a, we developed an imaging assay detecting nuclear translocation of the proteolytically released cytosolic substrate fragment.
High-throughput screening (HTS) is an integral part of early drug discovery. Herein, we focused on those small molecules in a screening collection that have never shown biological activity despite having been exhaustively tested in HTS assays. These compounds are referred to as 'dark chemical matter' (DCM).
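The DCM definition above amounts to a simple filter over screening outcomes. The sketch below is illustrative only: the input layout (a list of active/inactive flags per compound) and the `min_assays` threshold standing in for "exhaustively tested" are assumptions, not values taken from the abstract.

```python
def find_dark_chemical_matter(results: dict, min_assays: int = 100) -> list:
    """Return IDs of compounds tested in at least `min_assays` assays
    without a single active outcome.

    `results` maps a compound ID to a list of booleans, one per assay
    (True = active). Both this layout and the default threshold are
    hypothetical choices made for the example.
    """
    return sorted(
        compound_id
        for compound_id, outcomes in results.items()
        if len(outcomes) >= min_assays and not any(outcomes)
    )


# Toy data with a deliberately small threshold.
toy_results = {
    "cpd-A": [False, False, False, False],  # well tested, never active
    "cpd-B": [False, True, False, False],   # active once
    "cpd-C": [False, False],                # too few assays to judge
}
dark = find_dark_chemical_matter(toy_results, min_assays=3)
```

Here only `cpd-A` qualifies: `cpd-B` has shown activity, and `cpd-C` has not been tested often enough to be called dark.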
Fragile X syndrome (FXS) is the most common form of inherited mental retardation, and it is caused in most cases by epigenetic silencing of the Fmr1 gene. Today, no specific therapy exists for FXS, and current treatments are directed only at improving behavioral symptoms. Neuronal progenitors derived from FXS patient induced pluripotent stem cells (iPSCs) represent a unique model to study the disease and develop assays for large-scale drug discovery screens, since they retain the silenced Fmr1 gene within the disease context.
The use of small molecules to modulate cellular processes is a powerful approach to investigate gene function as a complement to genetic approaches. The discovery and characterization of compounds that modulate translation initiation, the rate-limiting step of protein synthesis, is important both to provide tool compounds to explore this fundamental biological process and to further evaluate protein synthesis as a therapeutic target. While most messenger ribonucleic acids (mRNAs) recruit ribosomes via their 5' cap, some viral and cellular mRNAs initiate protein synthesis via an alternative "cap-independent" mechanism utilizing internal ribosome entry site (IRES) elements, which are complex mRNA secondary structures localized within the 5' nontranslated region of the mRNA upstream of the AUG start codon.
Translation initiation is a fine-tuned process that plays a critical role in tumorigenesis. The use of small molecules that modulate mRNA translation provides tool compounds to explore the mechanism of translational initiation and to further validate protein synthesis as a potential pharmaceutical target for cancer therapeutics. This report describes the development and use of a click beetle, dual luciferase cell-based assay multiplexed with a measure of compound toxicity using resazurin to evaluate the differential effect of natural products on cap-dependent or internal ribosome entry site (IRES)-mediated translation initiation and cell viability.
Recently a novel method termed compound set enrichment (CSE) has been described that uses the activity distribution of a structural class of compounds to identify hit series from primary screening data. This report describes how this method can be used to identify such hit series, even when no hits according to conventional hit-calling methods for a given structural class are present in the data set. Such series, which were called latent hit series, were identified prospectively in a cell-based screening campaign and also in a series of retrospective analyses of publicly available data sets from PubChem.
Databases for small organic chemical molecules usually contain millions of structures. The screening decks of pharmaceutical companies contain more than a million structures. Nevertheless, chemical substructure searching in these databases can be performed interactively in seconds.
Identification of meaningful chemical patterns in the increasing amounts of high-throughput-generated bioactivity data available today is an increasingly important challenge for successful drug discovery. Herein, we present the scaffold network as a novel approach for mapping and navigation of chemical and biological space. A scaffold network represents the chemical space of a library of molecules consisting of all molecular scaffolds and smaller "parent" scaffolds generated therefrom by the pruning of rings, effectively leading to a network of common scaffold substructure relationships.
The design of a high-quality screening collection is of utmost importance for the early drug-discovery process and provides, in combination with high-quality assay systems, the foundation of future discoveries. Herein, we review recent trends and observations to successfully expand the access to bioactive chemical space, including the feedback from hit assessment interviews of high-throughput screening campaigns; recent successes with chemogenomics target family approaches, the identification of new relevant target/domain families, diversity-oriented synthesis and new emerging compound classes, and non-classical approaches, such as fragment-based screening and DNA-encoded chemical libraries. The role of in silico library design approaches is emphasized.
Several efficient correspondence graph-based algorithms for determining the maximum common substructure (MCS) of a pair of molecules have been published in the literature. The extension of the problem to three or more molecules is however nontrivial; heuristics used to increase the efficiency in the two-molecule case are either inapplicable to the many-molecule case or do not provide significant speedups. Our specific algorithmic contribution is two-fold.
The main goal of high-throughput screening (HTS) is to identify active chemical series rather than just individual active compounds. In light of this goal, a new method (called compound set enrichment) to identify active chemical series from primary screening data is proposed. The method employs the scaffold tree compound classification in conjunction with the Kolmogorov-Smirnov statistic to assess the overall activity of a compound scaffold.
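To make the statistical idea concrete, the sketch below computes a two-sample Kolmogorov-Smirnov statistic between the activities of compounds sharing a scaffold and the activities of all remaining compounds. It is a simplified illustration under stated assumptions, not the published implementation: the scaffold assignment is taken as given, no p-value or multiple-testing correction is computed, and the function names and toy data are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample, background) -> float:
    """Two-sample KS statistic: the largest absolute difference between
    the empirical cumulative distribution functions of the two inputs."""
    s, b = sorted(sample), sorted(background)
    if not s or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # Both ECDFs are step functions, so checking every observed value suffices.
    for x in set(s) | set(b):
        ecdf_s = bisect_right(s, x) / len(s)
        ecdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(ecdf_s - ecdf_b))
    return largest_gap


def score_scaffolds(activities_by_scaffold: dict) -> dict:
    """Score each scaffold by comparing its activity distribution with
    that of all compounds belonging to other scaffolds."""
    scores = {}
    for scaffold, values in activities_by_scaffold.items():
        rest = [
            v
            for other, other_values in activities_by_scaffold.items()
            if other != scaffold
            for v in other_values
        ]
        scores[scaffold] = ks_statistic(values, rest)
    return scores


# Toy primary-screen readout (percent inhibition) grouped by scaffold label.
toy_screen = {"A": [90, 85, 80], "B": [5, 10], "C": [8, 12, 7]}
scores = score_scaffolds(toy_screen)
```

In the toy data, scaffold `A` is fully separated from the rest of the collection and therefore receives the maximum statistic of 1.0, while `B` and `C` overlap with the background and score lower.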
The Scaffold Tree algorithm (J Chem Inf Model 47:47-58, 2007) allows large molecular data sets to be organized by arranging sets of molecules into a unique tree hierarchy based on their scaffolds, with scaffolds forming the leaf nodes of such a tree. The hierarchy is created by iterative removal of rings from more complex scaffolds using a chemically meaningful set of rules, until a single root ring is obtained. The classification is deterministic, data set independent, and scales linearly with the number of compounds included in the data set.
The structure- and chemistry-based hierarchical organization of library scaffolds in tree-like arrangements provides a valid, intuitive means to map and navigate chemical space. We demonstrate that scaffold trees built using bioactivity as the key selection criterion for structural simplification during tree construction allow efficient and intuitive mapping, visualization and navigation of the chemical space defined by a given library, which in turn allows correlation of this chemical space with the investigated bioactivity and further compound design. Brachiation along the branches of such trees from structurally complex to simple scaffolds with retained yet varying bioactivity is feasible at high frequency for the five major pharmaceutically relevant target classes and allows for the identification of new inhibitor types for a given target.
Background: A method to estimate ease of synthesis (synthetic accessibility) of drug-like molecules is needed in many areas of the drug discovery process. The development and validation of such a method that is able to characterize molecule synthetic accessibility as a score between 1 (easy to make) and 10 (very difficult to make) is described in this article.
Results: The method for estimation of the synthetic accessibility score (SAscore) described here is based on a combination of fragment contributions and a complexity penalty.
Natural products (NPs) have evolved over a very long natural selection process to form optimal interactions with biological macromolecules. NPs are therefore an extremely useful source of inspiration for the design of new drugs. In the present study we report the results of a cheminformatics analysis of more than 130,000 NP structures.
Natural products (NPs) have been optimized in a very long natural selection process for optimal interactions with biological macromolecules. NPs are therefore an excellent source of validated substructures for the design of novel bioactive molecules. Various cheminformatics techniques can provide useful help in analyzing NPs, and the results of such studies may be used with advantage in the drug discovery process.
Classification methods for data sets of molecules according to their chemical structure were evaluated for their biological relevance, including rule-based, scaffold-oriented classification methods and clustering based on molecular descriptors. Three data sets resulting from uniformly determined in vitro biological profiling experiments were classified according to their chemical structures, and the results were compared in a Pareto analysis with the number of classes and their average spread in the profile space as two competing objectives to be minimized. No classification method was found to be superior overall to all other studied methods, but there is a general trend that rule-based, scaffold-oriented methods are the better choice if classes with homogeneous biological activity are required and a large number of clusters can be tolerated.