Modern high-throughput screening (HTS) is a well-established approach for hit finding in drug discovery that is routinely employed in the pharmaceutical industry to screen more than a million compounds within a few weeks. However, as the industry shifts to more disease-relevant but more complex phenotypic screens, the focus has moved to piloting smaller but smarter chemically/biologically diverse subsets, followed by an expansion around hit compounds. One standard method for doing this is to train a machine-learning (ML) model with the chemical fingerprints of the tested subset of molecules and then select the next compounds based on the predictions of this model. An alternative approach is to take advantage of the wealth of bioactivity information contained in older (full-deck) screens by using so-called HTS fingerprints, where each element of the fingerprint corresponds to the outcome of a particular assay, as input to machine-learning algorithms. We constructed HTS fingerprints using two collections of data: 93 in-house assays and 95 publicly available assays from PubChem. For testing, an additional 51 in-house assays and 46 PubChem assays were collected. Three different ML methods, random forest (RF), logistic regression (LR), and naïve Bayes (NB), were investigated for both the HTS fingerprint and a chemical fingerprint, Morgan2. RF was found to be best suited for learning from HTS fingerprints, yielding area under the receiver operating characteristic curve (AUC) values >0.8 for 78% of the internal assays and enrichment factors at 5% (EF(5%)) >10 for 55% of the assays. The RF trained with HTS fingerprints, RF(HTS-fp), outperformed the LR trained with Morgan2, which was the best ML method for the chemical fingerprint, for the majority of assays. In addition, HTS fingerprints were found to retrieve more diverse chemotypes. Combining the two models through heterogeneous classifier fusion led to a similar or better performance than the best individual model for all assays.
Further validation using a pair of in-house assays and data from a confirmatory screen, including a prospective set of around 2000 compounds selected based on our approach, confirmed the good performance. Thus, the combination of machine learning with HTS fingerprints and chemical fingerprints utilizes information from both domains and presents a promising approach for hit expansion, leading to more hits. The source code used with the public data is provided.
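
As a rough illustration of two quantities mentioned above, the following sketch computes an enrichment factor at a given fraction and fuses two models' scores by rank. It is not the authors' code: the function names are hypothetical, EF is taken in its common form (hit rate in the top-ranked fraction divided by the overall hit rate), and the max-rank rule merely stands in for the heterogeneous classifier fusion, whose exact scheme is not described in this abstract.

```python
from typing import List, Sequence


def enrichment_factor(scores: Sequence[float], labels: Sequence[int],
                      fraction: float = 0.05) -> float:
    """Hit rate among the top-scoring `fraction` divided by the overall hit rate."""
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and equally long")
    n = len(scores)
    total_hits = sum(labels)
    if total_hits == 0:
        raise ValueError("EF is undefined when there are no active compounds")
    # Size of the selected subset; at least one compound is always selected.
    n_top = max(1, int(round(n * fraction)))
    # Indices sorted from highest to lowest predicted score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_top = sum(labels[i] for i in order[:n_top])
    return (hits_top / n_top) / (total_hits / n)


def fractional_ranks(scores: Sequence[float]) -> List[float]:
    """Map scores to ranks in (0, 1], where 1.0 is the best-scored compound.

    Ties are broken by input order (assumption made for simplicity).
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # ascending
    ranks = [0.0] * n
    for position, index in enumerate(order, start=1):
        ranks[index] = position / n
    return ranks


def fuse_max_rank(scores_a: Sequence[float],
                  scores_b: Sequence[float]) -> List[float]:
    """Assumed fusion rule: keep each compound's better rank from the two models.

    Ranks are used instead of raw scores because the outputs of different
    model types (e.g. RF and LR) are not directly comparable.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must score the same compounds")
    ranks_a = fractional_ranks(scores_a)
    ranks_b = fractional_ranks(scores_b)
    return [max(a, b) for a, b in zip(ranks_a, ranks_b)]


if __name__ == "__main__":
    # Toy data: 20 compounds, 2 actives, scores increasing with index.
    toy_scores = [i / 20 for i in range(20)]
    toy_labels = [0] * 18 + [1, 1]
    print(enrichment_factor(toy_scores, toy_labels, fraction=0.05))
```

Under these assumptions, the fused ranks can be passed to `enrichment_factor` in place of a single model's scores, so that the individual models and their combination are compared on the same footing.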

Source
http://dx.doi.org/10.1021/ci500190p