Machine learning (ML) has become an indispensable tool for predicting absorption, distribution, metabolism, and excretion (ADME) properties in pharmaceutical research. ML algorithms are trained on molecular structures and corresponding ADME assay data to develop quantitative structure-property relationship (QSPR) models. Traditional QSPR models were trained on compound sets of limited size. With the advent of more complex ML algorithms and greater data availability, training sets have become larger and more diverse. The most common training approaches consist of training a model either with a small set of similar compounds, namely, compounds designed for the same drug discovery project or chemical series (local approach), or with a larger set of diverse compounds (global approach). Global models are built with all experimental data available for an assay, combining compound data from different projects and disease areas. Despite the progress in ML made so far, the appropriate data composition for building ML models remains unclear. Herein, a systematic evaluation of local and global ML models was performed for 10 different experimental assays and 112 drug discovery projects. Results show consistently superior performance of global models for ADME property predictions. Diagnostic analyses were also carried out to investigate the influence of training set size, structural diversity, and data shift on the relative performance of local and global ML models. Training set size and structural diversity did not have an impact on the relative performance of the methods. Instead, data shift helped to identify the projects with larger performance differences between local and global models. The results presented in this work can be leveraged to improve ML-based ADME property predictions and thus decision-making in drug discovery projects.
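The local-versus-global comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the single "descriptor" and the one-variable least-squares model stand in for real molecular descriptors and ML algorithms, and all names (`make_project`, `compare_local_global`, `project_0`, and so on) are hypothetical. The sketch only shows the evaluation layout: one local model per project, one global model trained on all projects, both scored on each project's held-out compounds.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single descriptor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def mae(model, xs, ys):
    """Mean absolute error of a fitted line on held-out compounds."""
    a, b = model
    return mean(abs(a * x + b - y) for x, y in zip(xs, ys))


def make_project(rng, center, n):
    """Synthetic project (assumption): compounds clustered around one descriptor value."""
    xs = [rng.gauss(center, 0.5) for _ in range(n)]
    ys = [0.8 * x + 1.0 + rng.gauss(0, 0.3) for x in xs]
    return xs, ys


def compare_local_global(projects, n_test=5):
    """Score a per-project (local) model and an all-data (global) model per project."""
    train = {p: (xs[:-n_test], ys[:-n_test]) for p, (xs, ys) in projects.items()}
    test = {p: (xs[-n_test:], ys[-n_test:]) for p, (xs, ys) in projects.items()}

    # Global model: pool the training compounds of every project.
    all_x = [x for xs, _ in train.values() for x in xs]
    all_y = [y for _, ys in train.values() for y in ys]
    global_model = fit_line(all_x, all_y)

    results = {}
    for p in projects:
        # Local model: only this project's training compounds.
        local_model = fit_line(*train[p])
        results[p] = {
            "local": mae(local_model, *test[p]),
            "global": mae(global_model, *test[p]),
        }
    return results


rng = random.Random(0)
projects = {f"project_{i}": make_project(rng, center=float(i), n=20) for i in range(4)}
results = compare_local_global(projects)
for name, scores in results.items():
    print(f"{name}: local MAE={scores['local']:.3f}, global MAE={scores['global']:.3f}")
```

In a real setting, the per-project differences between the two error columns are the quantity of interest, and they could then be related to diagnostics such as training set size, structural diversity, or data shift.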

Source
http://dx.doi.org/10.1021/acs.molpharmaceut.2c00962

