Exploring the Potential of Adaptive, Local Machine Learning in Comparison to the Prediction Performance of Global Models: A Case Study from Bayer's Caco-2 Permeability Database.

J Chem Inf Model

Preclinical Modeling and Simulation, Preclinical Development, Bayer AG, Muellerstr. 178, 13353 Berlin, Germany.

Published: December 2024

Machine learning (ML) techniques are widely used to compensate for the lack of simple molecular design guidelines for newer therapeutic modalities in the extended and beyond rule of five chemical space (eRo5, bRo5). These techniques predict molecular properties directly from the structure, allowing promising compounds to be prioritized. However, model performance varies greatly among ML use cases. Caco-2 permeability is one molecular property for which generalizing global models still struggle to achieve sufficient performance. Accurate regression predictions have proven especially challenging in the lower permeability ranges, which are typical of larger molecules in the e/bRo5 space. The present study therefore identifies a suitable combination of ML algorithm and descriptors, namely the LightGBM algorithm with RDKit molecular property descriptors, that predicts Caco-2 permeability efficiently with a simple global model. An additional local model uses the same algorithm and descriptors but selects its training data by Tanimoto fingerprint similarity to the individual test compound's structure. Evaluating this adaptive model while systematically varying the number of most similar structures used for training shows only marginally improved performance over the global model, and only for specific training data constellations. These apparently random improvements indicate that general rules for local model parametrization cannot be derived for the chosen algorithm and descriptor combination, and that preselecting training data does not seem advantageous over global ML based on all available data. Creating more data-efficient models, however, generally proved possible.
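
The training-data selection step of the local model can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each fingerprint is already available as a set of on-bit indices (in practice these would come from a cheminformatics toolkit such as RDKit), and the function names `tanimoto` and `select_local_training_set` are hypothetical. The abstract does not state the fingerprint type or how ties are handled, so both are left open here. Fitting the regressor on the selected compounds is omitted.

```python
from typing import FrozenSet, List, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        # Convention assumed for this sketch: two empty fingerprints score 0.
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def select_local_training_set(
    query_fp: FrozenSet[int],
    train_fps: Sequence[FrozenSet[int]],
    k: int,
) -> List[int]:
    """Return the indices of the k training compounds most similar to the query.

    Ties keep their original order because Python's sort is stable.
    """
    ranked = sorted(
        range(len(train_fps)),
        key=lambda i: tanimoto(query_fp, train_fps[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy fingerprints (illustrative values, not real molecules).
train_fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 9}),
    frozenset({7, 8}),
    frozenset({1, 2, 3, 5}),
]
query_fp = frozenset({1, 2, 3})

# Indices of the two nearest neighbours; a model would then be trained
# only on the descriptors and permeability values of these compounds.
neighbours = select_local_training_set(query_fp, train_fps, k=2)
print(neighbours)
```

Varying `k` per test compound corresponds to the systematic variation of the number of most similar structures described above; setting `k` to the full training set size recovers the global model's training data.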


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11683871
DOI: http://dx.doi.org/10.1021/acs.jcim.4c01083

