Comparison of logP and logD correction models trained with public and proprietary data sets.

Ignacio Aliagas, Alberto Gobbi, Man-Ling Lee, Benjamin D Sellers

J Comput Aided Mol Des

Discovery Chemistry, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA.

Published: March 2022

In drug discovery, the octanol/water partition and distribution coefficients, logP and logD, are widely used as metrics of the lipophilicity of molecules, which in turn has a strong influence on the bioactivity and bioavailability of potential drugs. There are a variety of established methods, mostly fragment- or atom-based, to calculate logP, while logD prediction generally relies on calculated logP and pKa to estimate the neutral and ionized populations at a given pH. Algorithms such as ClogP have limitations that generally lead to systematic errors for chemically related molecules, while pKa estimation is generally more difficult due to the interplay of electronic, inductive and conjugation effects for ionizable moieties. We propose an integrated machine learning QSAR modeling approach to predict logD by training the model with experimental data while using ClogP and pKa predicted by commercial software as model descriptors. By optimizing the loss function for the ClogD calculated by the software, we build a correction model that incorporates both descriptors from the software and available experimental logD data. Additionally, we calculate logP from the logD model using the software-predicted pKa values. Here, we have trained models using publicly or commercially available logD data to show that this approach can improve on commercial software predictions of lipophilicity. When applied to other logD data sets, this approach extends the domain of applicability of logD and logP predictions beyond that of commercial software. The performance of these models compares favorably with that of models built with a larger set of proprietary logD data.
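The relationship between logP, pKa and logD mentioned in the abstract can be illustrated with the textbook expression for a monoprotic compound. The sketch below is not the authors' model or any commercial package's algorithm. It assumes a single ionizable group and that only the neutral species partitions into octanol, and the function names `logd_from_logp` and `logp_from_logd` are hypothetical.

```python
import math


def _ionization_exponent(pka: float, ph: float, kind: str) -> float:
    """Return the exponent that sets the ionized/neutral ratio.

    Assumption: a single ionizable group (monoprotic acid or base).
    """
    if kind == "acid":
        return ph - pka   # acids ionize as pH rises above pKa
    if kind == "base":
        return pka - ph   # bases ionize as pH falls below pKa
    raise ValueError("kind must be 'acid' or 'base'")


def logd_from_logp(logp: float, pka: float, ph: float = 7.4, kind: str = "acid") -> float:
    """Estimate logD at a given pH from logP and pKa.

    Assumption: only the neutral form partitions into octanol, so
    logD = logP - log10(1 + 10**exponent).
    """
    exponent = _ionization_exponent(pka, ph, kind)
    return logp - math.log10(1.0 + 10.0 ** exponent)


def logp_from_logd(logd: float, pka: float, ph: float = 7.4, kind: str = "acid") -> float:
    """Invert the same expression to recover logP from logD and pKa."""
    exponent = _ionization_exponent(pka, ph, kind)
    return logd + math.log10(1.0 + 10.0 ** exponent)


if __name__ == "__main__":
    # Hypothetical acid: logP = 3.0, pKa = 4.4, measured at pH 7.4.
    logd = logd_from_logp(3.0, 4.4, ph=7.4, kind="acid")
    print(f"logD(7.4) = {logd:.3f}")
    print(f"recovered logP = {logp_from_logd(logd, 4.4, ph=7.4, kind='acid'):.3f}")
```

Under these assumptions, an error in the predicted pKa propagates directly into the logD estimate, which is one reason a correction trained on experimental logD data can help.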

DOI: http://dx.doi.org/10.1007/s10822-022-00450-9

