Pruned Machine Learning Models to Predict Aqueous Solubility.

ACS Omega

Department of Pharmacology, Physiology, and Neuroscience, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, United States.

Published: July 2020

Solubility is a key metric for therapeutic compounds, and insoluble compounds compromise the accuracy of assays at all stages of chemical biology and drug discovery. Herein, we disclose naïve Bayesian classifier models to predict aqueous solubility. Publicly accessible aqueous solubility data were used to create two full, or nonpruned, training sets. These two sets were also combined to create a full fused set, and a training set composed of a literature collation of solubility data was considered as a reference. We tested different extents of data pruning on the training sets and constructed machine learning models, which were evaluated with two independent, external test sets containing compounds not present in the training sets. The best pruned and fused model was significantly more accurate than either the full model or the full fused model in predicting these external test sets. By carefully removing data from the training set, less information can be used to create more accurate machine learning models for aqueous solubility. This knowledge and the curated training sets should prove useful to future machine learning approaches.
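
The workflow described in the abstract (prune a training set, fit a naïve Bayesian classifier, predict unseen compounds) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the pruning rule (dropping compounds whose measured logS lies within a margin of the soluble/insoluble cutoff), the cutoff of -4.0, the margin of 0.5, the toy four-bit fingerprints, and the function names `prune`, `train_bernoulli_nb`, and `predict`. The classifier is a plain Bernoulli naïve Bayes with Laplace smoothing, which may differ from the variant the authors used.

```python
import math

# Each record is (set of "on" fingerprint bit indices, measured logS).
# Threshold, margin, and fingerprints are hypothetical illustration values.

def prune(records, threshold=-4.0, margin=0.5):
    """Assumed pruning rule: drop compounds whose logS is within
    `margin` of the class threshold (ambiguous, borderline labels)."""
    return [r for r in records if abs(r[1] - threshold) >= margin]

def label(log_s, threshold=-4.0):
    """1 = soluble, 0 = insoluble, under the assumed cutoff."""
    return 1 if log_s >= threshold else 0

def train_bernoulli_nb(records, n_bits, threshold=-4.0):
    """Fit a Bernoulli naive Bayes model with Laplace (add-one) smoothing."""
    counts = {0: [0] * n_bits, 1: [0] * n_bits}
    totals = {0: 0, 1: 0}
    for bits, log_s in records:
        y = label(log_s, threshold)
        totals[y] += 1
        for i in bits:
            counts[y][i] += 1
    n = totals[0] + totals[1]
    model = {}
    for y in (0, 1):
        log_prior = math.log((totals[y] + 1) / (n + 2))
        bit_probs = [(counts[y][i] + 1) / (totals[y] + 2) for i in range(n_bits)]
        model[y] = (log_prior, bit_probs)
    return model

def predict(model, bits, n_bits):
    """Return the class with the highest log posterior score."""
    scores = {}
    for y, (log_prior, bit_probs) in model.items():
        score = log_prior
        for i in range(n_bits):
            p = bit_probs[i]
            score += math.log(p if i in bits else 1.0 - p)
        scores[y] = score
    return max(scores, key=scores.get)

# Toy training set: two clearly soluble, two clearly insoluble,
# and two borderline compounds that the assumed rule removes.
train = [
    ({0, 1}, -1.0),
    ({0}, -2.0),
    ({2, 3}, -6.0),
    ({3}, -7.0),
    ({2}, -3.8),
    ({0}, -4.2),
]

pruned = prune(train)
model = train_bernoulli_nb(pruned, n_bits=4)
soluble_guess = predict(model, {0, 1}, n_bits=4)
insoluble_guess = predict(model, {2, 3}, n_bits=4)
```

In a real comparison, one would fit one model on `train` and another on `pruned`, then score both on external test sets that share no compounds with either, as the abstract describes. The toy data above is far too small to show any accuracy difference.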

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7364544
DOI: http://dx.doi.org/10.1021/acsomega.0c01251
