Prediction of the aqueous solubility of organic molecules by binary QSAR was used as a test case for a recently introduced entropy-based descriptor selection method. Property descriptors suitable for solubility prediction were selected exclusively on the basis of Shannon entropy calculations in molecular learning sets, without taking any other information into account. Sets of only five or ten 2D descriptors with the largest entropy differences between molecules above and below a defined solubility threshold yielded consistently high prediction accuracy, between 80% and 90%, in binary QSAR calculations, regardless of the threshold values applied. The top five descriptors with the largest differential Shannon entropy (DSE) values achieved an average prediction accuracy of 88%. These findings suggest that differences in the entropy and relative information content of descriptors in compared compound data sets correlate with significant differences in physical properties, and they support the practical relevance of entropy-based descriptor selection routines. The study also demonstrates that binary QSAR methodology can be used effectively to classify small molecules according to aqueous solubility.
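
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the following are assumptions made for illustration only: each descriptor's values are binned into a fixed-width histogram over the range shared by both classes, the Shannon entropy is computed from the bin frequencies, and DSE is taken as the absolute difference between the entropies of the "above threshold" and "below threshold" sets. The function names, the bin count, and the data layout are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def shannon_entropy(values: Sequence[float], lo: float, hi: float, bins: int = 10) -> float:
    """Shannon entropy (in bits) of a fixed-width histogram of `values` on [lo, hi]."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # A zero-width range means all values are identical: use a single bin.
        idx = 0 if width == 0 else min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def differential_shannon_entropy(above: Sequence[float], below: Sequence[float],
                                 bins: int = 10) -> float:
    """Assumed DSE: absolute entropy difference between the two classes,
    using a common value range so that both histograms are comparable."""
    lo = min(min(above), min(below))
    hi = max(max(above), max(below))
    return abs(shannon_entropy(above, lo, hi, bins) - shannon_entropy(below, lo, hi, bins))


def rank_descriptors(table: Dict[str, List[float]], is_above: Sequence[bool],
                     top_k: int = 5, bins: int = 10) -> List[Tuple[str, float]]:
    """Return the top_k (descriptor name, DSE) pairs, largest DSE first.

    `table` maps a descriptor name to one value per molecule; `is_above`
    flags molecules whose solubility lies above the chosen threshold.
    """
    scores = []
    for name, column in table.items():
        above = [v for v, flag in zip(column, is_above) if flag]
        below = [v for v, flag in zip(column, is_above) if not flag]
        scores.append((name, differential_shannon_entropy(above, below, bins)))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]
```

In this sketch, a descriptor whose values are spread widely in one solubility class but concentrated in the other receives a high score, while a descriptor distributed identically in both classes scores zero. The top-ranked descriptors would then be passed to a separate classifier; the binary QSAR model itself is not reproduced here.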

Source
http://dx.doi.org/10.1021/ci010243q
