Development of binary classification of structural chromosome aberrations for a diverse set of organic compounds from molecular structure.

Chem Res Toxicol

The Pennsylvania State University, 152 Davey Laboratory, Chemistry Department, University Park, Pennsylvania 16802, USA.

Published: February 2003

Classification models are developed to predict in vitro cytogenetic results for a diverse set of 383 organic compounds. Both k-nearest neighbor and support vector machine models are built. The endpoint is a binary label, clastogenic or nonclastogenic, assigned according to an in vitro chromosomal aberration assay with Chinese hamster lung cells. Compounds tested with both 24 h and 48 h exposures are included. Each compound is represented by calculated molecular structure descriptors encoding the topological, electronic, geometrical, or polar surface area aspects of the structure. Subsets of informative descriptors are identified by genetic algorithm feature selection coupled to the corresponding classification algorithm. The overall classification success rate for a k-nearest neighbor classifier built with just six topological descriptors is 81.2% for the training set and 86.5% for an external prediction set. The overall classification success rate for a three-descriptor support vector machine model is 99.7% for the training set, 92.1% for the cross-validation set, and 83.8% for an external prediction set.
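The idea of coupling genetic algorithm feature selection to a k-nearest neighbor classifier can be illustrated with a small sketch. This is not the authors' software or data: the toy descriptor matrix, the leave-one-out fitness function, the GA operators and settings, and every name below (`make_toy_data`, `knn_predict`, `loo_accuracy`, `ga_select`) are assumptions made only for illustration. The sketch searches for a two-descriptor subset that maximizes the classification success rate on synthetic data in which only descriptors 0 and 1 carry class information.

```python
import random
from collections import Counter


def make_toy_data(n_per_class=20, n_desc=8, seed=1):
    """Synthetic stand-in for a descriptor matrix (hypothetical data).
    Descriptors 0 and 1 separate the classes; the others are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):  # 0 = nonclastogenic, 1 = clastogenic
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_desc)]
            row[0] = rng.gauss(3.0 * label, 0.5)
            row[1] = rng.gauss(3.0 * label, 0.5)
            X.append(row)
            y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, cols, k=3):
    """Majority vote among the k nearest training rows, using only `cols`."""
    dists = sorted(
        (sum((row[c] - x[c]) ** 2 for c in cols), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, cols, k=3):
    """Leave-one-out classification success rate for a descriptor subset."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], cols, k) == y[i]
    return hits / len(X)


def ga_select(X, y, n_desc, subset_size, pop_size=20, generations=15, seed=0):
    """Tiny GA over fixed-size descriptor subsets; fitness is the kNN
    leave-one-out success rate. Operators and settings are assumptions."""
    rng = random.Random(seed)
    cache = {}

    def fitness(cols):
        if cols not in cache:
            cache[cols] = loo_accuracy(X, y, cols)
        return cache[cols]

    pop = [tuple(sorted(rng.sample(range(n_desc), subset_size)))
           for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the better half survives unchanged.
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            # Crossover: draw the child from the union of both parents.
            child = rng.sample(sorted(set(a) | set(b)), subset_size)
            if rng.random() < 0.3:  # mutation: swap in a random descriptor
                child[rng.randrange(subset_size)] = rng.randrange(n_desc)
            if len(set(child)) == subset_size:  # reject duplicated indices
                children.append(tuple(sorted(child)))
        pop = parents + children
    return max(pop, key=fitness)


X, y = make_toy_data()
best = ga_select(X, y, n_desc=8, subset_size=2)
acc = loo_accuracy(X, y, best)
print("selected descriptors:", best, "success rate:", round(acc, 3))
```

In a real study, the selected subset would also be checked against an external prediction set that played no part in the selection, as the abstract reports for both models.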

DOI: http://dx.doi.org/10.1021/tx020077w

