Classification models are generated to predict in vitro cytogenetic results for a diverse set of 383 organic compounds. Both k-nearest neighbor and support vector machine models are developed. They are based on calculated molecular structure descriptors. Endpoints used are the labels clastogenic or nonclastogenic according to an in vitro chromosomal aberration assay with Chinese hamster lung cells. Compounds that were tested with both a 24 and 48 h exposure are included. Each compound is represented by calculated molecular structure descriptors encoding the topological, electronic, geometrical, or polar surface area aspects of the structure. Subsets of informative descriptors are identified with genetic algorithm feature selection coupled to the appropriate classification algorithm. The overall classification success rate for a k-nearest neighbor classifier built with just six topological descriptors is 81.2% for the training set and 86.5% for an external prediction set. The overall classification success rate for a three-descriptor support vector machine model is 99.7% for the training set, 92.1% for the cross-validation set, and 83.8% for an external prediction set.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1021/tx020077w | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!