Data mining the NCI60 to predict generalized cytotoxicity.

J Chem Inf Model

Department of Medicinal Chemistry, College of Pharmacy, University of Michigan, Ann Arbor, Michigan 48109, USA.

Published: July 2008

Elimination of cytotoxic compounds in the early and later stages of drug discovery can help reduce the costs of research and development. By applying principal components analysis (PCA) to the NCI60 data, we were able to show that approximately 89% of the total log GI50 variance is due to the nonspecific cytotoxic nature of substances. Furthermore, PCA led to the identification of groups of structurally unrelated substances showing very specific toxicity profiles, such as a set of 45 substances toxic only to the Leukemia_SR cancer cell line. In an effort to predict nonspecific cytotoxicity on the basis of the mean log GI50, we created a decision tree using MACCS keys that can correctly classify over 83% of the substances as cytotoxic/noncytotoxic in silico, using a cutoff of mean log GI50 = -5.0. Finally, we have established a linear least-squares model in which nine of the 59 available NCI60 cancer cell lines can be used to predict the mean log GI50. The model has R² = 0.99 and a root-mean-square error (RMSE) between the observed and calculated mean log GI50 of 0.09. Our predictive models can be applied to flag generally cytotoxic molecules in virtual and real chemical libraries, thus saving time and effort.
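As a rough illustration of the quantities named in the abstract, the sketch below labels a compound by its mean log GI50 against the -5.0 cutoff and computes R² and RMSE for a set of predictions. It is not the authors' code: the function names, the handling of missing measurements, the toy values, and the choice to treat a mean at or below the cutoff as cytotoxic (a lower log GI50 meaning a more potent compound) are all assumptions made for this example.

```python
from math import sqrt
from statistics import mean

CUTOFF = -5.0  # mean log GI50 cutoff quoted in the abstract


def mean_log_gi50(profile):
    """Average log GI50 over the cell lines that have a value.

    `profile` is a list of log GI50 values, one per cell line;
    None marks a missing measurement (an assumption of this sketch).
    """
    values = [v for v in profile if v is not None]
    if not values:
        raise ValueError("profile has no measurements")
    return mean(values)


def is_cytotoxic(profile, cutoff=CUTOFF):
    # Assumption: at or below the cutoff counts as generally cytotoxic.
    return mean_log_gi50(profile) <= cutoff


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    return sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy profiles (invented numbers, not NCI60 data).
potent = [-6.2, -5.8, None, -6.0]
inactive = [-4.0, -4.2, -4.1]

if __name__ == "__main__":
    for name, profile in [("potent", potent), ("inactive", inactive)]:
        print(name, round(mean_log_gi50(profile), 2), is_cytotoxic(profile))
```

The same `rmse` and `r_squared` helpers could be applied to the output of any regression that predicts the mean log GI50 from a subset of cell lines; fitting that regression is left out here to keep the example free of third-party dependencies.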

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2561991
DOI: http://dx.doi.org/10.1021/ci800097k