Chemoinformatics-based classification of prohibited substances employed for doping in sport.

J Chem Inf Model

Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom.

Published: February 2007

AI Article Synopsis

  • A study analyzed 5245 molecules from the World Anti-Doping Agency list and the MDDR database, including both prohibited and allowed substances.
  • Five types of chemical fingerprints were calculated for these molecules, and machine-learning methods, chiefly random forest and k-nearest neighbors (kNN), were used with 5-fold cross-validation to predict membership of each prohibited class.
  • The best classifiers, based on Unity 2D fingerprints, performed almost identically (Matthews correlation coefficients of 0.836 for kNN and 0.829 for random forest), with kNN giving higher recall but lower precision. The results suggest that reliable methods for detecting doping agents can be developed while also protecting athletes against unjust disqualification.
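The kNN classification mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The synopsis and abstract do not say which similarity measure was used, so the Tanimoto coefficient on binary fingerprints (stored as sets of "on" bit positions) is an assumption. The function names `tanimoto` and `knn_predict` and the toy data are hypothetical.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two binary fingerprints.

    Assumption: a fingerprint is a frozenset of the bit positions that are set.
    """
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_predict(query: frozenset, training: list[tuple[frozenset, str]], k: int = 1) -> str:
    """Label `query` by majority vote among its k most similar training molecules."""
    # Rank training molecules from most to least similar to the query
    # (sorted() is stable, so equally similar molecules keep their input order).
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    # Count labels among the k nearest neighbours; on a tied vote,
    # most_common() returns the label encountered first, i.e. the nearer one.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (fingerprint, label) pairs for a single prohibited class.
training = [
    (frozenset({1, 2, 3, 4}), "prohibited"),
    (frozenset({1, 2, 3, 5}), "prohibited"),
    (frozenset({10, 11, 12}), "allowed"),
    (frozenset({10, 11, 13}), "allowed"),
]

print(knn_predict(frozenset({1, 2, 3}), training, k=1))  # prints prohibited
```

Small values of k are the default here because the abstract reports that the kNN approach worked best for the smallest values of k; in practice one such binary classifier would be built per prohibited class.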

Article Abstract

Representative molecules from 10 classes of prohibited substances were taken from the World Anti-Doping Agency (WADA) list, augmented by molecules from corresponding activity classes found in the MDDR database. Together with some explicitly allowed compounds, these formed a set of 5245 molecules. Five types of fingerprints were calculated for these substances. The random forest classification method was used to predict membership of each prohibited class on the basis of each type of fingerprint, using 5-fold cross-validation. We also used a k-nearest neighbors (kNN) approach, which worked well for the smallest values of k. The most successful classifiers are based on Unity 2D fingerprints and give very similar Matthews correlation coefficients of 0.836 (kNN) and 0.829 (random forest). The kNN classifiers tend to give a higher recall of positives at the expense of lower precision. A naïve Bayesian classifier, however, lies much further toward the extreme of high recall and low precision. Our results suggest that it will be possible to produce a reliable and quantitative assignment of membership or otherwise of each class of prohibited substances. This should aid the fight against the use of bioactive novel compounds as doping agents, while also protecting athletes against unjust disqualification.
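The abstract compares classifiers by Matthews correlation coefficient (MCC), recall and precision. The sketch below computes all three from the counts of true and false positives and negatives for one binary "member of this prohibited class or not" prediction. The function name `classification_metrics` and the counts in the usage line are hypothetical and are not results from the study; returning 0 when a marginal total is zero is a common convention, assumed here.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Precision, recall and Matthews correlation coefficient from a 2x2 confusion matrix."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN));
    # set to 0 by convention when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "mcc": mcc}


# Invented counts for illustration only (not from the study).
m = classification_metrics(tp=90, fp=30, tn=870, fn=10)
print(f"precision={m['precision']:.2f} recall={m['recall']:.2f} mcc={m['mcc']:.3f}")
```

With these invented counts the classifier reaches a recall of 0.90 but a precision of only 0.75, the same kind of trade-off the abstract describes for kNN relative to random forest. MCC summarizes both in one number that stays informative even when, as here, negatives far outnumber positives.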


Source
http://dx.doi.org/10.1021/ci0601160
