Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection.

Mol Divers

The Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599-7360, USA.

Published: June 2003

One of the most important characteristics of Quantitative Structure-Activity Relationship (QSAR) models is their predictive power. The latter can be defined as the ability of a model to predict accurately the target property (e.g., biological activity) of compounds that were not used for model development. We suggest that this goal can be achieved by rational division of an experimental SAR dataset into training and test sets, which are used for model development and validation, respectively. Given that all compounds are represented by points in multidimensional descriptor space, we argue that training and test sets must satisfy the following criteria: (i) representative points of the test set must be close to those of the training set; (ii) representative points of the training set must be close to representative points of the test set; (iii) the training set must be diverse. For quantitative description of these criteria, we use molecular dataset diversity indices introduced recently (Golbraikh, A., J. Chem. Inf. Comput. Sci., 40 (2000) 414-425). For rational division of a dataset into training and test sets, we use three closely related sphere-exclusion algorithms. Using several experimental datasets, we demonstrate that QSAR models built and validated with our approach have statistically better predictive power than models generated with either random or activity-ranking-based selection of the training and test sets. We suggest that rational approaches to the selection of training and test sets based on diversity principles should be used routinely in all QSAR modeling research.
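
The sphere-exclusion idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is a generic variant and not necessarily any of the three algorithms used in the paper: the function name `sphere_exclusion_split`, the Euclidean metric, the random choice of sphere centers, and the `radius` parameter are all assumptions made for illustration. Each chosen center goes to the training set, and every remaining compound inside its sphere goes to the test set.

```python
import numpy as np


def sphere_exclusion_split(X, radius, seed=0):
    """Split rows of a descriptor matrix X (n_compounds x n_descriptors)
    into training and test indices with a generic sphere-exclusion pass.

    Assumptions (illustrative, not taken from the paper): Euclidean
    distance, random center selection, a single fixed sphere radius.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    remaining = np.ones(X.shape[0], dtype=bool)
    train, test = [], []

    while remaining.any():
        # Pick a not-yet-assigned compound as the next sphere center.
        center = int(rng.choice(np.flatnonzero(remaining)))
        train.append(center)
        remaining[center] = False

        # Unassigned compounds inside the sphere join the test set,
        # so each test point lies within `radius` of a training point.
        dist = np.linalg.norm(X - X[center], axis=1)
        inside = remaining & (dist <= radius)
        test.extend(np.flatnonzero(inside).tolist())
        remaining[inside] = False

    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Synthetic descriptors stand in for a real dataset.
    X_demo = np.random.default_rng(1).normal(size=(50, 3))
    train_idx, test_idx = sphere_exclusion_split(X_demo, radius=1.0)
    print(len(train_idx), "training compounds,", len(test_idx), "test compounds")
```

In this sketch, training compounds end up more than `radius` apart from each other, which loosely reflects criterion (iii), while every test compound has a training compound within `radius`, which reflects criterion (i). Descriptors would normally be scaled before distances are computed, and the radius controls the relative sizes of the two sets.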

Source
http://dx.doi.org/10.1023/a:1021372108686
