Large-scale evaluation of k-fold cross-validation ensembles for uncertainty estimation.

J Cheminform

Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany.

Published: April 2023

It is informative to report, in addition to the prediction itself, an estimate of how certain a model is in that prediction. For regression tasks, most approaches implement a variation of the ensemble method, with few exceptions. Instead of a single estimator, a group of estimators yields several predictions for an input. The uncertainty can then be quantified by measuring the disagreement between the predictions, for example by their standard deviation. In theory, ensembles should not only provide uncertainties but also boost predictive performance by reducing errors arising from variance. Despite the development of novel methods, ensembles are still considered the "gold standard" for quantifying the uncertainty of regression models. Subsampling-based methods for obtaining ensembles can be applied to all models, regardless of whether they are related to deep learning or traditional machine learning. However, little attention has been given to the question of whether the ensemble method is applicable to virtually all scenarios occurring in the field of cheminformatics. In a broad and diverse study, ensembles are evaluated on 32 datasets of different sizes and modeling difficulty, ranging from physicochemical properties to biological activities. For increasing ensemble sizes of up to 200 members, the predictive performance as well as the applicability as an uncertainty estimator are shown for all combinations of five modeling techniques and four molecular featurizations. Recommendations for practitioners are derived regarding the success and minimum size of ensembles, depending on whether predictive performance or uncertainty quantification is more important for the task at hand.
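The idea of a k-fold cross-validation ensemble can be illustrated with a minimal sketch. It is not the authors' implementation: the closed-form ridge regressor, the synthetic data, and the names `fit_ridge`, `kfold_ensemble`, and `predict_with_uncertainty` are all assumptions made for illustration. Each member is trained on k-1 folds of a random split, repeated splits enlarge the ensemble, and the standard deviation across member predictions serves as the uncertainty estimate.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; a stand-in for any base model.
    For brevity the intercept weight is regularized as well."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict_ridge(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def kfold_ensemble(X, y, k=5, repeats=2, seed=0):
    """Train k * repeats members, each on k-1 folds of a fresh random split."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(len(X)), k)
        for i in range(k):
            # leave fold i out, train on the remaining k-1 folds
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            members.append(fit_ridge(X[train_idx], y[train_idx]))
    return members

def predict_with_uncertainty(members, X):
    """Return the ensemble mean and the standard deviation across members."""
    preds = np.stack([predict_ridge(w, X) for w in members])  # (n_members, n_samples)
    return preds.mean(axis=0), preds.std(axis=0)

# Synthetic regression data (assumption: stands in for featurized molecules).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=200)

members = kfold_ensemble(X[:150], y[:150], k=5, repeats=2)
mean_pred, std_pred = predict_with_uncertainty(members, X[150:])
```

Here `mean_pred` is the ensemble prediction and `std_pred` the per-sample disagreement; increasing `repeats` grows the ensemble in steps of `k` members.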

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10142532
DOI: http://dx.doi.org/10.1186/s13321-023-00709-9

Publication Analysis

Top Keywords

predictive performance: 12
ensemble method: 8
ensembles: 5
uncertainty: 5
large-scale evaluation: 4
evaluation k-fold: 4
k-fold cross-validation: 4
cross-validation ensembles: 4
ensembles uncertainty: 4
uncertainty estimation: 4

Similar Publications

Background: Artificial sweeteners (AS) have been widely utilized in the food, beverage, and pharmaceutical industries for decades. While numerous publications have suggested a potential link between AS and diseases, particularly cancer, controversy still surrounds this issue. This study aims to investigate the association between AS consumption and cancer risk.

Background: Near-infrared spectroscopy (NIRS) enables non-invasive measurement of tissue oxygen saturation (StO2) in regions illuminated by near-infrared light. The vascular occlusion test (VOT) serves as a model to artificially induce forearm ischemia-reperfusion. The combination of StO2 monitoring and VOT allows for dynamic evaluation of the balance between oxygen delivery and consumption in tissue, as well as the functional reserve of the microcirculation.

Impact of remnant cholesterol on short-term and long-term prognosis in patients with prediabetes or diabetes undergoing coronary artery bypass grafting: a large-scale cohort study.

Cardiovasc Diabetol

January 2025

State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 167 Beilishi Rd, Xicheng District, Beijing, 100037, People's Republic of China.

Background: Remnant cholesterol (remnant-C) contributes to atherosclerotic cardiovascular disease (ASCVD), particularly in individuals with impaired glucose metabolism. Patients with impaired glucose metabolism and ASCVD remain at significant residual risk after coronary artery bypass grafting (CABG). However, the role of remnant-C in this population has not yet been investigated.

Background: Chronic obstructive pulmonary disease (COPD) is a chronic and progressive lung disease. Disulfidptosis-related genes (DRGs) may be involved in the pathogenesis of COPD. From the perspective of predictive, preventive, and personalized medicine (PPPM), clarifying the role of disulfidptosis in the development of COPD could provide an opportunity for primary prediction, targeted prevention, and personalized treatment of the disease.

Development of a prognostic nomogram and risk stratification system for elderly patients with esophageal squamous cell carcinoma undergoing definitive radiotherapy: a multicenter retrospective analysis (3JECROG R-03 A).

BMC Cancer

January 2025

Department of Radiation Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, No. 420, Fuma Road, Jinan District, Fuzhou City, Fujian Province, People's Republic of China.

Background: Our goal is to develop a nomogram model to predict overall survival (OS) for elderly esophageal squamous cell carcinoma (ESCC) patients receiving definitive radiotherapy (RT) or concurrent chemoradiotherapy (CRT), aiding clinicians in personalized treatment planning with a risk stratification system.

Methods: A retrospective study was conducted on 718 elderly ESCC patients treated with RT or CRT at 10 medical centers (3JECROG) from January 2004 to November 2016. We identified independent prognostic factors using univariate and multifactorial Cox regression to construct a nomogram model.
