Background: Machine learning (ML) is increasingly used to predict clinical deterioration in intensive care unit (ICU) patients through scoring systems. Although promising, such algorithms often overfit their training cohort and perform worse at new hospitals. External validation is therefore a critical, but frequently overlooked, step in establishing the reliability of predicted risk scores before they are translated into clinical practice. We systematically reviewed how regularly external validation of ML-based risk scores is performed and how their performance changes in external data.

Methods: We searched MEDLINE, Web of Science, and arXiv for studies using ML to predict deterioration of ICU patients from routine data. We included primary research published in English before December 2023. We summarised how many studies were externally validated, assessing differences over time, by outcome, and by data source. For validated studies, we evaluated the change in area under the receiver operating characteristic curve (AUROC) attributable to external validation using linear mixed-effects models.
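As an illustration of the quantity being compared, the sketch below computes AUROC on an internal and an external cohort and takes their difference. It is a minimal example on made-up toy data, not the review's analysis: the function name `auroc`, the labels, and the scores are all hypothetical, and the simple difference shown here stands in for the change that the review estimated with linear mixed-effects models.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohorts (1 = deteriorated, 0 = did not deteriorate).
internal_labels = [0, 0, 0, 1, 1, 1]
internal_scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
external_labels = [0, 0, 0, 1, 1, 1]
external_scores = [0.2, 0.5, 0.3, 0.4, 0.8, 0.7]

auroc_internal = auroc(internal_labels, internal_scores)
auroc_external = auroc(external_labels, external_scores)

# A negative change means the score discriminates worse in external data.
change = auroc_external - auroc_internal
print(f"internal={auroc_internal:.3f} external={auroc_external:.3f} change={change:.3f}")
```

The pairwise definition is quadratic in cohort size and is used here only because it makes the meaning of AUROC explicit; a rank-based implementation would be preferable for real cohorts.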

Results: We included 572 studies, of which 84 (14.7%) were externally validated, increasing to 23.9% by 2023. Validated studies made disproportionate use of open-source data, with two well-known US datasets (MIMIC and eICU) accounting for 83.3% of studies. On average, AUROC changed by -0.037 (95% CI -0.052 to -0.027) in external data, with a reduction of more than 0.05 in 49.5% of studies.
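The headline proportion can be checked directly from the reported counts, and the two summary statistics for the AUROC change can be sketched in a few lines. In the example below, only the counts 572 and 84 come from the abstract; the list `hypothetical_changes` and the helper `summarise_changes` are invented for illustration, and the unweighted mean is a simplification rather than the mixed-effects estimate reported above.

```python
from statistics import mean


def summarise_changes(changes, threshold=0.05):
    """Return the mean AUROC change and the share of studies whose
    AUROC fell by more than `threshold` in external data."""
    avg = mean(changes)
    share = sum(1 for c in changes if c < -threshold) / len(changes)
    return avg, share


# Counts reported in the abstract.
included, validated = 572, 84
validated_pct = round(100 * validated / included, 1)

# Hypothetical per-study changes (external minus internal AUROC).
hypothetical_changes = [-0.08, -0.01, -0.06, 0.02, -0.03, -0.10]
avg_change, share_large_drop = summarise_changes(hypothetical_changes)

print(f"externally validated: {validated_pct}%")
print(f"mean change: {avg_change:.3f}, share with drop > 0.05: {share_large_drop:.1%}")
```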

Discussion: External validation, although increasing, remains uncommon. Performance was generally lower in external data, calling into question the reliability of some recently proposed ML-based scores. Interpretation of the results was challenged by an overreliance on the same few datasets, implicit differences in case mix, and exclusive use of AUROC.

Source
http://dx.doi.org/10.1186/s12911-024-02830-7
