AI Article Synopsis

  • Machine learning algorithms are often touted as superior to logistic regression in clinical settings, particularly for binary prediction tasks, but relying solely on metrics like the area under the receiver operating characteristic curve can be misleading.
  • Predictions of rare post-surgery complications, such as mortality after aortic valve replacement, were evaluated using various algorithms, revealing high accuracy but low true positive rates.
  • The study emphasizes that clinical research should consider multiple evaluation metrics rather than focusing solely on the area under the receiver operating characteristic curve for a comprehensive assessment of model performance.
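The gap between accuracy and true positive rate for a rare outcome can be illustrated with a small calculation. The sketch below uses hypothetical counts (10,000 patients with a 3% event rate), not data from the study, and the helper name `confusion_metrics` is an assumption made for illustration.

```python
# Minimal sketch: confusion-matrix metrics for a rare binary outcome.
# All counts are hypothetical and chosen only to illustrate the pattern.

def confusion_metrics(tp, fp, tn, fn):
    """Return common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    predicted_positive = tp + fp
    return {
        "accuracy": (tp + tn) / total,
        "true_positive_rate": tp / (tp + fn),    # sensitivity / recall
        "false_positive_rate": fp / (fp + tn),   # 1 - specificity
        # PPV is undefined when the model never predicts the positive class.
        "positive_predictive_value": (
            tp / predicted_positive if predicted_positive else float("nan")
        ),
    }

# Hypothetical cohort: 300 events among 10,000 patients; the model flags
# only 15 patients, all of whom are true events.
metrics = confusion_metrics(tp=15, fp=0, tn=9700, fn=285)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts, accuracy exceeds 97% and the positive predictive value is 100%, yet the model identifies only 5% of the events, which is why a single metric can give a misleading picture.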

Article Abstract

Machine learning algorithms are increasingly used in the clinical literature, claiming advantages over logistic regression. However, they are generally designed to maximize the area under the receiver operating characteristic curve. While area under the receiver operating characteristic curve and other measures of accuracy are commonly reported for evaluating binary prediction problems, these metrics can be misleading. We aim to give clinical and machine learning researchers a realistic medical example of the dangers of relying on a single measure of discriminatory performance to evaluate binary prediction questions. Prediction of medical complications after surgery is a frequent but challenging task because many post-surgery outcomes are rare. We predicted post-surgery mortality among patients in a clinical registry who received at least one aortic valve replacement. Estimation incorporated multiple evaluation metrics and algorithms typically regarded as performing well with rare outcomes, as well as an ensemble and a new extension of the lasso for multiple unordered treatments. Results demonstrated high accuracy for all algorithms with moderate measures of cross-validated area under the receiver operating characteristic curve. False positive rates were 1%; however, true positive rates were 7%, even when paired with a 100% positive predictive value, and graphical representations of calibration were poor. Similar results were seen in simulations, with the addition of high area under the receiver operating characteristic curve (90%) accompanying low true positive rates. Clinical studies should not report only area under the receiver operating characteristic curve or accuracy.
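A high area under the receiver operating characteristic curve can coexist with a low true positive rate because the former depends only on how cases are ranked, while the latter depends on a classification threshold. The sketch below is a hypothetical illustration, not the authors' simulation: the scores, the 0.5 threshold, and the function names are all assumptions.

```python
# Minimal sketch: high AUC alongside a low true positive rate.
# Scores are synthetic and deterministic; nothing here comes from the study.

def auc_pairwise(pos_scores, neg_scores):
    """AUC as the fraction of (event, non-event) pairs ranked correctly.

    Ties count as half a correct ranking (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def rate_above(scores, threshold):
    """Fraction of scores classified positive at the given threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


# Hypothetical predicted risks: 1,000 non-events spread over [0, 0.2),
# and 20 events that mostly score slightly higher but rarely exceed 0.5.
neg_scores = [0.0002 * i for i in range(1000)]
pos_scores = [0.18 + 0.005 * j for j in range(19)] + [0.6]

auc = auc_pairwise(pos_scores, neg_scores)
tpr = rate_above(pos_scores, threshold=0.5)  # assumed default cutoff
fpr = rate_above(neg_scores, threshold=0.5)

print(f"AUC: {auc:.3f}")
print(f"True positive rate at 0.5: {tpr:.2f}")
print(f"False positive rate at 0.5: {fpr:.2f}")
```

In this assumed setup, events are ranked above nearly all non-events, so the AUC is high, but only one of the 20 events crosses the 0.5 cutoff. Reporting the threshold-dependent metrics alongside the AUC makes that discrepancy visible.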


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8561661
DOI: http://dx.doi.org/10.1177/09622802211038754

Publication Analysis

Top Keywords

area receiver: 20
receiver operating: 20
operating characteristic: 20
characteristic curve: 20
positive rates: 12
rare outcomes: 8
machine learning: 8
binary prediction: 8
true positive: 8
area: 5

Similar Publications

Prediction of pulmonary embolism by an explainable machine learning approach in the real world.

Sci Rep

January 2025

Department of Respiratory and Critical Care Medicine, Changhai Hospital, The Second Military Medical University, Shanghai, People's Republic of China.

In recent years, a large body of research has shown that pulmonary embolism (PE) has become a common disease, and PE remains a clinical challenge because of its high mortality, high disability, and high rates of missed diagnosis and misdiagnosis. To address this, we employed an artificial intelligence-based machine learning algorithm (MLA) to construct a robust predictive model for PE. We retrospectively analyzed 1480 suspected PE patients hospitalized in West China Hospital of Sichuan University between May 2015 and April 2020.


Objectives: To assess whether the Quantra-Qplus can provide the cutoff values for predicting transfusion thresholds after cardiopulmonary bypass.

Design: Prospective observational study.

Setting: Single-center university hospital.


DOME: Directional medical embedding vectors from electronic health records.

J Biomed Inform

January 2025

Harvard Medical School, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA; Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Motivation: The increasing availability of electronic health record (EHR) systems has created enormous potential for translational research. Recent developments in representation learning techniques have led to effective large-scale representations of EHR concepts along with knowledge graphs that empower downstream EHR studies. However, most existing methods require training with patient-level data, which limits their ability to extend training to multi-institutional EHR data.


Purpose: This study aims to evaluate the prognostic value of contrast-enhanced ultrasound (CEUS) combined with tumour markers in patients with hepatocellular carcinoma (HCC) undergoing microwave ablation (MWA).

Methods: MWA patients with HCC were divided into good prognosis (n = 75) and poor prognosis (n = 69) groups. The levels of alpha-fetoprotein (AFP), carbohydrate antigen (CA19-9), and carcinoembryonic antigen (CEA) before and after MWA were analysed using an independent sample t-test.


Background/aims: Early warning scores are simple scores obtained by measuring physiological parameters and have been regarded as useful tools for detecting clinical deterioration. This study aimed to evaluate the impact of early warning scores in predicting in-hospital mortality in critically ill patients readmitted to the surgical intensive care unit.

Methods: The study was conducted at a tertiary referral teaching hospital in South Korea.

