Background: Machine learning (ML)-based mortality prediction models can be immensely useful in intensive care units. Such a model should generate warnings to alert physicians when a patient's condition rapidly deteriorates or when their vitals fall into highly abnormal ranges. Before clinical deployment, it is important to comprehensively assess a model's ability to recognize critical patient conditions.

Methods: We develop multiple medical ML testing approaches, including a gradient ascent method and a neural activation map. We systematically assess ML models' ability to respond to serious medical conditions using additional test cases, some of which are time series. Guided by medical doctors, our evaluation involves multiple machine learning models, resampling techniques, and four datasets for two clinical prediction tasks.
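The abstract does not describe how the gradient ascent method is implemented. As a rough illustration only, the sketch below assumes a toy logistic risk model with hypothetical weights and feature bounds. Starting from a baseline input, it climbs the model's own risk surface while clipping each feature to a plausible range. The resulting feature values can then be compared with what clinicians consider critical. All names here (`predict_risk`, `gradient_ascent_case`, the weights and bounds) are assumptions, not the authors' code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(weights, bias, x):
    """Toy logistic risk model (hypothetical stand-in for a trained model)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def gradient_ascent_case(weights, bias, x0, bounds, lr=0.5, steps=200):
    """Move the input x uphill on predicted risk, clipped to per-feature bounds."""
    x = list(x0)
    for _ in range(steps):
        p = predict_risk(weights, bias, x)
        # For a logistic model, d(risk)/d(x_i) = p * (1 - p) * w_i.
        grad = [p * (1.0 - p) * w for w in weights]
        x = [
            min(max(xi + lr * g, lo), hi)
            for xi, g, (lo, hi) in zip(x, grad, bounds)
        ]
    return x


# Hypothetical two-feature example (standardized units, made-up weights).
weights = [0.8, -0.6]
bias = -1.0
baseline = [0.0, 0.0]
bounds = [(-3.0, 3.0), (-3.0, 3.0)]

high_risk_input = gradient_ascent_case(weights, bias, baseline, bounds)
risk_before = predict_risk(weights, bias, baseline)
risk_after = predict_risk(weights, bias, high_risk_input)
```

For a real model, the analytic gradient would be replaced by automatic differentiation, and the bounds would come from clinically plausible ranges supplied by physicians.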

Results: We identify serious deficiencies in the models' responsiveness, with the models being unable to recognize severely impaired medical conditions or rapidly deteriorating health. For in-hospital mortality prediction, the models tested using our synthesized cases fail to recognize 66% of the injuries. In some instances, the models fail to generate adequate mortality risk scores for all test cases. Our study identifies similar kinds of deficiencies in the responsiveness of 5-year breast and lung cancer prediction models.
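A figure such as the 66% above is, in essence, a failure rate over a set of synthesized critical cases. The sketch below shows one way such a rate, and a simple check on a deteriorating time series, could be computed. The 0.5 alert threshold, the 0.2 minimum rise, and the function names are illustrative assumptions. The abstract does not state the criteria the authors used.

```python
def failure_rate(risk_scores, threshold=0.5):
    """Fraction of critical test cases whose predicted risk stays below the alert threshold."""
    if not risk_scores:
        raise ValueError("risk_scores must not be empty")
    missed = sum(1 for score in risk_scores if score < threshold)
    return missed / len(risk_scores)


def responds_to_deterioration(risk_over_time, min_rise=0.2):
    """True if predicted risk rises by at least min_rise from the first to the last time step."""
    if len(risk_over_time) < 2:
        raise ValueError("need at least two time steps")
    return risk_over_time[-1] - risk_over_time[0] >= min_rise


# Hypothetical model outputs for three synthesized critical cases.
critical_case_scores = [0.1, 0.7, 0.3]
rate = failure_rate(critical_case_scores)

# Hypothetical risk trajectory for one patient whose vitals worsen over time.
trajectory = [0.1, 0.2, 0.6]
responsive = responds_to_deterioration(trajectory)
```

Because every synthesized case is critical by construction, any score below the threshold counts as a miss. No ground-truth labels are needed beyond the case design itself.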

Conclusions: Using generated test cases, we find that statistical machine-learning models trained solely from patient data are grossly insufficient and have many dangerous blind spots. Most of the ML models tested fail to respond adequately to critically ill patients. How to incorporate medical knowledge into clinical machine learning models is an important future research direction.

Source
http://dx.doi.org/10.1038/s43856-025-00775-0

Similar Publications

Background: Processing data from electronic health records (EHRs) to build research-grade databases is a lengthy and expensive process. Modern arthroplasty practice commonly uses multiple sites of care, including clinics and ambulatory care centers. However, most private data systems prevent obtaining usable insights for clinical practice.

Background: Amyotrophic lateral sclerosis (ALS) leads to rapid physiological and functional decline before causing untimely death. Current best-practice approaches to interdisciplinary care are unable to provide adequate monitoring of patients' health. Passive in-home sensor systems enable 24×7 health monitoring.

AI-Driven Discovery of Highly Specific and Efficacious hCES2A Inhibitors for Ameliorating Irinotecan-Triggered Gut Toxicity.

J Med Chem

March 2025

State Key Laboratory of Discovery and Utilization of Functional Components in Traditional Chinese Medicine; Shanghai Frontiers Science Center of TCM Chemical Biology; Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China.

The anticancer agent irinotecan often induces severe delayed-onset diarrhea; inhibiting human carboxylesterase 2A (hCES2A) can significantly alleviate irinotecan-triggered gut toxicity (ITGT). This work presents an efficient workflow for designing and developing novel, efficacious hCES2A inhibitors. A well-trained machine learning model identified as a lead compound, while compound was developed as a novel time-dependent hCES2A inhibitor (IC = 0.

Over the past decade, graph neural networks (GNNs) have emerged as a powerful neural architecture for modelling graph-structured data and for task-driven representation learning. Recent studies have highlighted the remarkable capabilities of GNNs in handling complex graph representation learning tasks, achieving state-of-the-art results in node/graph classification, regression, and generation. However, most traditional GNN-based architectures, such as GCN and GraphSAGE, still face several challenges in preserving multi-scale topological structures.

Background: Plant-based milk alternatives (PBMA) are increasingly popular due to rising lactose intolerance and environmental concerns over traditional dairy products. However, limited efforts have been made to develop rapid authentication methods to verify their biological origin.

Objective: In this study, we developed a rapid, on-site analytical method for the authentication and identification of PBMA made by six different plant species utilizing a portable Raman spectrometer coupled with machine learning.
