Background: Machine learning techniques are increasingly used to enhance the performance of risk prediction for clinical decision-making. However, commonly reported performance metrics may not be sufficient to capture the advantages of these newly proposed models, which can limit their adoption by health care professionals seeking to improve care. In particular, machine learning models often improve risk estimation for certain subpopulations, and these gains may be missed by commonly reported metrics.

Methods and Results: This article addresses the limitations of commonly reported metrics for performance comparison and proposes additional metrics. We discuss metrics related to overall performance, discrimination, calibration, resolution, reclassification, and model implementation, and we use models for predicting acute kidney injury after percutaneous coronary intervention to illustrate their use.
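The sketch below is not from the article; it is a minimal, self-contained Python illustration of how several of the metric families named above (overall performance, discrimination, calibration, and reclassification) might be computed when comparing two risk models. The synthetic outcomes, the two sets of predicted risks, and the 10% decision threshold are all assumptions made for the example.

```python
# Illustrative sketch (not the authors' code): computing several metric families
# for two hypothetical risk models scored on the same observed outcomes.
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
y = rng.binomial(1, 0.08, size=n)  # observed outcomes (assumed 8% event rate)
# Hypothetical predicted risks from a baseline model and an ML model.
p_base = np.clip(0.08 + 0.05 * y + rng.normal(0, 0.04, n), 0.001, 0.999)
p_ml   = np.clip(0.08 + 0.09 * y + rng.normal(0, 0.04, n), 0.001, 0.999)

def report(name, p):
    # Overall performance: Brier score (lower is better).
    brier = brier_score_loss(y, p)
    # Discrimination: c-statistic / area under the ROC curve.
    auc = roc_auc_score(y, p)
    # Calibration: slope from a logistic recalibration of outcomes on logit(p);
    # a slope near 1 suggests the spread of predicted risks is well calibrated.
    logit = np.log(p / (1 - p)).reshape(-1, 1)
    slope = LogisticRegression().fit(logit, y).coef_[0, 0]
    print(f"{name}: Brier={brier:.4f}  AUC={auc:.3f}  calibration slope={slope:.2f}")

report("baseline model", p_base)
report("ML model", p_ml)

# Reclassification: categorical net reclassification improvement (NRI) at an
# assumed 10% decision threshold, comparing the ML model with the baseline.
high_base, high_ml = p_base >= 0.10, p_ml >= 0.10
events, nonevents = y == 1, y == 0
nri_events = (high_ml & ~high_base)[events].mean() - (~high_ml & high_base)[events].mean()
nri_nonevents = (~high_ml & high_base)[nonevents].mean() - (high_ml & ~high_base)[nonevents].mean()
print(f"NRI = {nri_events + nri_nonevents:.3f} (events {nri_events:.3f}, non-events {nri_nonevents:.3f})")
```

In practice, these metrics would be computed on held-out or external validation data rather than on the synthetic values used here.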

Conclusions: We demonstrate that commonly reported metrics may not be sensitive enough to identify the improvements offered by machine learning models, and we propose a comprehensive list of performance metrics for reporting and comparing clinical risk prediction models.
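As a further hedged illustration (again using synthetic data and an assumed subgroup indicator, not the article's acute kidney injury models), the sketch below shows how discrimination can additionally be evaluated within a predefined subpopulation: a model may add substantial value inside a subgroup even when the pooled c-statistic changes only modestly.

```python
# Illustrative sketch: subgroup-level discrimination alongside the pooled AUC.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 5000
subgroup = rng.random(n) < 0.2           # assumed subpopulation (~20% of patients)
x_all = rng.normal(size=n)               # predictor available to both models
x_sub = rng.normal(size=n)               # predictor informative only within the subgroup
true_lin = -2.5 + 0.8 * x_all + 1.5 * x_sub * subgroup
y = rng.binomial(1, 1 / (1 + np.exp(-true_lin)))

p_base = 1 / (1 + np.exp(-(-2.5 + 0.8 * x_all)))                           # baseline ignores x_sub
p_ml   = 1 / (1 + np.exp(-(-2.5 + 0.8 * x_all + 1.5 * x_sub * subgroup)))  # ML model captures it

for name, p in [("baseline model", p_base), ("ML model", p_ml)]:
    print(f"{name}: overall AUC={roc_auc_score(y, p):.3f}  "
          f"subgroup AUC={roc_auc_score(y[subgroup], p[subgroup]):.3f}")
```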

Source
http://dx.doi.org/10.1161/CIRCOUTCOMES.120.007526
