Background: New methods such as machine learning techniques have been increasingly used to enhance the performance of risk predictions for clinical decision-making. However, commonly reported performance metrics may not be sufficient to capture the advantages of these newly proposed models and thus support their adoption by health care professionals to improve care. Machine learning models often improve risk estimation for certain subpopulations, improvements that these metrics may fail to capture.
Methods and Results: This article addresses the limitations of commonly reported metrics for performance comparison and proposes additional metrics. Our discussion covers metrics related to overall performance, discrimination, calibration, resolution, reclassification, and model implementation. Models for predicting acute kidney injury after percutaneous coronary intervention are used to illustrate the use of these metrics.
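To make two of the metric families named above concrete, the following is a minimal sketch of how discrimination (area under the ROC curve) and overall performance (Brier score) can be computed for a binary outcome such as acute kidney injury. The data are made-up toy values for illustration only, not results from the study, and the function names are our own.

```python
def auroc(y_true, y_prob):
    """Discrimination: probability that a randomly chosen positive case
    is assigned a higher predicted risk than a randomly chosen negative
    case (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Overall performance: mean squared difference between predicted
    probabilities and observed binary outcomes (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Toy observed outcomes (1 = event) and hypothetical predicted risks.
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.3, 0.7, 0.2, 0.9, 0.6, 0.4, 0.8]

print(auroc(y, p))        # perfect ranking in this toy data -> 1.0
print(brier_score(y, p))  # -> 0.075
```

Note that a model can have excellent discrimination (AUROC near 1) yet be poorly calibrated, which is one reason the article argues for reporting a broader set of metrics rather than relying on a single summary number.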
Conclusions: We demonstrate that commonly reported metrics may not be sensitive enough to detect the improvements offered by machine learning models and propose the use of a comprehensive list of performance metrics for reporting and comparing clinical risk prediction models.
DOI: http://dx.doi.org/10.1161/CIRCOUTCOMES.120.007526