Objectives: To examine an approach for selecting small sets of diagnosis codes that retain high predictive performance in large datasets of electronic medical records.
Study Design And Setting: This was a modeling study using national hospital and mortality records for patients with myocardial infarction (n = 200,119), hip fracture (n = 169,646), or colorectal cancer surgery (n = 56,515) in England in 2015-2017. One-year mortality was predicted from ICD-10 codes recorded for at least 0.5% of patients using logistic regression ('full' models). An approximation method was used to select fewer codes that explained at least 95% of variation in full model predictions ('reduced' models).
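The two-stage idea described above can be sketched in a few lines. The abstract does not name the approximation method, so this sketch assumes a backward step-down: the full model's linear predictor is regressed on the code indicators by ordinary least squares, and codes are dropped one at a time for as long as R² against the full predictions stays at or above 0.95. The data are simulated, and the function names (`fit_logistic`, `r_squared`, `reduce_codes`) are hypothetical, not taken from the study.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Fit logistic regression by Newton-Raphson; returns coefficients, intercept first."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        w = p * (1.0 - p)
        grad = Xd.T @ (y - p)
        # Tiny ridge on the Hessian for numerical stability only
        hess = (Xd * w[:, None]).T @ Xd + ridge * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def r_squared(X, target, cols):
    """R^2 of an OLS fit of target on the chosen columns plus an intercept."""
    Xd = np.column_stack([np.ones(len(X)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(Xd, target, rcond=None)
    resid = target - Xd @ coef
    return 1.0 - resid.var() / target.var()

def reduce_codes(X, full_lp, min_r2=0.95):
    """Drop codes one at a time while R^2 against the full linear predictor stays >= min_r2."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        # For each remaining code, the R^2 that would result from removing it
        candidates = [(r_squared(X, full_lp, [c for c in keep if c != j]), j) for j in keep]
        best_r2, drop = max(candidates)
        if best_r2 < min_r2:
            break
        keep.remove(drop)
    return keep

# Simulated data (assumption): 12 binary "codes", three of which carry real signal
rng = np.random.default_rng(0)
n, k = 5000, 12
X = (rng.random((n, k)) < 0.1).astype(float)
true_beta = np.array([1.5, 1.2, 0.9] + [0.02] * (k - 3))
prob = 1.0 / (1.0 + np.exp(-(-2.0 + X @ true_beta)))
y = (rng.random(n) < prob).astype(float)

beta = fit_logistic(X, y)                # 'full' model
full_lp = beta[0] + X @ beta[1:]         # full-model linear predictor
kept = reduce_codes(X, full_lp)          # codes retained in the 'reduced' model
print(len(kept), "of", k, "codes kept; R^2 =", round(r_squared(X, full_lp, kept), 3))
```

Because the target of the second stage is the full model's prediction rather than the observed outcome, the selection step does not re-use the outcome data, which is one reason this style of approximation is attractive for very large code sets.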
Results: One-year mortality was 17.2% (34,520) after myocardial infarction, 27.2% (46,115) after hip fracture, and 9.3% (5,273) after colorectal cancer surgery. Full models included 202, 257, and 209 ICD-10 codes in these populations, respectively, and had c-statistics of 0.884 (95% confidence interval (CI) 0.882, 0.886), 0.798 (0.795, 0.800), and 0.810 (0.804, 0.817). Reduced models included 18, 33, and 41 codes and had c-statistics of 0.874 (95% CI 0.872, 0.876), 0.791 (0.788, 0.793), and 0.807 (0.801, 0.813). Performance was also similar when measured using Brier scores. All models were well calibrated.
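For readers less familiar with the two performance measures reported here, the following is a minimal sketch of how a c-statistic and a Brier score can be computed from observed outcomes and predicted probabilities. The four-patient example is invented for illustration, and the function names `c_statistic` and `brier_score` are hypothetical; the study's own software is not described in the abstract.

```python
import numpy as np

def c_statistic(y, p):
    """Probability that a random death has a higher predicted risk than a random survivor."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    # Average ranks of the predictions; tied predictions share the mean rank
    _, inv, counts = np.unique(p, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)
    ranks = (last_rank - (counts - 1) / 2.0)[inv]
    n1 = int((y == 1).sum())
    n0 = len(y) - n1
    # Mann-Whitney U statistic scaled to the 0-1 range
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2.0) / (n1 * n0)

def brier_score(y, p):
    """Mean squared difference between predicted probability and observed outcome."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.mean((p - y) ** 2))

# Invented example: 1 = died within one year, 0 = survived
outcome = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
c_value = c_statistic(outcome, predicted)
brier_value = brier_score(outcome, predicted)
print(c_value, brier_value)
```

A c-statistic of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination, whereas lower Brier scores indicate better overall accuracy.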
Conclusion: Our approach selected small sets of diagnosis codes that predicted patient outcomes comparably to large, comprehensive sets of codes.
DOI: http://dx.doi.org/10.1016/j.jclinepi.2020.08.001