A novel approach selected small sets of diagnosis codes with high prediction performance in large healthcare datasets.

Thomas E Cowling, David A Cromwell, Linda D Sharples, Jan van der Meulen

J Clin Epidemiol

Department of Health Services Research and Policy, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK; Clinical Effectiveness Unit, Royal College of Surgeons of England, Lincoln's Inn Fields, London WC2A 3PE, UK.

Published: December 2020

Objectives: The objective of the study was to examine an approach for selecting small sets of diagnosis codes with high prediction performance in large datasets of electronic medical records.

Study Design and Setting: This was a modeling study using national hospital and mortality records for patients with myocardial infarction (n = 200,119), hip fracture (n = 169,646), or colorectal cancer surgery (n = 56,515) in England in 2015-2017. One-year mortality was predicted using logistic regression with ICD-10 codes recorded for at least 0.5% of patients as predictors ('full' models). An approximation method was then used to select smaller sets of codes that explained at least 95% of the variation in the full models' predictions ('reduced' models).
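The selection step can be read as a model approximation: regress the full model's linear predictor (log-odds) on candidate codes and keep adding codes until the approximating linear model explains at least 95% of its variance. This is a minimal sketch, assuming greedy forward selection with ordinary least squares and a full logistic model that has already been fitted; the study's exact procedure (for example, backward elimination) may differ, and the names `prevalent_codes`, `r_squared` and `approximate_model` are hypothetical.

```python
import numpy as np


def prevalent_codes(X, min_prevalence=0.005):
    """Indices of code columns recorded for at least `min_prevalence` of patients.

    X is a hypothetical binary patient-by-code matrix (1 = code recorded).
    """
    return np.flatnonzero(X.mean(axis=0) >= min_prevalence)


def r_squared(X_sub, lp):
    """R^2 from an ordinary least-squares fit of lp on X_sub plus an intercept."""
    design = np.column_stack([np.ones(len(lp)), X_sub])
    coef, *_ = np.linalg.lstsq(design, lp, rcond=None)
    resid = lp - design @ coef
    centered = lp - lp.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def approximate_model(X, lp, target=0.95):
    """Greedy forward selection of columns of X until an OLS fit on them
    explains at least `target` of the variance of the full-model linear
    predictor lp (assumed to come from an already-fitted logistic model)."""
    selected, remaining = [], list(range(X.shape[1]))
    r2 = 0.0
    while remaining and r2 < target:
        # Try each remaining code and keep the one that raises R^2 the most.
        scores = [r_squared(X[:, selected + [j]], lp) for j in remaining]
        best = int(np.argmax(scores))
        r2 = scores[best]
        selected.append(remaining.pop(best))
    return selected, r2
```

For example, with a binary matrix `X` and full-model log-odds `lp`, `approximate_model(X[:, prevalent_codes(X)], lp)` returns positions within the filtered matrix plus the R² reached. Each step refits one OLS model per remaining code, so this is slow for hundreds of codes and large cohorts; it is meant only to illustrate the idea.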

Results: One-year mortality was 17.2% (34,520) after myocardial infarction, 27.2% (46,115) after hip fracture, and 9.3% (5,273) after colorectal cancer surgery. Full models included 202, 257, and 209 ICD-10 codes in these populations. C-statistics for these models were 0.884 (95% confidence interval (CI) 0.882, 0.886), 0.798 (0.795, 0.800), and 0.810 (0.804, 0.817). Reduced models included 18, 33, and 41 codes and had c-statistics of 0.874 (95% CI 0.872, 0.876), 0.791 (0.788, 0.793), and 0.807 (0.801, 0.813). Full and reduced models also performed similarly when measured using Brier scores. All models were well calibrated.
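The results report discrimination as c-statistics and overall accuracy as Brier scores. The sketch below shows how both are commonly computed from observed outcomes `y` (1 = died within one year) and predicted risks `p`; the function names are hypothetical, and the software and confidence-interval method used in the study are not stated here.

```python
def c_statistic(y, p):
    """Concordance (c) statistic via the rank (Mann-Whitney) formulation:
    the probability that a patient who died has a higher predicted risk than
    one who survived, with tied predictions counted as 1/2."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    ranks = [0.0] * len(p)
    i = 0
    while i < len(order):
        # Find the run of tied predictions and give each its average rank.
        j = i
        while j + 1 < len(order) and p[order[j + 1]] == p[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    rank_sum = sum(r for r, yi in zip(ranks, y) if yi == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def brier_score(y, p):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)
```

The rank formulation avoids comparing every died/survived pair explicitly, which matters for cohorts of hundreds of thousands of patients. A lower Brier score is better, whereas a higher c-statistic is better.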

Conclusion: Our approach selected small sets of diagnosis codes that predicted patient outcomes comparably to large, comprehensive sets of codes.

DOI: http://dx.doi.org/10.1016/j.jclinepi.2020.08.001