Benchmarking Large Language Models for Extraction of International Classification of Diseases Codes from Clinical Documentation.

Ashley Simmons, Kullaya Takkavatakarn, Megan McDougal, Brian Dilcher, Jami Pincavitch, Lukas Meadows, Justin Kauffman, Eyal Klang, Rebecca Wig, Gordon Smith, Ali Soroush, Robert Freeman, Donald J Apakama, Alexander W Charney, Roopa Kohli-Seth, Girish N Nadkarni, Ankit Sakhuja

medRxiv

Published: November 2024

Background: Healthcare reimbursement and coding depend on accurate extraction of International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) codes from clinical documentation. Attempts to automate this task have had limited success. This study aimed to evaluate the performance of large language models (LLMs) in extracting ICD-10-CM codes from unstructured inpatient notes and to benchmark them against a human coder.

Methods: This study compared the performance of GPT-3.5, GPT-4, Claude 2.1, Claude 3, Gemini Advanced, and Llama 2-70b against a human coder in extracting ICD-10-CM codes from unstructured inpatient notes. We presented deidentified inpatient notes from American Health Information Management Association (AHIMA) VLab authentic patient cases to the LLMs and to the human coder for extraction of ICD-10-CM codes. All LLMs received the same standard prompt for extracting ICD-10-CM codes. The human coder analyzed the same notes using 3M Encoder, adhering to the 2022 ICD-10-CM Coding Guidelines.

Results: We analyzed 50 inpatient notes, comprising 23 history and physicals and 27 progress notes. The human coder identified 165 unique codes, with a median of 4 codes per note. The median number of codes extracted per note varied by LLM: GPT-3.5, 7; GPT-4, 6; Claude 2.1, 6; Claude 3, 8; Gemini Advanced, 5; and Llama 2-70b, 11. GPT-4 performed best, although its agreement with the human coder was poor: 15.2% for overall extraction of ICD-10-CM codes and 26.4% for extraction of ICD-10-CM category codes.
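The abstract does not state how agreement with the human coder was calculated. The sketch below is a minimal illustration, assuming (hypothetically) a per-note Jaccard overlap between the human coder's code set and an LLM's code set, computed at two levels: the full code and the category, taken here as the first three characters of an ICD-10-CM code (e.g. E11 for E11.9). The function names and example codes are illustrative only and are not the authors' method.

```python
from typing import Callable, Iterable


def normalize(code: str) -> str:
    """Uppercase and drop the dot so 'e11.9' and 'E119' compare equal."""
    return code.strip().upper().replace(".", "")


def category(code: str) -> str:
    """ICD-10-CM category: the first three characters of the code."""
    return normalize(code)[:3]


def agreement(human: Iterable[str], model: Iterable[str], level: str = "full") -> float:
    """Hypothetical agreement metric: Jaccard overlap |H & M| / |H | M|.

    level="full" compares complete codes; level="category" compares
    only the three-character category of each code.
    """
    if level == "full":
        key: Callable[[str], str] = normalize
    elif level == "category":
        key = category
    else:
        raise ValueError("level must be 'full' or 'category'")
    h = {key(c) for c in human}
    m = {key(c) for c in model}
    union = h | m
    if not union:
        return 1.0  # both empty: treat as perfect agreement
    return len(h & m) / len(union)


# Hypothetical note: the human coder and an LLM share only I10 exactly,
# but E11.9 and E11.65 fall in the same category (E11).
human_codes = ["E11.9", "I10", "J44.9"]
llm_codes = ["E11.65", "I10", "J18.9"]
print(agreement(human_codes, llm_codes))              # 1 shared / 5 total = 0.2
print(agreement(human_codes, llm_codes, "category"))  # 2 shared / 4 total = 0.5
```

Category-level agreement is always at least as lenient as full-code agreement in this example because codes that differ only after the third character collapse to the same category, which is consistent with the higher category-level figure reported above.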

Conclusion: Current LLMs perform poorly at extracting ICD-10-CM codes from inpatient notes when compared against a human coder.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601733
DOI: http://dx.doi.org/10.1101/2024.04.29.24306573
