Purpose: Large language model (LLM) artificial intelligences may help physicians appeal insurer denials of prescribed medical services, a task that delays patient care and contributes to burnout. We evaluated LLM performance at this task for denials of radiotherapy services.

Methods: We evaluated generative pretrained transformer 3.5 (GPT-3.5; OpenAI, San Francisco, CA), GPT-4, GPT-4 with internet search functionality (GPT-4web), and GPT-3.5ft. The latter was developed by fine-tuning GPT-3.5 via an OpenAI application programming interface with 53 examples of appeal letters written by radiation oncologists. Twenty test prompts with simulated patient histories were programmatically presented to the LLMs, and output appeal letters were scored by three blinded radiation oncologists for language representation, clinical detail inclusion, clinical reasoning validity, literature citations, and overall readiness for insurer submission.

Results: Interobserver agreement between radiation oncologists' scores was moderate or better for all domains (Cohen's kappa coefficients: 0.41-0.91). GPT-3.5, GPT-4, and GPT-4web wrote letters that were on average linguistically clear, summarized provided clinical histories without confabulation, reasoned appropriately, and were scored useful to expedite the insurance appeal process. GPT-4 and GPT-4web letters demonstrated superior clinical reasoning and were readier for submission than GPT-3.5 letters ( < .001). Fine-tuning increased GPT-3.5ft confabulation and compromised performance compared with other LLMs across all domains ( < .001). All LLMs, including GPT-4web, were poor at supporting clinical assertions with existing, relevant, and appropriately cited primary literature.

Conclusion: When prompted appropriately, three commercially available LLMs drafted letters that physicians deemed would expedite appealing insurer denials of radiotherapy services. LLMs may decrease this task's clerical workload on providers. However, LLM performance worsened when fine-tuned with a task-specific, small training data set.

Download full-text PDF

Source
http://dx.doi.org/10.1200/CCI.24.00129DOI Listing

Publication Analysis

Top Keywords

large language
8
radiotherapy services
8
insurer denials
8
llm performance
8
denials radiotherapy
8
gpt-35 openai
8
appeal letters
8
radiation oncologists
8
clinical reasoning
8
gpt-4 gpt-4web
8

Similar Publications

The HIV epidemic in Indonesia is one of the fastest growing in Southeast Asia and is characterised by a number of geographic and sociocultural challenges. Can large language models (LLMs) be integrated with telehealth (TH) to address cost and quality of care? A literature review was performed using the PRISMA-ScR (2018) guidelines between Jan 2017 and June 2024 using the PubMed, ArXiv and semantic scholar databases. Of the 694 records identified, 12 studies met the inclusion criteria.

View Article and Find Full Text PDF

Objective: To assess whether social determinants of health (SDOHs) are associated with the first antiseizure medication (ASM) prescribed for newly diagnosed epilepsy.

Methods: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) standards were followed, and the protocol registered (CRD42023448998). Embase, Medline, and Web of Science were searched up to July 31, 2023.

View Article and Find Full Text PDF

Background: The Montreal classification has been widely used in Crohn's disease since 2005 to categorize patients by the age of onset (A), disease location (L), behavior (B), and upper gastrointestinal tract and perianal involvement. With evolving management paradigms in Crohn's disease, we aimed to assess the performance of gastroenterologists in applying the Montreal classification.

Methods: An online survey was conducted among participants at an international educational conference on inflammatory bowel diseases.

View Article and Find Full Text PDF

Biological invasions are a major threat to biodiversity, ecosystem functioning and nature's contributions to people worldwide. However, the effectiveness of invasive alien species (IAS) management measures and the progress toward achieving biodiversity targets remain uncertain due to limited and nonuniform data availability. Management success is usually assessed at a local level and documented in technical reports, often written in languages other than English, which makes such data notoriously difficult to collect at large geographic scales.

View Article and Find Full Text PDF

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!