The Large Language Model GPT-4 Compared to Endocrinologist Responses on Initial Choice of Antidiabetic Medication under Conditions of Clinical Uncertainty.

Objective: To explore how the commercially available large language model (LLM) GPT-4 compares to endocrinologists in answering medical questions for which the best answer is uncertain.

Research Design and Methods: This study compared responses from GPT-4 with responses from 31 endocrinologists to hypothetical clinical vignettes focused on diabetes, specifically examining the prescription of metformin versus alternative treatments. The primary outcome was the choice between metformin and other treatments.

Results: With a simple prompt, GPT-4 chose metformin in 12% (95% CI 7.9-17%) of responses, compared with 31% (95% CI 23-39%) of endocrinologist responses. After modifying the prompt to encourage metformin use, the selection of metformin by GPT-4 increased to 25% (95% CI 22-28%). GPT-4 rarely selected metformin in patients with impaired kidney function, or a history of gastrointestinal distress (2.9% of responses, 95% CI 1.4-5.5%). In contrast, endocrinologists often prescribed metformin even in patients with a history of gastrointestinal distress (21% of responses, 95% CI 12-36%). GPT-4 responses showed low variability on repeated runs except at intermediate levels of kidney function.
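The proportions above are reported with 95% confidence intervals. The abstract does not say which interval method the authors used, so the following is only a minimal sketch of one common choice, the Wilson score interval. The function name `wilson_interval` and the counts in the usage lines are hypothetical and are not taken from the study.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption: this is an illustrative method only; the study's actual
    interval method is not stated in the abstract.
    z = 1.96 corresponds to a two-sided 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must be between 0 and total")

    p = successes / total                      # observed proportion
    denom = 1 + z ** 2 / total                 # shared normalising term
    center = (p + z ** 2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical counts for illustration only (NOT data from the study):
# 12 "metformin" choices out of 100 responses.
low, high = wilson_interval(12, 100)
print(f"{12 / 100:.0%} (95% CI {low:.1%}-{high:.1%})")
```

The Wilson interval is used here because it stays within 0 to 1 and behaves reasonably for small proportions such as the 2.9% figure; a different method would give somewhat different bounds.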

Conclusions: In clinical scenarios with no single right answer, GPT-4's responses were reasonable, but differed from endocrinologists' responses in clinically important ways. Value judgments are needed to determine when these differences should be addressed by adjusting the model. We recommend against reliance on LLM output until it is shown to align not just with clinical guidelines but also with patient and clinician preferences, or it demonstrates improvement in clinical outcomes over standard of care.

Source: http://dx.doi.org/10.2337/dc24-1067
