AI Article Synopsis

  • ChatGPT-4 was tested on 271 nephrology-related questions sourced from X polls to see how well its answers matched professional opinion in the field, with results analyzed across two rounds of responses.
  • In the first round, ChatGPT agreed with poll results 60.2% of the time, improving slightly to 63.1% in the second round, indicating modest effectiveness in aligning with expert opinion.
  • Subgroup analysis showed better performance in specific topics such as homeostasis and pharmacology, and the study highlights both the potential and the limitations of using AI like ChatGPT in medical decision-making.

Article Abstract

Background: Professional opinion polling has become a popular means of seeking advice for complex nephrology questions in the #AskRenal community on X. ChatGPT is a large language model with remarkable problem-solving capabilities, but its ability to provide solutions for real-world clinical scenarios remains unproven. This study seeks to evaluate how closely ChatGPT's responses align with current prevailing medical opinions in nephrology.

Methods: Nephrology polls from X were submitted to ChatGPT-4, which generated answers without prior knowledge of the poll outcomes. Its responses were compared to the poll results (inter-rater) and a second set of responses given after a one-week interval (intra-rater) using Cohen's kappa statistic (κ). Subgroup analysis was performed based on question subject matter.
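A minimal sketch of the agreement analysis described in the Methods, assuming each poll question is reduced to a single categorical label (the poll's winning option versus ChatGPT's chosen option). The data and label names below are hypothetical placeholders, not values from the study:

    # Cohen's kappa for inter-rater and intra-rater agreement.
    # All data below are hypothetical placeholders.
    from sklearn.metrics import cohen_kappa_score

    poll_winners   = ["A", "B", "B", "C", "A"]   # option favored by each X poll
    chatgpt_round1 = ["A", "B", "C", "C", "A"]   # ChatGPT's first-round answers
    chatgpt_round2 = ["A", "B", "B", "C", "B"]   # answers after a one-week interval

    # Inter-rater agreement: ChatGPT vs. the prevailing poll opinion.
    kappa_inter = cohen_kappa_score(poll_winners, chatgpt_round1)

    # Intra-rater agreement: round 1 vs. round 2 (reproducibility).
    kappa_intra = cohen_kappa_score(chatgpt_round1, chatgpt_round2)

    print(f"inter-rater kappa: {kappa_inter:.2f}")
    print(f"intra-rater kappa: {kappa_intra:.2f}")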

Results: Our analysis comprised two rounds of testing ChatGPT on 271 nephrology-related questions. In the first round, ChatGPT's responses agreed with poll results for 163 of the 271 questions (60.2%; κ = 0.42, 95% CI: 0.38-0.46). In the second round, conducted to assess reproducibility, agreement improved slightly to 171 out of 271 questions (63.1%; κ = 0.46, 95% CI: 0.42-0.50). Comparison of ChatGPT's responses between the two rounds demonstrated high internal consistency, with agreement in 245 out of 271 responses (90.4%; κ = 0.86, 95% CI: 0.82-0.90). Subgroup analysis revealed stronger performance in the combined areas of homeostasis, nephrolithiasis, and pharmacology (κ = 0.53, 95% CI: 0.47-0.59 in both rounds), compared to other nephrology subfields.
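The abstract does not state how the 95% confidence intervals for κ were obtained; one common approach is a nonparametric percentile bootstrap over questions, sketched below as an illustrative assumption rather than the authors' actual procedure:

    # Percentile-bootstrap 95% CI for Cohen's kappa (illustrative only; the
    # paper's CI method is not specified in the abstract).
    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    def kappa_with_bootstrap_ci(labels_a, labels_b, n_boot=10000, alpha=0.05, seed=0):
        rng = np.random.default_rng(seed)
        a, b = np.asarray(labels_a), np.asarray(labels_b)
        n = len(a)
        resampled = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, size=n)  # resample questions with replacement
            resampled.append(cohen_kappa_score(a[idx], b[idx]))
        lo, hi = np.percentile(resampled, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        return cohen_kappa_score(a, b), (lo, hi)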

Conclusion: ChatGPT-4 demonstrates modest capability in replicating prevailing professional opinion in nephrology polls overall, with performance varying across question topics and excellent internal consistency. This study provides insights into the potential and limitations of using ChatGPT in medical decision-making.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11363238 (PMC)
http://dx.doi.org/10.1177/20552076241277458 (DOI Listing)

Publication Analysis

Top Keywords

chatgpt's responses (12)
professional opinion (8)
nephrology polls (8)
subgroup analysis (8)
271 questions (8)
internal consistency (8)
responses (6)
nephrology (5)
digital health (4)
health tools (4)
