AI Article Synopsis

  • The internet is a popular place for people to find health information, and this study examined how well an AI chatbot (ChatGPT) can answer questions about common digestive system (gastrointestinal) surgeries.
  • Researchers created a quiz with 24 questions about three types of surgeries and asked ChatGPT to answer them, then experts rated the quality of those answers.
  • Most of the AI responses were rated as "fair" or "good," but responses about one surgery, cholecystectomy, were judged to be better than the others, while answers for pancreatic surgery were not as good.

Article Abstract

Background: The internet is a common source of health information for patients. Interactive online artificial intelligence (AI) may be a more reliable source of health-related information than traditional search engines. This study aimed to assess the quality and perceived utility of chat-based AI responses related to 3 common gastrointestinal (GI) surgical procedures.

Methods: A survey of 24 questions covering general perioperative information on cholecystectomy, pancreaticoduodenectomy (PD), and colectomy was created. Each question was posed to Chat Generative Pre-trained Transformer (ChatGPT) in June 2023, and the generated responses were recorded. The quality and perceived utility of responses were independently and subjectively graded by expert respondents specific to each surgical field. Grades were classified as "poor," "fair," "good," "very good," or "excellent."

Results: Among the 45 respondents (general surgeon [n = 13], surgical oncologist [n = 18], colorectal surgeon [n = 13], and transplant surgeon [n = 1]), most practiced at an academic facility (95.6%). Respondents had been in practice for a mean of 12.3 years (general surgeon, 14.5 ± 7.2; surgical oncologist, 12.1 ± 8.2; colorectal surgeon, 10.2 ± 8.0) and performed a mean of 53 index operations annually (cholecystectomy, 47 ± 28; PD, 28 ± 27; colectomy, 81 ± 44). Overall, most quality grades assigned were "fair" or "good" (n = 622/1080, 57.6%). Most of the 1080 total utility grades were "fair" (n = 279, 25.8%) or "good" (n = 344, 31.9%), whereas only 129 utility grades (11.9%) were "poor." Of note, ChatGPT responses related to cholecystectomy (45.3% ["very good"/"excellent"] vs 18.1% ["poor"/"fair"]) were deemed to be of better quality than AI responses about PD (18.9% ["very good"/"excellent"] vs 46.9% ["poor"/"fair"]) or colectomy (31.4% ["very good"/"excellent"] vs 38.3% ["poor"/"fair"]). Overall, only 20.0% of the experts deemed ChatGPT to be an accurate source of information, whereas 15.6% found it unreliable. Moreover, approximately 1 in 3 surgeons deemed ChatGPT responses unlikely to reduce patient-physician correspondence (31.1%) or not comparable to in-person surgeon responses (35.6%).

Conclusions: Although a potential resource for patient education, ChatGPT responses to common GI perioperative questions were deemed to be of only modest quality and utility to patients. In addition, the relative quality of AI responses varied markedly on the basis of procedure type.
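For readers tallying the denominators above: the 1080 grades per rubric follow from each of the 24 survey questions being graded by all 45 expert respondents (24 × 45 = 1080). The short Python sketch below is purely illustrative, reusing only the utility-grade counts published in the Results; it is not the study's own analysis code.

```python
# Illustrative sketch only: reproduces the percentage arithmetic in the Results
# section from the published counts (not the study's analysis code).

questions = 24          # survey questions posed to ChatGPT
respondents = 45        # expert graders
total_grades = questions * respondents   # 24 x 45 = 1080 grades per rubric
assert total_grades == 1080

# Utility-grade counts reported in the abstract.
utility_counts = {"poor": 129, "fair": 279, "good": 344}

for grade, n in utility_counts.items():
    print(f"{grade}: {n}/{total_grades} = {n / total_grades:.1%}")
# poor: 129/1080 = 11.9%
# fair: 279/1080 = 25.8%
# good: 344/1080 = 31.9%
```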

Source
http://dx.doi.org/10.1016/j.gassur.2023.11.019

Publication Analysis

Top Keywords

chatgpt responses (12)
["very good"/"excellent"] (12)
responses (10)
online artificial (8)
artificial intelligence (8)
gastrointestinal surgical (8)
quality perceived (8)
perceived utility (8)
responses common (8)
"fair" "good" (8)

Similar Publications

Assessing the performance of AI chatbots in answering patients' common questions about low back pain.

Ann Rheum Dis

January 2025

Masters and Doctoral Programs in Physical Therapy, Universidade Cidade de Sao Paulo, Sao Paulo, Brazil; Discipline of Physiotherapy, Graduate School of Health, Faculty of Health, University of Technology, Sydney, New South Wales, Australia.

Objectives: The aim of this study was to assess the accuracy and readability of the answers generated by large language model (LLM)-chatbots to common patient questions about low back pain (LBP).

Methods: This cross-sectional study analysed responses to 30 LBP-related questions, covering self-management, risk factors and treatment. The questions were developed by experienced clinicians and researchers and were piloted with a group of consumer representatives with lived experience of LBP.

Emerging trends in managed care pharmacy: A mixed-method study.

J Manag Care Spec Pharm

January 2025

Academy of Managed Care Pharmacy Foundation, Alexandria, VA.

Background: Over the past 5 years, managed care pharmacy has been shaped by a global pandemic, advancements in generative artificial intelligence (AI), Medicare drug price negotiation policies, and significant therapeutic developments. Collective intelligence methods can be used to anticipate future developments in practice to help organizations plan and develop new strategies around those changes.

Objective: To identify emerging trends in managed care pharmacy.

Objective: Erectile dysfunction (ED) is a common cause of male sexual dysfunction. We aimed to evaluate the quality of ChatGPT and Gemini's responses to the most frequently asked questions about ED.

Methods: This study was conducted as a cross-sectional, observational study.

Background/purpose: OpenAI's GPT-4V and Google's Gemini Pro, being Large Language Models (LLMs) equipped with image recognition capabilities, have the potential to be utilized in future medical diagnosis and treatment, and to serve as valuable educational support tools for students. This study compared and evaluated the image recognition capabilities of GPT-4V and Gemini Pro using questions from the Japanese National Dental Examination (JNDE) to investigate their potential as educational support tools.

Materials And Methods: We analyzed 160 questions from the 116th JNDE, administered in March 2023, using ChatGPT-4V and Gemini Pro, both of which have image recognition functions.

Integrating artificial intelligence (AI) into oncology can revolutionize decision-making by providing accurate information. This study evaluates the performance of ChatGPT-4o (OpenAI, San Francisco, CA) Oncology Expert, in addressing open-ended clinical oncology questions. Thirty-seven treatment-related questions on solid organ tumors were selected from a hematology-oncology textbook.
