Objectives: Generative artificial intelligence (AI) has emerged as a promising tool to engage with patients. The objective of this study was to assess the quality of AI responses to common patient questions regarding vascular surgery disease processes.
Methods: OpenAI's ChatGPT-3.5 and Google Bard were queried with 24 mock patient questions spanning seven vascular surgery disease domains. Six experienced vascular surgery faculty at a tertiary academic center independently graded AI responses on their accuracy (rated 1-4 from completely inaccurate to completely accurate), completeness (rated 1-4 from totally incomplete to totally complete), and appropriateness (binary). Responses were also evaluated with three readability scales.
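For illustration, the readability assessment described above can be reproduced with open-source tooling. The sketch below is a minimal example, not the study's actual pipeline: it assumes the Python textstat package (the study does not name its scoring software), and the function name and sample answer are hypothetical.

```python
# Minimal sketch: score one chatbot response on the three readability
# metrics reported in the study, using the open-source "textstat" package.
import textstat

def score_response(text: str) -> dict:
    """Return the study's three readability metrics plus a word count."""
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(text),    # 0-100; higher = easier
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),  # US school grade level
        "gunning_fog": textstat.gunning_fog(text),                    # years of schooling needed
        "word_count": textstat.lexicon_count(text, removepunct=True),
    }

# Hypothetical usage with one mock patient-question answer:
answer = ("An abdominal aortic aneurysm is a balloon-like bulge in the main "
          "artery that carries blood from the heart to the rest of the body.")
print(score_response(answer))
```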
Results: ChatGPT responses were rated, on average, more accurate than Bard responses (3.08 ± 0.33 vs 2.82 ± 0.40, P < .01) and more complete (2.98 ± 0.34 vs 2.62 ± 0.36, P < .01). Most ChatGPT responses (75.0%, n = 18) and almost half of Bard responses (45.8%, n = 11) were unanimously deemed appropriate. Almost one-third of Bard responses (29.2%, n = 7) were deemed inappropriate by at least two reviewers, and two Bard responses (8.3%) were considered inappropriate by a majority of reviewers. The mean Flesch Reading Ease, Flesch-Kincaid Grade Level, and Gunning Fog Index of ChatGPT responses were 29.4 ± 10.8, 14.5 ± 2.2, and 17.7 ± 3.1, respectively, indicating that responses were readable with a post-secondary education. Bard's mean readability scores were 58.9 ± 10.5, 8.2 ± 1.7, and 11.0 ± 2.0, respectively, indicating that responses were readable with a high-school education (P < .0001 for all three metrics). ChatGPT's mean response length (332 ± 79 words) was greater than Bard's (183 ± 53 words, P < .001). There were no differences in accuracy, completeness, readability, or response length across disease domains for either ChatGPT or Bard (P > .05 for all analyses).
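For context, the three indices are computed from standard sentence, word, and syllable counts; lower Flesch Reading Ease scores and higher grade-level scores indicate harder text ("complex words" in the Gunning Fog Index are words of three or more syllables). Their conventional definitions are:

$$\text{FRE} = 206.835 - 1.015\,\frac{\text{total words}}{\text{total sentences}} - 84.6\,\frac{\text{total syllables}}{\text{total words}}$$

$$\text{FKGL} = 0.39\,\frac{\text{total words}}{\text{total sentences}} + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59$$

$$\text{Fog} = 0.4\left[\frac{\text{total words}}{\text{total sentences}} + 100\,\frac{\text{complex words}}{\text{total words}}\right]$$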
Conclusions: AI offers a novel means of educating patients that avoids the inundation of information from "Dr Google" and the time barriers of physician-patient encounters. ChatGPT provides largely valid, though imperfect, responses to myriad patient questions at the expense of readability. While Bard responses are more readable and concise, their quality is poorer. Further research is warranted to better understand failure points for large language models in vascular surgery patient education.
DOI: http://dx.doi.org/10.1177/17085381241240550