Evaluating the performance of ChatGPT in patient consultation and image-based preliminary diagnosis in thyroid eye disease.

Yue Wang Shuo Yang Chengcheng Zeng Yingwei Xie Ya Shen Jian Li Xiao Huang Ruili Wei Yuqing Chen

Front Med (Lausanne)

Department of Ophthalmology, Changzheng Hospital of Naval Medical University, Shanghai, China.

Published: February 2025

Background: The emergence of large language model (LLM) chatbots, such as ChatGPT, holds great promise for enhancing healthcare practice. Online consultation, accurate pre-diagnosis, and clinical efforts are of fundamental importance for the patient-oriented management system.

Objective: This cross-sectional study aims to evaluate the performance of ChatGPT in inquiries across ophthalmic domains and to focus on Thyroid Eye Disease (TED) consultation and image-based preliminary diagnosis in a non-English language.

Methods: We obtained frequently consulted clinical inquiries from a published reference based on patient consultation data, titled . Additionally, we collected facial and computed tomography (CT) images from 16 patients with a definitive diagnosis of TED. From 18 to 30 May 2024, inquiries about TED consultation and preliminary diagnosis were posed to ChatGPT, using a new chat for each question. Responses from ChatGPT-4, ChatGPT-4o, and an experienced ocular professor were compiled into three questionnaires, which were evaluated by patients and ophthalmologists on four dimensions: accuracy, comprehensiveness, conciseness, and satisfaction. Each preliminary diagnosis of TED was judged for accuracy, and the differences in accuracy rates were then calculated.

Results: For common TED consultation questions, ChatGPT-4o delivered more accurate and logically consistent information, following a structured format of disease definition, detailed sections, and a summarized conclusion. Notably, the answers generated by ChatGPT-4o were rated higher than those of ChatGPT-4 and the professor, with scores for accuracy (4.33 [0.69]), comprehensiveness (4.17 [0.75]), conciseness (4.12 [0.77]), and satisfaction (4.28 [0.70]). The characteristics of the evaluators, the response variables, and other quality scores were all correlated with overall satisfaction levels. Based on several facial images, ChatGPT-4 twice failed to make diagnoses because of a lack of characteristic symptoms or a complete medical history, whereas ChatGPT-4o accurately identified the pathologic conditions in 31.25% of cases (95% confidence interval [CI], 11.02-58.66%). Furthermore, in combination with CT images, ChatGPT-4o performed comparably to the professor in terms of diagnostic accuracy (87.5%; 95% CI, 61.65-98.45%).
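The reported confidence intervals can be checked with a short calculation. The sketch below is illustrative only and rests on two assumptions not stated in the abstract: that the percentages correspond to 5 of 16 and 14 of 16 patients (inferred from the 16 patients in Methods), and that the intervals are exact Clopper-Pearson binomial intervals. The function names are hypothetical, and only the Python standard library is used.

```python
from math import comb


def binom_sf(x, n, p):
    """Return P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x, n + 1))


def solve_increasing(g, target, iters=100):
    """Bisect on [0, 1] for the p at which an increasing function g(p) reaches target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for x successes in n trials."""
    # Lower bound: the p for which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: binom_sf(x, n, p), alpha / 2)
    # Upper bound: the p for which P(X <= x) equals alpha / 2,
    # i.e. P(X >= x + 1) equals 1 - alpha / 2.
    upper = 1.0 if x == n else solve_increasing(
        lambda p: binom_sf(x + 1, n, p), 1 - alpha / 2)
    return lower, upper


# Assumed counts: 5/16 (facial images only) and 14/16 (with CT images).
for correct, total in [(5, 16), (14, 16)]:
    low, high = clopper_pearson(correct, total)
    print(f"{correct}/{total} = {correct / total:.2%}, "
          f"95% CI {low:.2%} to {high:.2%}")
```

Under these assumptions the computed bounds should land close to the intervals quoted above, which suggests, but does not confirm, that an exact binomial method was used.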

Conclusion: ChatGPT-4o excelled in comprehensive and satisfactory patient consultation and imaging interpretation, indicating the potential to improve clinical practice efficiency. However, limitations in disinformation management and legal permissions remain major concerns, which require further investigation in clinical practice.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11876178	PMC
http://dx.doi.org/10.3389/fmed.2025.1546706	DOI Listing

Publication Analysis

Top Keywords

preliminary diagnosis

patient consultation

ted consultation

performance chatgpt

consultation image-based

image-based preliminary

thyroid eye

eye disease

diagnosis ted

clinical practice

Similar Publications

Ability and utility of the Physician Well-Being Index to identify distress among Chinese physicians.

Ann Med

December 2025

Department of Psychiatry, National Clinical Research Center for Mental Disorders, and National Center for Mental Disorders, the Second Xiangya Hospital of Central South University, Changsha, Hunan, China.

Zejun Li Peng Pu Min Wu Xin Wang Huixue Xu

Background: Despite the high prevalence of mental stress among physicians, reliable screening tools are scarce. This study aimed to evaluate the capability of the Physician Well-Being Index (PWBI) in identifying distress and adverse consequences among Chinese physicians.

Methods: This cross-sectional online survey recruited 2803 physicians from Southern Mainland China via snowball sampling between October and December 2020.


Personalized medicine in pancreatic cancer: Harnessing the potential of mRNA vaccines.

J Genet Eng Biotechnol

March 2025

Karachi Medical and Dental College, Pakistan.

Aariz Hussain Areeba Fareed

Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer with a five-year survival rate of just 7%. Its late diagnosis and limited treatment options contribute to poor outcomes. Immunotherapy has had little success due to PDAC's dense and immunosuppressive tumor environment.


Integrating Ambient In-Home Sensor Data and Electronic Health Record Data for the Prediction of Outcomes in Amyotrophic Lateral Sclerosis: Protocol for an Exploratory Feasibility Study.

JMIR Res Protoc

March 2025

Institute for Data Science and Informatics, University of Missouri, Columbia, MO, United States.

William E Janes Noah Marchal Xing Song Mihail Popescu Abu Saleh Mohammad Mosa

Background: Amyotrophic lateral sclerosis (ALS) leads to rapid physiological and functional decline before causing untimely death. Current best-practice approaches to interdisciplinary care are unable to provide adequate monitoring of patients' health. Passive in-home sensor systems enable 24×7 health monitoring.


Remote work and long-term sickness absence due to mental disorder trends among Japanese workers pre/post COVID-19.

PLoS One

March 2025

Department of Neuropsychiatry, Osaka Metropolitan University Graduate School of Medicine, Osaka, Japan.

Yasuhiko Deguchi Shinichi Iwasaki Yuki Uesaka Yutaro Okawa Shohei Okura

Aims: The aim of this study was to ascertain whether there has been an increase in the number of workers with long-term sickness absence due to mental disorders (LTSA-MD) and determine the impact of remote work on new LTSA-MD cases.

Methods: A web-based questionnaire was sent to 2,552 company offices with 150 or more workers in Osaka Prefecture. Data were obtained on the number of workers with LTSA-MD between April 1, 2019, and March 31, 2020 (fiscal year 2019) and between April 1, 2020, and March 31, 2021 (fiscal year 2020), along with their MD diagnoses (adjustment disorder [AD], depressive disorder [DEP], etc.


Minimal Residual Disease in Metastatic Soft Tissue Sarcoma.

Curr Treat Options Oncol

March 2025

Division of Medical Oncology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA.

Ioannis Kournoutas Brittany L Siontis

Liquid biopsies represent a promising and minimally invasive approach to diagnosing and monitoring cancer. In recent years, studies across a multitude of solid organ malignancies have suggested the clinical utility of biomarkers such as circulating tumor DNA (ctDNA). Particular attention has been given to serial assessment of such biomarkers in an effort to detect minimal residual disease (MRD), in order to predict which patients may be at highest risk of relapse following curative-intent surgical or medical intervention.
