Background: The emergence of Large Language Model (LLM) chatbots, such as ChatGPT, holds great promise for enhancing healthcare practice. Online consultation, accurate pre-diagnosis, and clinical efforts are fundamental to a patient-oriented management system.

Objective: This cross-sectional study evaluates the performance of ChatGPT on inquiries across ophthalmic domains, focusing on Thyroid Eye Disease (TED) consultation and image-based preliminary diagnosis in a non-English language.

Methods: We obtained frequently consulted clinical inquiries from a published reference based on patient consultation data, titled . Additionally, we collected facial and Computed Tomography (CT) images from 16 patients with a definitive diagnosis of TED. From 18 to 30 May 2024, inquiries about TED consultation and preliminary diagnosis were posed to ChatGPT, using a new chat for each question. Responses from ChatGPT-4, ChatGPT-4o, and an experienced ophthalmology professor were compiled into three questionnaires, which patients and ophthalmologists evaluated on four dimensions: accuracy, comprehensiveness, conciseness, and satisfaction. For the image-based task, a preliminary diagnosis of TED was deemed accurate, and the differences in accuracy rates were then calculated.

Results: For common TED consultation questions, ChatGPT-4o delivered more accurate information with logical consistency, adhering to a structured format of disease definition, detailed sections, and summarized conclusions. Notably, the answers generated by ChatGPT-4o were rated higher than those of ChatGPT-4 and the professor, with accuracy (4.33 [0.69]), comprehensiveness (4.17 [0.75]), conciseness (4.12 [0.77]), and satisfaction (4.28 [0.70]). The characteristics of the evaluators, the response variables, and the other quality scores were all correlated with overall satisfaction levels. Based on several facial images, ChatGPT-4 twice failed to make a diagnosis owing to the absence of characteristic symptoms or a complete medical history, whereas ChatGPT-4o accurately identified the pathologic conditions in 31.25% of cases (95% confidence interval, CI: 11.02-58.66%). Furthermore, in combination with CT images, ChatGPT-4o performed comparably to the professor in terms of diagnostic accuracy (87.5%; 95% CI: 61.65-98.45%).
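The accuracy rates and intervals reported above can be checked with a short calculation. With 16 patients, 31.25% corresponds to 5 correct diagnoses and 87.5% to 14. The abstract does not name the interval method; the sketch below assumes an exact (Clopper-Pearson) binomial interval, and the function names `binom_cdf`, `bisect_root`, and `clopper_pearson` are illustrative, not taken from the study.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(func, lo=0.0, hi=1.0, iters=100):
    """Root of a function that increases from negative to positive on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided binomial CI for x successes in n trials (assumed method)."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2
    )
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else bisect_root(
        lambda p: alpha / 2 - binom_cdf(x, n, p)
    )
    return lower, upper


if __name__ == "__main__":
    # Counts derived from the reported rates: 0.3125 * 16 = 5, 0.875 * 16 = 14.
    for label, correct in [("facial images only", 5), ("facial + CT images", 14)]:
        lo, hi = clopper_pearson(correct, 16)
        print(f"{label}: {correct / 16:.2%} (95% CI {lo:.2%} to {hi:.2%})")
```

Under this assumption the printed intervals should come out close to the ones reported above. Their width reflects the small sample of 16 patients, so the two accuracy rates are best read as rough estimates.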

Conclusion: ChatGPT-4o excelled in comprehensive and satisfactory patient consultation and imaging interpretation, indicating its potential to improve the efficiency of clinical practice. However, limitations in managing disinformation and in legal permissions remain major concerns that require further investigation in clinical practice.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11876178
DOI: http://dx.doi.org/10.3389/fmed.2025.1546706
