Background: Interactive artificial intelligence tools such as ChatGPT have gained popularity, yet little is known about their reliability as a reference resource for healthcare providers and trainees. The objective of this study was to assess the consistency, quality, and accuracy of ChatGPT's responses to healthcare-related inquiries.
Methods: A total of 18 open-ended questions, six in each of three defined clinical areas (two each addressing "what", "why", and "how"), were submitted to ChatGPT v3.5 based on real-world usage experience. The experiment was conducted in duplicate on two computers. Five investigators independently rated the quality of each response on a 4-point scale. The Delphi method was used to compare the investigators' scores, with the goal of reaching at least 80% consistency. The accuracy of the responses was checked against established professional references and resources. When a response was in question, the bot was asked to provide the reference material it had used so the investigators could judge accuracy and quality. The investigators determined consistency, accuracy, and quality by consensus.
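To make the consistency criterion concrete, the sketch below computes percentage agreement among five raters on a 4-point scale against the 80% threshold described above; the question labels and scores are hypothetical placeholders, not study data.

```python
# Minimal sketch: percentage agreement among five raters on a 4-point scale.
# The question labels and ratings are hypothetical placeholders, not study data.
from collections import Counter

ratings = {
    "Q1": [4, 4, 4, 3, 4],   # five investigators' scores for one response
    "Q2": [2, 2, 2, 2, 2],
}

def percent_agreement(scores):
    """Share of raters matching the most common (modal) score."""
    most_common_count = Counter(scores).most_common(1)[0][1]
    return most_common_count / len(scores)

for question, scores in ratings.items():
    agreement = percent_agreement(scores)
    status = "meets" if agreement >= 0.80 else "below"
    print(f"{question}: {agreement:.0%} agreement ({status} the 80% threshold)")
```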
Results: The speech pattern and length of the responses were consistent within the same user but differed between users. Occasionally, ChatGPT provided two completely different responses to the same question. Overall, ChatGPT gave more accurate responses to the "what" questions (8 out of 12) and performed less reliably on the "why" and "how" questions. We identified errors in calculation, units of measurement, and misapplication of protocols by ChatGPT; some of these errors could result in clinical decisions leading to harm. We also identified citations and references provided by ChatGPT that do not exist in the literature.
Conclusions: ChatGPT is not ready to take on the coaching role for either healthcare learners or healthcare professionals. The lack of consistency in the responses to the same question is problematic for both learners and decision-makers. The intrinsic assumptions made by the chatbot could lead to erroneous clinical decisions. The unreliability in providing valid references is a serious flaw in using ChatGPT to drive clinical decision making.
Full text: PMC (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11668057) | DOI (http://dx.doi.org/10.1186/s12911-024-02824-5)
Sci Rep
January 2025
Department of Orthopedic Surgery, Chang Gung Memorial Hospital, No. 5, Fuxing St., Guishan Dist, Linkou, Taoyuan, 33305, Taiwan.
Objective: To investigate the predictive ability of the MRI-based vertebral bone quality (VBQ) score for pedicle screw loosening following instrumented transforaminal lumbar interbody fusion (TLIF).
Methods: Data from patients who had received one- or two-level instrumented TLIF between February 2014 and March 2015 were retrospectively collected. Pedicle screw loosening was diagnosed when the radiolucent zone around the screw exceeded 1 mm on plain radiographs.
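As a generic illustration of how such predictive ability can be quantified, the sketch below computes an ROC AUC for a continuous score against a binary loosening outcome; the data are synthetic placeholders and do not reflect the study's results.

```python
# Minimal sketch: ROC AUC of a continuous score (here, VBQ) for predicting a
# binary outcome (screw loosening). All values are synthetic placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
loosening = np.repeat([0, 1], 60)                                  # 1 = loosening on radiographs
vbq_score = 2.5 + 0.8 * loosening + rng.normal(0, 0.6, size=120)   # assumed higher score with loosening

auc = roc_auc_score(loosening, vbq_score)
print(f"ROC AUC of VBQ score for predicting loosening: {auc:.2f}")
```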
J Craniomaxillofac Surg
January 2025
Department of Diagnostic and Interventional Radiology, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany.
The potential of large language models (LLMs) in medical applications is significant, and retrieval-augmented generation (RAG) can address these models' weaknesses in data transparency and scientific accuracy by incorporating current scientific knowledge into responses. In this study, RAG and OpenAI's GPT-4 were used to develop GuideGPT, a context-aware chatbot integrated with a knowledge database of 449 scientific publications and designed to answer questions on the prevention, diagnosis, and treatment of medication-related osteonecrosis of the jaw (MRONJ). Its performance was compared with that of a generic LLM ("PureGPT") across 30 MRONJ-related questions.
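As a generic illustration of the RAG pattern (not GuideGPT's actual pipeline), the sketch below retrieves the passages most relevant to a query with TF-IDF similarity and prepends them to the prompt; the documents, query, and prompt wording are hypothetical.

```python
# Minimal retrieval-augmented generation (RAG) sketch using TF-IDF retrieval.
# The documents and query are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Antiresorptive therapy is a known risk factor for MRONJ.",
    "Conservative management of MRONJ includes antimicrobial rinses.",
    "Surgical resection may be indicated in advanced MRONJ stages.",
]
query = "How is early-stage MRONJ managed?"

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)
query_vec = vectorizer.transform([query])

# Rank documents by similarity to the query and keep the top hits.
scores = cosine_similarity(query_vec, doc_matrix).ravel()
top_docs = [documents[i] for i in scores.argsort()[::-1][:2]]

# The retrieved passages are prepended to the LLM prompt so the model answers
# from current literature rather than from parametric memory alone.
prompt = ("Answer using only the context below.\n\nContext:\n"
          + "\n".join(top_docs)
          + f"\n\nQuestion: {query}")
print(prompt)
```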
Int Dent J
January 2025
Department of Basic Sciences, Faculty of Dental Sciences, University of Peradeniya, Peradeniya, 20400 Sri Lanka. Electronic address:
Objective: This study evaluated the effectiveness of an AI-based tool (ChatGPT-4; AIT) versus a human tutor (HT) in providing feedback on dental students' assignments.
Methods: A total of 194 answers to two histology questions were assessed by both tutors using the same rubric. Students compared feedback from both tutors and evaluated its accuracy against a standard rubric.
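A paired comparison of the two tutors' rubric scores could be run as in the sketch below, using a Wilcoxon signed-rank test; the scores and the 10-point rubric scale are hypothetical placeholders, not the study's data or analysis.

```python
# Minimal sketch: paired comparison of AI-tutor (AIT) and human-tutor (HT)
# rubric scores for the same answers. All values are synthetic placeholders.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(2)
n_answers = 194
ht_scores = rng.integers(5, 11, size=n_answers).astype(float)            # assumed rubric scores out of 10
ait_scores = np.clip(ht_scores + rng.normal(0, 1.0, size=n_answers), 0, 10)

# Wilcoxon signed-rank test suits paired scores that need not be normal.
stat, p_value = wilcoxon(ait_scores, ht_scores)
print(f"Wilcoxon statistic = {stat:.1f}, p = {p_value:.3f}")
```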
J Mol Biol
January 2025
School of Mathematics and Physics, University of Science and Technology Beijing, Beijing, 100083, China. Electronic address:
Single-cell RNA sequencing (scRNA-seq) analysis offers tremendous potential for addressing various biological questions, with one key application being the annotation of query datasets with unknown cell types using well-annotated external reference datasets. However, the performance of existing supervised or semi-supervised methods largely depends on the quality of the source data. Furthermore, these methods often struggle with batch effects arising from different platforms when handling multiple reference or query datasets, making precise annotation challenging.
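As a minimal illustration of reference-based annotation (not the method proposed here), the sketch below trains a nearest-centroid classifier on a labeled reference and applies it to query cells; the expression matrices are random placeholders and batch-effect correction is omitted.

```python
# Minimal sketch of reference-based cell-type annotation: a nearest-centroid
# classifier trained on an annotated reference and applied to query cells.
# Matrices are random placeholders standing in for scRNA-seq profiles.
import numpy as np
from sklearn.neighbors import NearestCentroid

rng = np.random.default_rng(0)
n_genes = 50

# Annotated reference: expression profiles with known cell-type labels.
reference_expr = rng.poisson(2.0, size=(300, n_genes)).astype(float)
reference_labels = rng.choice(["T cell", "B cell", "Monocyte"], size=300)

# Query dataset with unknown cell types.
query_expr = rng.poisson(2.0, size=(100, n_genes)).astype(float)

# Log-normalize both datasets the same way before classification.
reference_log = np.log1p(reference_expr)
query_log = np.log1p(query_expr)

model = NearestCentroid()
model.fit(reference_log, reference_labels)
predicted_types = model.predict(query_log)
print(predicted_types[:10])
```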
J Adv Res
January 2025
Department of Mechanics and Strength of Materials, Politehnica University Timisoara, 1 Mihai Viteazu Avenue, 300 222 Timisoara, Romania. Electronic address:
Background: In a wide variety of industries today, grinding operations are a critical finishing process for achieving precise dimensions and meeting strict requirements for roughness and shape accuracy. However, the constant wear of abrasive tools during grinding negatively affects the dimensional and surface condition of the workpiece. Effective monitoring of wear during grinding operations therefore helps to predict tool life, plan maintenance, and ensure consistent product quality.