Background: Amid the increasing use of AI in medical research, this study assesses and compares the accuracy and credibility of OpenAI's GPT-4 and Google's Gemini in generating medical research introductions, focusing on the precision and reliability of their citations across five medical fields.
Methods: We compared the two models, OpenAI's GPT-4 and Google's Gemini Ultra, across five medical fields, focusing on the credibility and accuracy of citations, alongside the analysis of introduction length and unreferenced data.
Results: Gemini outperformed GPT-4 in reference precision.
Background: Empathy, a fundamental aspect of human interaction, is characterized as the ability to experience another being's emotions within oneself. In health care, empathy is fundamental to interactions between health care professionals and patients. It is a quality unique to humans that large language models (LLMs) are believed to lack.
Aim: Diagnostic imaging is an integral part of identifying spondyloarthropathies (SpA), yet the interpretation of these images can be challenging. This review evaluated the use of deep learning models to enhance the diagnostic accuracy of SpA imaging.
Methods: Following PRISMA guidelines, we systematically searched major databases up to February 2024, focusing on studies that applied deep learning to SpA imaging.
Background/aim: Contrast-enhanced mammography (CEM) is a relatively novel imaging technique that enables both anatomical and functional breast imaging, with improved diagnostic performance compared to standard 2D mammography. The aim of this study is to systematically review the literature on deep learning (DL) applications for CEM, exploring how these models can further enhance CEM diagnostic potential.
Methods: This systematic review was reported according to the PRISMA guidelines.
Aim: To evaluate the accuracy of the Emergency Severity Index (ESI) assignments by GPT-4, a large language model (LLM), compared to senior emergency department (ED) nurses and physicians.
Method: An observational study of 100 consecutive adult ED patients was conducted. ESI scores were assigned by GPT-4, by triage nurses, and by a senior clinician.
Background: Large language models (LLMs) have shown promise in various professional fields, including medicine and law. However, their performance in highly specialized tasks, such as extracting ICD-10-CM codes from patient notes, remains underexplored.
Objective: The primary objective was to evaluate and compare the performance of ICD-10-CM code extraction by different LLMs with that of a human coder.
Importance: Medical ethics is inherently complex, shaped by a broad spectrum of opinions, experiences, and cultural perspectives. The integration of large language models (LLMs) in healthcare is new and requires an understanding of their consistent adherence to ethical standards.
Objective: To compare the agreement rates in answering questions based on ethically ambiguous situations between three frontier LLMs (GPT-4, Gemini-pro-1.
Background: Healthcare reimbursement and coding depend on accurate extraction of International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) codes from clinical documentation. Attempts to automate this task have had limited success. This study aimed to evaluate the performance of large language models (LLMs) in extracting ICD-10-CM codes from unstructured inpatient notes and benchmark them against a human coder.
Purpose: To evaluate the ability of AI-based chatbots to accurately answer common patient questions in the field of ophthalmology.
Methods: An experienced ophthalmologist curated a set of 20 representative questions, and responses were sought from two generative AI models: OpenAI's ChatGPT and Google's Bard (Gemini Pro). Eight expert ophthalmologists from different sub-specialties assessed each response, blinded to the source, and ranked them on three metrics (accuracy, comprehensiveness, and clarity) using a 1-5 scale.
Large language models (LLMs) can optimize clinical workflows; however, the economic and computational challenges of their utilization at the health system scale are underexplored. We evaluated how concatenating queries with multiple clinical notes and tasks simultaneously affects model performance under increasing computational loads. We assessed ten LLMs of different capacities and sizes utilizing real-world patient data.
Purpose: While mammography is considered the gold standard for screening women for breast cancer, its accuracy declines in women with dense breasts. The purpose of the study is to evaluate the diagnostic accuracy of contrast-enhanced mammography (CEM) for detecting breast cancer in intermediate- and high-risk women, including those with genetic predispositions, over a decade-long cohort at a tertiary center.
Methods: We retrospectively analyzed all CEM examinations performed for screening purposes at a tertiary center between 2012 and 2023.
Background: While clinical practice guidelines advocate for multidisciplinary heart team (MDHT) discussions in coronary revascularization, variability in implementation across health care settings remains a challenge. This variability could potentially be addressed by large language models such as ChatGPT, which offer decision-making support in diverse health care environments. Our study aims to critically evaluate the concordance between recommendations made by the MDHT and those generated by large language models in coronary revascularization decision-making.
Background: Accurate medical coding is essential for clinical and administrative purposes but is complicated, time-consuming, and prone to bias. This study compares Retrieval-Augmented Generation (RAG)-enhanced LLMs to provider-assigned codes in producing ICD-10-CM codes from emergency department (ED) clinical records.
Methods: Retrospective cohort study using 500 ED visits randomly selected from the Mount Sinai Health System between January and April 2024.
Multimodal technology is poised to revolutionize clinical practice by integrating artificial intelligence with traditional diagnostic modalities. This evolution traces its roots from Hippocrates' humoral theory to the use of sophisticated AI-driven platforms that synthesize data across multiple sensory channels. The interplay between historical medical practices and modern technology challenges conventional patient-clinician interactions and redefines diagnostic accuracy.
Quant Imaging Med Surg
October 2024
Background: Differential diagnosis in radiology relies on the accurate identification of imaging patterns. The use of large language models (LLMs) in radiology holds promise, with many potential applications that may enhance the efficiency of radiologists' workflow. The study aimed to evaluate the efficacy of generative pre-trained transformer (GPT)-4, an LLM, in providing differential diagnoses in neuroradiology, comparing its performance with that of board-certified neuroradiologists.
Isr J Health Policy Res
September 2024
Background: Sheba Medical Center (SMC) is the largest hospital in Israel and has been coping with a steady increase in total Emergency Department (ED) visits. Over 140,000 patients arrive at the SMC's ED every year. Of those, 19% are admitted to the medical wards.
Large language models (LLMs) have significantly impacted various fields with their ability to understand and generate human-like text. This study explores the potential benefits and limitations of integrating LLMs, such as ChatGPT, into haematology practices. Utilizing systematic review methodologies, we analysed studies published after 1 December 2022 from databases such as PubMed, Web of Science and Scopus, and assessed each for bias with the QUADAS-2 tool.
Objectives: This study aims to assess the performance of GPT-4V, a multimodal artificial intelligence (AI) model capable of analyzing both images and textual data, in interpreting radiological images. It focuses on a range of modalities, anatomical regions, and pathologies to explore the potential of zero-shot generative AI in enhancing diagnostic processes in radiology.
Methods: We analyzed 230 anonymized emergency room diagnostic images, consecutively collected over 1 week, using GPT-4V.
Objective: To develop a new automated framework based on machine learning to diagnose malignant eyelid skin tumors.
Methods: This study used eyelid lesion images from Sheba Medical Center, a large tertiary center in Israel. Before model training, we pretrained our models on the International Skin Imaging Collaboration (ISIC) 2019 dataset consisting of 25,332 images.
Large language models (LLMs) are transforming the field of natural language processing (NLP). These models offer opportunities for radiologists to make a meaningful impact in their field. NLP is a part of artificial intelligence (AI) that uses computer algorithms to study and understand text data.