Objectives: This study evaluates the utility of large language models (LLMs) in healthcare, focusing on their use in improving patient care through better diagnosis and decision-making, and as ancillary tools for healthcare professionals.
Materials and Methods: We evaluated ChatGPT, GPT-4, and LLaMA on identifying patients with specific diseases using gold-labeled Electronic Health Records (EHRs) from the MIMIC-III database, covering the prevalent diseases Chronic Obstructive Pulmonary Disease (COPD) and Chronic Kidney Disease (CKD), along with the rare condition Primary Biliary Cirrhosis (PBC) and the hard-to-diagnose condition Cancer Cachexia.
Results: In patient identification, GPT-4 performed comparably to or better than the corresponding disease-specific machine learning models (F1-score ≥ 85%) on COPD, CKD, and PBC.