OpenMedLM: prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models.

Jenish Maharjan Anurag Garikipati Navan Preet Singh Leo Cyrus Mayank Sharma Madalina Ciobanu Gina Barnes Rahul Thapa Qingqing Mao Ritankar Das

Sci Rep

Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA.

Published: June 2024

LLMs can accomplish specialized medical knowledge tasks; however, equitable access is hindered by extensive fine-tuning, specialized medical data requirements, and limited access to proprietary models. Open-source (OS) medical LLMs show performance improvements and provide the transparency and compliance required in healthcare. We present OpenMedLM, a prompting platform that delivers state-of-the-art (SOTA) performance for OS LLMs on medical benchmarks. We evaluated OS foundation LLMs (7B-70B) on medical benchmarks (MedQA, MedMCQA, PubMedQA, MMLU medical-subset) and selected Yi34B for developing OpenMedLM. Prompting strategies included zero-shot, few-shot, chain-of-thought, and ensemble/self-consistency voting. OpenMedLM delivered OS SOTA results on three medical LLM benchmarks, surpassing the previous best-performing OS models, which relied on costly and extensive fine-tuning. These are the first results to date demonstrating that OS foundation models can optimize performance without specialized fine-tuning. The model achieved 72.6% accuracy on MedQA, outperforming the previous SOTA by 2.4%, and 81.7% accuracy on MMLU medical-subset, making it the first OS LLM to surpass 80% accuracy on this benchmark. Our results highlight medical-specific emergent properties in OS LLMs not documented elsewhere to date, validate the ability of OS models to accomplish healthcare tasks, and show the benefits of prompt engineering for improving the performance of accessible LLMs in medical applications.
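The prompting strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the OpenMedLM implementation: the prompt layout, the function names (`build_prompt`, `self_consistency_vote`, `make_stub_model`) and the stubbed model are all assumptions made for illustration. The sketch builds a few-shot chain-of-thought prompt (an empty example list gives the zero-shot case) and then takes a majority vote over several sampled answers, which is the basic idea behind self-consistency voting.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical few-shot example: (question, options, reasoning, answer letter).
Example = Tuple[str, Dict[str, str], str, str]


def format_question(question: str, options: Dict[str, str]) -> str:
    """Render a multiple-choice question with lettered options."""
    lines = [f"Question: {question}"]
    lines += [f"{letter}. {text}" for letter, text in sorted(options.items())]
    return "\n".join(lines)


def build_prompt(examples: List[Example], question: str,
                 options: Dict[str, str]) -> str:
    """Few-shot chain-of-thought prompt; an empty list gives zero-shot."""
    parts = []
    for ex_q, ex_opts, reasoning, answer in examples:
        parts.append(
            f"{format_question(ex_q, ex_opts)}\n"
            f"Reasoning: {reasoning}\nAnswer: {answer}"
        )
    # The final question ends with an open "Reasoning:" cue for the model.
    parts.append(f"{format_question(question, options)}\nReasoning:")
    return "\n\n".join(parts)


def self_consistency_vote(sample_answer: Callable[[str], str], prompt: str,
                          n_samples: int = 5) -> str:
    """Sample several answers for one prompt and return the majority letter."""
    votes = Counter(sample_answer(prompt) for _ in range(n_samples))
    return votes.most_common(1)[0][0]


def make_stub_model(answers: List[str]) -> Callable[[str], str]:
    """Stand-in for a sampled LLM call: replays a fixed list of answers."""
    iterator = iter(answers)
    return lambda prompt: next(iterator)
```

In practice `sample_answer` would wrap a call to an LLM sampled at a non-zero temperature and would extract the answer letter from the generated reasoning; the stub here only replays fixed letters so the voting step can be exercised without a model.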

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11187169	PMC
http://dx.doi.org/10.1038/s41598-024-64827-6	DOI Listing

Similar Publications

Clinical Decision Support Using Speech Signal Analysis: Systematic Scoping Review of Neurological Disorders.

J Med Internet Res

January 2025

Knight Foundation of Computing & Information Sciences, Florida International University, Miami, FL, United States.

Upeka De Silva Samaneh Madanian Sharon Olsen John Michael Templeton Christian Poellabauer

Background: Digital biomarkers are increasingly used in clinical decision support for various health conditions. Speech features as digital biomarkers can offer insights into underlying physiological processes due to the complexity of speech production. This process involves respiration, phonation, articulation, and resonance, all of which rely on specific motor systems for the preparation and execution of speech.

Association of Adverse Perinatal Outcomes with Blood Components Transfusion in Patients with Acute Fatty Liver of Pregnancy.

Int J Womens Health

January 2025

Department of Obstetrics and Gynecology, Qilu Hospital of Shandong University, Jinan, Shandong Province, People's Republic of China.

Xiyu Pan Ran Chu Xu Qiao Xianru Zhang Li Li

Purpose: To investigate acute fatty liver of pregnancy, a rare obstetric emergency with no specific treatment. The primary objective was to evaluate the association of adverse perinatal outcomes with blood components transfusion, while the secondary objective was to further establish the predictive risk factors for adverse perinatal outcomes.

Demystifying Large Language Models for Medicine: A Primer.

ArXiv

November 2024

Qiao Jin Nicholas Wan Robert Leaman Shubo Tian Zhizheng Wang

Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare by generating human-like responses across diverse contexts and adapting to novel tasks following human instructions. Their potential application spans a broad range of medical tasks, such as clinical documentation, matching patients to clinical trials, and answering medical questions. In this primer paper, we propose an actionable guideline to help healthcare professionals more efficiently utilize LLMs in their work, along with a set of best practices.

Toward Automated Simulation Research Workflow through LLM Prompt Engineering Design.

J Chem Inf Model

January 2025

The State Key Laboratory of Molecular Engineering of Polymers, The Research Center of AI for Polymer Science Department of Macromolecular Science, Fudan University, Shanghai 200433, China.

Zhihan Liu Yubo Chai Jianfeng Li

The advent of Large Language Models (LLMs) has created new opportunities for the automation of scientific research spanning both experimental processes and computational simulations. This study explores the feasibility of constructing an autonomous simulation agent (ASA) powered by LLMs through prompt engineering and automated program design to automate the entire simulation research process according to a human-provided research plan. This process includes experimental design, remote upload and simulation execution, data analysis, and report compilation.

Post-synthesis surface modification of Cu/Zr metal azolate framework: A pathway to highly sensitive electrochemical biosensors for atrazine detection.

Anal Chim Acta

February 2025

Dept. of Electronic Materials Engineering, Kwangwoon University, Seoul, 01897, Republic of Korea.

Bhavna Hedau Tae-Jun Ha

Background: Atrazine (ATZ), a pesticide that poses serious health problems, is observed in the environment, thereby prompting its periodic monitoring and control using functional biosensors. However, established methods for ATZ detection have limited applicability. Two-dimensional (2D) metal azolate frameworks (MAF) have a higher surface area per unit volume and provide easier access to active sites.
