Large language models (LLMs) can accomplish specialized medical knowledge tasks; however, equitable access is hindered by the need for extensive fine-tuning and specialized medical data, and by limited access to proprietary models. Open-source (OS) medical LLMs show performance improvements and provide the transparency and compliance required in healthcare. We present OpenMedLM, a prompting platform that delivers state-of-the-art (SOTA) performance for OS LLMs on medical benchmarks. We evaluated OS foundation LLMs (7B-70B) on medical benchmarks (MedQA, MedMCQA, PubMedQA, MMLU medical-subset) and selected Yi34B for developing OpenMedLM. Prompting strategies included zero-shot, few-shot, chain-of-thought, and ensemble/self-consistency voting. OpenMedLM delivered OS SOTA results on three medical LLM benchmarks, surpassing the previous best-performing OS models, which relied on costly and extensive fine-tuning. These are the first results to date demonstrating that OS foundation models can optimize performance without specialized fine-tuning. The model achieved 72.6% accuracy on MedQA, outperforming the previous SOTA by 2.4%, and 81.7% accuracy on MMLU medical-subset, making it the first OS LLM to surpass 80% accuracy on this benchmark. Our results highlight medical-specific emergent properties in OS LLMs not documented elsewhere to date, validate the ability of OS models to accomplish healthcare tasks, and show the benefits of prompt engineering for improving the performance of accessible LLMs in medical applications.
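
To make the prompting strategies named in the abstract more concrete, the following is a minimal sketch of few-shot chain-of-thought prompt construction combined with self-consistency (majority) voting. It is an illustration under stated assumptions, not the OpenMedLM implementation: the function names, the prompt layout, and the canned stand-in for a model are all hypothetical, and a real setup would call an actual LLM with sampling enabled.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, List, Tuple


def build_few_shot_cot_prompt(
    examples: List[Tuple[str, str, str]], question: str
) -> str:
    """Format (question, reasoning, answer) exemplars, then the new question.

    The layout is an assumption for illustration; the abstract does not
    specify the prompt template that was used.
    """
    parts = [
        f"Question: {q}\nReasoning: {reasoning}\nAnswer: {answer}"
        for q, reasoning, answer in examples
    ]
    # Leave the reasoning open so the model continues with its own chain of thought.
    parts.append(f"Question: {question}\nReasoning:")
    return "\n\n".join(parts)


def self_consistency_vote(
    sample_answer: Callable[[str], str], prompt: str, n_samples: int = 5
) -> str:
    """Sample several answers for one prompt and return the most frequent one."""
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def make_fake_sampler(canned_answers: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: cycles through canned answer letters."""
    answers = cycle(canned_answers)
    return lambda prompt: next(answers)


if __name__ == "__main__":
    prompt = build_few_shot_cot_prompt(
        [("Example question?", "Example reasoning.", "A")],
        "New multiple-choice question?",
    )
    sampler = make_fake_sampler(["B", "A", "B", "C", "B"])
    print(self_consistency_vote(sampler, prompt, n_samples=5))  # prints "B"
```

The voting step only needs a callable that maps a prompt to a final answer string, so the canned sampler can be swapped for any model wrapper that extracts the chosen option from a sampled completion.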

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11187169
DOI: http://dx.doi.org/10.1038/s41598-024-64827-6

Publication Analysis

Top Keywords

prompt engineering: 8
medical: 8
specialized medical: 8
extensive fine-tuning: 8
openmedlm prompting: 8
llms medical: 8
medical benchmarks: 8
mmlu medical-subset: 8
llms: 6
openmedlm: 5

Similar Publications

Background: Digital biomarkers are increasingly used in clinical decision support for various health conditions. Speech features as digital biomarkers can offer insights into underlying physiological processes due to the complexity of speech production. This process involves respiration, phonation, articulation, and resonance, all of which rely on specific motor systems for the preparation and execution of speech.

Association of Adverse Perinatal Outcomes with Blood Components Transfusion in Patients with Acute Fatty Liver of Pregnancy.

Int J Womens Health

January 2025

Department of Obstetrics and Gynecology, Qilu Hospital of Shandong University, Jinan, Shandong Province, People's Republic of China.

Purpose: To investigate acute fatty liver of pregnancy, a rare obstetric emergency with no specific treatment. The primary objective was to evaluate the association of adverse perinatal outcomes with blood component transfusion, while the secondary objective focused on further establishing the predictive risk factors for adverse perinatal outcomes.

Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare by generating human-like responses across diverse contexts and adapting to novel tasks following human instructions. Their potential application spans a broad range of medical tasks, such as clinical documentation, matching patients to clinical trials, and answering medical questions. In this primer paper, we propose an actionable guideline to help healthcare professionals more efficiently utilize LLMs in their work, along with a set of best practices.

Toward Automated Simulation Research Workflow through LLM Prompt Engineering Design.

J Chem Inf Model

January 2025

The State Key Laboratory of Molecular Engineering of Polymers, The Research Center of AI for Polymer Science Department of Macromolecular Science, Fudan University, Shanghai 200433, China.

The advent of Large Language Models (LLMs) has created new opportunities for the automation of scientific research spanning both experimental processes and computational simulations. This study explores the feasibility of constructing an autonomous simulation agent (ASA) powered by LLMs through prompt engineering and automated program design to automate the entire simulation research process according to a human-provided research plan. This process includes experimental design, remote upload and simulation execution, data analysis, and report compilation.

Post-synthesis surface modification of Cu/Zr metal azolate framework: A pathway to highly sensitive electrochemical biosensors for atrazine detection.

Anal Chim Acta

February 2025

Dept. of Electronic Materials Engineering, Kwangwoon University, Seoul, 01897, Republic of Korea.

Background: Atrazine (ATZ), a pesticide that poses serious health problems, is observed in the environment, thereby prompting its periodic monitoring and control using functional biosensors. However, established methods for ATZ detection have limited applicability. Two-dimensional (2D) metal azolate frameworks (MAF) have a higher surface area per unit volume and provide easier access to active sites.
