LLMs can accomplish specialized medical knowledge tasks; however, equitable access is hindered by the need for extensive fine-tuning, the requirement for specialized medical data, and limited access to proprietary models. Open-source (OS) medical LLMs show performance improvements and provide the transparency and compliance required in healthcare. We present OpenMedLM, a prompting platform that delivers state-of-the-art (SOTA) performance for OS LLMs on medical benchmarks. We evaluated OS foundation LLMs (7B-70B) on medical benchmarks (MedQA, MedMCQA, PubMedQA, MMLU medical-subset) and selected Yi34B for developing OpenMedLM. Prompting strategies included zero-shot, few-shot, chain-of-thought, and ensemble/self-consistency voting. OpenMedLM delivered OS SOTA results on three medical LLM benchmarks, surpassing the previous best-performing OS models, which relied on costly and extensive fine-tuning. These are the first results to date demonstrating that OS foundation models can optimize performance without specialized fine-tuning. The model achieved 72.6% accuracy on MedQA, outperforming the previous SOTA by 2.4%, and 81.7% accuracy on the MMLU medical-subset, making it the first OS LLM to surpass 80% accuracy on this benchmark. Our results highlight medical-specific emergent properties in OS LLMs not documented elsewhere to date and validate the ability of OS models to accomplish healthcare tasks, underscoring the benefits of prompt engineering for improving the performance of accessible LLMs in medical applications.
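As an illustration of the ensemble/self-consistency voting strategy named above, the following is a minimal sketch and not the authors' implementation. It assumes that several chain-of-thought completions have already been sampled for one multiple-choice question and that the final option letter has been extracted from each; the function name `self_consistency_vote` and the sample data are hypothetical.

```python
from collections import Counter


def self_consistency_vote(answers):
    """Pick the most frequent answer among sampled completions.

    `answers` is a list of option letters (e.g. "A".."D"), one per
    sampled chain-of-thought completion. This helper is hypothetical
    and only illustrates majority voting; it does not call any model.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")

    # Normalize whitespace and case so "b " and "B" count as the same vote.
    counts = Counter(a.strip().upper() for a in answers)

    # most_common(1) returns the top (answer, count) pair; on ties,
    # the answer seen first is returned (Python 3.7+).
    winner, votes = counts.most_common(1)[0]

    # Report the share of samples that agreed with the winner.
    return winner, votes / len(answers)


# Five sampled answers for one question (made-up values).
samples = ["B", "b ", "C", "B", "A"]
print(self_consistency_vote(samples))
```

The agreement share returned alongside the winning answer can serve as a rough indicator of how consistent the sampled reasoning paths were for that question.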
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11187169 | PMC |
http://dx.doi.org/10.1038/s41598-024-64827-6 | DOI Listing |