A comprehensive evaluation of large language models on benchmark biomedical text processing tasks.

Comput Biol Med

School of Information Technology, York University, Canada; Information Retrieval and Knowledge Management Research Lab, York University, Canada.

Published: March 2024

Recently, Large Language Models (LLMs) have demonstrated an impressive capability to solve a wide range of tasks. However, despite their success across various tasks, no prior work has investigated their capability in the biomedical domain. To this end, this paper evaluates the performance of LLMs on benchmark biomedical tasks through a comprehensive evaluation of 4 popular LLMs on 6 diverse biomedical tasks across 26 datasets. To the best of our knowledge, this is the first work that conducts an extensive evaluation and comparison of various LLMs in the biomedical domain. Interestingly, our evaluation shows that on biomedical datasets with smaller training sets, zero-shot LLMs even outperform the current state-of-the-art models when those models are fine-tuned only on the training sets of these datasets. This suggests that pre-training on large text corpora makes LLMs quite specialized even in the biomedical domain. We also find that no single LLM outperforms the others in all tasks; the performance of different LLMs varies depending on the task. While their performance is still quite poor in comparison to the biomedical models that were fine-tuned on large training sets, our findings demonstrate that LLMs have the potential to be a valuable tool for various biomedical tasks that lack large annotated data.

Source
http://dx.doi.org/10.1016/j.compbiomed.2024.108189
