DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing.

J Biomed Inform

ICU Data Science Lab, Department of Medicine, University of Wisconsin Madison, 1685 Highland Ave, Madison, 53792, WI, USA.

Published: February 2023

The meaningful use of electronic health records (EHR) continues to progress in the digital era with clinical decision support systems augmented by artificial intelligence. A priority in improving provider experience is to overcome information overload and reduce the cognitive burden so fewer medical errors and cognitive biases are introduced during patient care. One major type of medical error is diagnostic error due to systematic or predictable errors in judgement that rely on heuristics. The potential for clinical natural language processing (cNLP) to model diagnostic reasoning in humans with forward reasoning from data to diagnosis and potentially reduce cognitive burden and medical error has not been investigated. Existing tasks to advance the science in cNLP have largely focused on information extraction and named entity recognition through classification tasks. We introduce a novel suite of tasks coined as Diagnostic Reasoning Benchmarks, DR.BENCH, as a new benchmark for developing and evaluating cNLP models with clinical diagnostic reasoning ability. The suite includes six tasks from ten publicly available datasets addressing clinical text understanding, medical knowledge reasoning, and diagnosis generation. DR.BENCH is the first clinical suite of tasks designed to be a natural language generation framework to evaluate pre-trained language models for diagnostic reasoning. The goal of DR.BENCH is to advance the science in cNLP to support downstream applications in computerized diagnostic decision support and improve the efficiency and accuracy of healthcare providers during patient care. We fine-tune and evaluate the state-of-the-art generative models on DR.BENCH. Experiments show that with domain adaptation pre-training on medical knowledge, the model demonstrated opportunities for improvement when evaluated in DR.BENCH. We share DR.BENCH as a publicly available GitLab repository with a systematic approach to load and evaluate models for the cNLP community. We also discuss the carbon footprint produced during the experiments and encourage future work on DR.BENCH to report the carbon footprint.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9993808
DOI: http://dx.doi.org/10.1016/j.jbi.2023.104286

Publication Analysis

Top Keywords

diagnostic reasoning: 20
natural language: 12
clinical natural: 8
language processing: 8
decision support: 8
reduce cognitive: 8
cognitive burden: 8
patient care: 8
medical error: 8
advance science: 8

Similar Publications

Objective: Spinal fusion is a commonly performed surgical procedure used to relieve pain, deformity, and instability of various spinal pathologies. Although there have been attempts to standardize spinal fusion assessment radiologically, there is currently no unified definition that also considers clinical symptomology. This review attempts to create a more holistic and standardized definition of spinal fusion.

Word problems are essential for math learning and education, bridging numerical knowledge with real-world applications. Despite their importance, the neural mechanisms underlying word problem solving, especially in children, remain poorly understood. Here, we examine children's cognitive and brain response profiles for arithmetic word problems (AWPs), which involve one-step mathematical operations, and compare them with nonarithmetic word problems (NWPs), structured as parallel narratives without numerical operations.

Evidence-Based Aeromedical Assessments.

Aerosp Med Hum Perform

January 2025

Introduction: Assessment of fitness for flight constitutes one of the core tasks of aeromedical professionals. The value of such evaluations depends on the decision to be based on complete medical information, valid risk methodology, and genuine flight safety indicators. To achieve these goals, the aeromedical practitioner should ensure an evidence-based approach.

Testing the Purity of Cultures After Axenicity Treatments.

Cells

January 2025

Department of Functional and Evolutionary Ecology, University of Vienna, Djerassiplatz 1, A-1030 Vienna, Austria.

Contaminations are challenging for monocultures, as they impact the culture conditions and thus influence the growth of the target organism and the overall biomass composition. In phycology, axenic cultures comprising a single living species are commonly strived for both basic research and industrial applications, because contaminants reduce significance for analytic purposes and interfere with the safety and quality of commercial products. We aimed to establish axenic cultures of , known as the food additive "Spirulina".

The Potential Clinical Utility of the Customized Large Language Model in Gastroenterology: A Pilot Study.

Bioengineering (Basel)

December 2024

College of Liberal Arts Faculty of Basic Liberal Art, Hansung University, Seoul 02876, Republic of Korea.

The large language model (LLM) has the potential to be applied to clinical practice. However, there has been scarce study on this in the field of gastroenterology. Aim: This study explores the potential clinical utility of two LLMs in the field of gastroenterology: a customized GPT model and a conventional GPT-4o, an advanced LLM capable of retrieval-augmented generation (RAG).
