Motivation: With the exponential growth of the life sciences literature, biomedical text mining (BTM) has become an essential technology for accelerating the extraction of insights from publications. The identification of entities in texts, such as diseases or genes, and their normalization, i.e. grounding them in a knowledge base, are crucial steps in any BTM pipeline to enable information aggregation from multiple documents. However, tools for these two steps are rarely applied in the same context in which they were developed. Instead, they are applied "in the wild," i.e. on application-dependent text collections that differ moderately to extremely from those used for training, e.g. in focus, genre or text type. This raises the question of whether the reported performance, usually obtained by training and evaluating on different partitions of the same corpus, can be trusted for downstream applications.

Results: Here, we report the results of a carefully designed cross-corpus benchmark for entity recognition and normalization, in which tools were applied systematically to corpora not used during their training. From a survey of 28 published systems, we selected five according to predefined criteria such as feature richness and availability, for an in-depth analysis on three publicly available corpora covering four entity types. Our results present a mixed picture and show that cross-corpus performance is significantly lower than in-corpus performance. HunFlair2, the redesigned and extended successor of the HunFlair tool, showed the best performance on average, closely followed by PubTator Central. Our results indicate that users of BTM tools should expect lower performance than originally published when applying tools "in the wild" and show that further research is necessary for more robust BTM tools.
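To make the notion of scoring a tool on a corpus concrete, the following is a minimal Python sketch of mention-level precision, recall and F1 for recognition (span and type) and for normalization (span, type and identifier). It is an illustration only: the exact-match criterion, the `Mention` record, the helper names and the `KB:` identifiers are assumptions made for this example and are not taken from the benchmark itself, whose matching rules the abstract does not specify.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Mention(NamedTuple):
    """Hypothetical record for one annotated or predicted entity mention."""
    doc_id: str
    start: int
    end: int
    entity_type: str
    kb_id: Optional[str] = None  # knowledge-base identifier (placeholder values below)


def prf(gold: Iterable[Mention], predicted: Iterable[Mention]) -> Tuple[float, float, float]:
    """Precision, recall and F1 under exact matching of whole Mention records."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def strip_ids(mentions: Iterable[Mention]) -> list:
    """Drop identifiers so that only span and type are compared (recognition only)."""
    return [m._replace(kb_id=None) for m in mentions]


# Made-up gold annotations and tool output for a single document.
gold = [
    Mention("doc1", 0, 6, "Disease", "KB:0001"),
    Mention("doc1", 20, 25, "Gene", "KB:0002"),
]
predicted = [
    Mention("doc1", 0, 6, "Disease", "KB:0001"),   # correct span and identifier
    Mention("doc1", 20, 25, "Gene", "KB:9999"),    # correct span, wrong identifier
    Mention("doc1", 30, 35, "Chemical", "KB:0003"),  # spurious mention
]

ner_scores = prf(strip_ids(gold), strip_ids(predicted))
norm_scores = prf(gold, predicted)
print("recognition P/R/F1:", ner_scores)
print("normalization P/R/F1:", norm_scores)
```

In a cross-corpus setting, the same scoring function would be applied to predictions on a corpus the tool never saw during training, and the result compared with the score on the held-out partition of its training corpus.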

Availability And Implementation: All our models are integrated into the Natural Language Processing (NLP) framework flair: https://github.com/flairNLP/flair. Code to reproduce our results is available at: https://github.com/hu-ner/hunflair2-experiments.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11453098
DOI: http://dx.doi.org/10.1093/bioinformatics/btae564

Publication Analysis

Top Keywords

entity recognition: 8
recognition normalization: 8
normalization tools: 8
tools: 5
performance: 5
hunflair2 cross-corpus: 4
cross-corpus evaluation: 4
evaluation biomedical: 4
biomedical named: 4
named entity: 4

