BELHD: improving biomedical entity linking with homonym disambiguation.

Bioinformatics

Computer Science, Humboldt-Universität zu Berlin, Berlin 12489, Germany.

Published: August 2024

Motivation: Biomedical entity linking (BEL) is the task of grounding entity mentions to a given knowledge base (KB). Recently, neural name-based methods, i.e. systems that identify the most appropriate name in the KB for a given mention using a neural network (either via dense retrieval or autoregressive modeling), have achieved remarkable results on the task without requiring manual tuning or the definition of domain/entity-specific rules. However, because name-based methods directly return KB names, they cannot cope with homonyms, i.e. different KB entities sharing the exact same name. This significantly affects their performance for KBs where homonyms account for a large share of entity mentions (e.g. UMLS and NCBI Gene).

Results: We present BELHD (Biomedical Entity Linking with Homonym Disambiguation), a new name-based method that copes with this challenge. BELHD builds upon the BioSyn model with two crucial extensions. First, it preprocesses the KB, expanding each homonym with a specifically constructed disambiguating string and thus enforcing unique linking decisions. Second, it introduces candidate sharing, a novel strategy that strengthens the overall training signal by including similar mentions from the same document as positive or negative examples, according to their corresponding KB identifier. Experiments with 10 corpora and 5 entity types show that BELHD improves upon current neural state-of-the-art approaches, achieving the best results in 6 out of 10 corpora with an average improvement of 4.55 percentage points (pp) in recall@1. Furthermore, the KB preprocessing is orthogonal to the prediction model and can therefore also improve other neural methods, which we exemplify for GenBioEL, a generative name-based BEL approach.
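The KB preprocessing step described above can be illustrated with a minimal sketch. The abstract does not specify how the disambiguating string is constructed, so the rule used here is an assumption made purely for illustration: a homonym is expanded with another name of the same entity that is unique in the KB, falling back to the entity identifier when no such name exists. The function name `disambiguate_homonyms`, the toy KB and its identifiers are all hypothetical and are not taken from the BELHD code base.

```python
from collections import defaultdict


def disambiguate_homonyms(kb):
    """Expand names shared by several entities so that every name is unique.

    kb maps an entity identifier to its list of names.
    ASSUMPTION: the disambiguating string is another, KB-unique name of the
    same entity (or the identifier as a fallback). The abstract only says the
    string is "specifically constructed", so this rule is illustrative.
    """
    # Index: which identifiers use each name?
    ids_by_name = defaultdict(set)
    for entity_id, names in kb.items():
        for name in names:
            ids_by_name[name].add(entity_id)

    expanded = {}
    for entity_id, names in kb.items():
        # Names owned by this entity alone can serve as a disambiguating hint.
        unique_names = [n for n in names if len(ids_by_name[n]) == 1]
        hint = unique_names[0] if unique_names else entity_id
        expanded[entity_id] = [
            name if len(ids_by_name[name]) == 1 else f"{name} ({hint})"
            for name in names
        ]
    return expanded


# Toy KB with made-up identifiers: "discharge" and "cold" are homonyms.
toy_kb = {
    "E1": ["discharge", "body fluid discharge"],
    "E2": ["discharge", "patient discharge"],
    "E3": ["fever"],
    "E4": ["cold"],
    "E5": ["cold"],
}

expanded = disambiguate_homonyms(toy_kb)
for entity_id, names in expanded.items():
    print(entity_id, names)
```

After this step every name in the toy KB maps to exactly one identifier, so a name-based model that returns a KB name implicitly returns a single entity. Because the step only rewrites the KB, it can in principle be placed in front of any name-based model, which is consistent with the abstract's remark that the preprocessing is orthogonal to the prediction model.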

Availability and implementation: The code to reproduce our experiments can be found at: https://github.com/sg-wbi/belhd.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11310454
DOI: http://dx.doi.org/10.1093/bioinformatics/btae474
