We are not ready yet: limitations of state-of-the-art disease named entity recognizers.

J Biomed Semantics

ZB MED - Information Centre for Life Sciences, Gleueler Str. 60, Cologne, Germany.

Published: October 2022

Background: Biomedical natural language processing has been the subject of intense research. Since the breakthrough of transfer learning-based methods, BERT models have been used in a variety of biomedical and clinical applications. On the available data sets, these models show excellent results, in some cases exceeding inter-annotator agreement. However, biomedical named entity recognition applied to COVID-19 preprints shows a drop in performance compared to the results on test data. This raises the question of how well trained models predict on completely new data, i.e., how well they generalize.

Results: Using disease named entity recognition as an example, we investigate the robustness of different machine learning-based methods, including transfer learning. We show that current state-of-the-art methods work well for a given training set and its corresponding test set but generalize significantly worse when applied to new data.
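The comparison described in the results, scoring a model on its own test set and then on data from a different source, can be illustrated with a minimal sketch. It assumes BIO-tagged sequences and strict exact-match entity scoring, which the abstract does not specify; the function names and the toy tag sequences are hypothetical and are not taken from the paper's data or code.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, label)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 where a prediction counts only on an exact span match."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = extract_spans(gold_tags), extract_spans(pred_tags)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, invented examples: predictions on the corresponding test set
# versus predictions on text from a different (unseen) source.
in_corpus_gold = [["B-Disease", "I-Disease", "O"], ["O", "B-Disease"]]
in_corpus_pred = [["B-Disease", "I-Disease", "O"], ["O", "B-Disease"]]
new_data_gold = [["B-Disease", "I-Disease", "O"], ["O", "B-Disease"]]
new_data_pred = [["B-Disease", "O", "O"], ["O", "B-Disease"]]  # boundary error

f1_in = entity_f1(in_corpus_gold, in_corpus_pred)
f1_new = entity_f1(new_data_gold, new_data_pred)
print(f"in-corpus F1: {f1_in:.2f}, new-data F1: {f1_new:.2f}")
```

Under strict matching, a single boundary error counts as both a false positive and a false negative, which is one reason scores on unseen text can fall sharply even when predictions look nearly right.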

Conclusions: We argue that larger annotated data sets are needed for training and testing. We therefore foresee the curation of further data sets and, moreover, the investigation of continual learning processes for machine learning-based models.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9612606	PMC
http://dx.doi.org/10.1186/s13326-022-00280-6	DOI Listing
