Towards reliable named entity recognition in the biomedical domain.

Bioinformatics

Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4, Canada.

Published: January 2020

AI Article Synopsis

  • Automatic biomedical named entity recognition (BioNER) is crucial for extracting information in the biomedical field and has been traditionally dominated by machine learning methods like conditional random fields (CRFs) and more recently deep learning models.
  • Recent research shows that CRF-based methods may not generalize well to corpora other than the one they were trained on, and the popular BiLSTM-CRF model similarly struggles to generalize to new data.
  • To improve generalization, this study evaluates three strategies: variational dropout, transfer learning and multi-task learning; the best combination boosts out-of-corpus performance by 10.75%. The authors also release a new open-source tool, Saber, that implements these models.

Article Abstract

Motivation: Automatic biomedical named entity recognition (BioNER) is a key task in biomedical information extraction. For some time, state-of-the-art BioNER has been dominated by machine learning methods, particularly conditional random fields (CRFs), with a recent focus on deep learning. However, recent work has suggested that the high performance of CRFs for BioNER may not generalize to corpora other than the one it was trained on. In our analysis, we find that a popular deep learning-based approach to BioNER, known as bidirectional long short-term memory network-conditional random field (BiLSTM-CRF), is correspondingly poor at generalizing. To address this, we evaluate three modifications of BiLSTM-CRF for BioNER to improve generalization: improved regularization via variational dropout, transfer learning and multi-task learning.
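Of the three modifications, variational dropout is the most self-contained to illustrate. In recurrent networks it typically means sampling one dropout mask per sequence and reusing that same mask at every timestep, instead of resampling a fresh mask per step. A minimal NumPy sketch of this idea on a toy recurrence (an illustration only, not the paper's BiLSTM-CRF; all function names here are hypothetical):

```python
import numpy as np

def variational_dropout_mask(rng, batch, hidden, rate):
    """Sample ONE inverted-dropout mask per sequence in the batch.
    Kept units are scaled by 1/keep so expected activations match."""
    keep = 1.0 - rate
    return rng.binomial(1, keep, size=(batch, hidden)) / keep

def run_rnn_with_variational_dropout(rng, inputs, rate=0.5):
    """Toy recurrence h_t = tanh(x_t + h_{t-1} * mask), applying the
    SAME dropout mask to the recurrent state at every timestep."""
    batch, steps, hidden = inputs.shape
    mask = variational_dropout_mask(rng, batch, hidden, rate)
    h = np.zeros((batch, hidden))
    outputs = []
    for t in range(steps):
        h = np.tanh(inputs[:, t, :] + h * mask)  # mask is fixed across t
        outputs.append(h)
    return np.stack(outputs, axis=1), mask
```

Standard (naive) dropout would instead draw a new mask inside the loop; tying the mask across timesteps is what makes the regularization "variational" in the recurrent setting.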

Results: We measure the effect that each strategy has when training/testing on the same corpus ('in-corpus' performance) and when training on one corpus and evaluating on another ('out-of-corpus' performance), our measure of the model's ability to generalize. We found that variational dropout improves out-of-corpus performance by an average of 4.62%, transfer learning by 6.48% and multi-task learning by 8.42%. The maximal increase we identified combines multi-task learning and variational dropout, which boosts out-of-corpus performance by 10.75%. Furthermore, we make available a new open-source tool, called Saber, that implements our best BioNER models.
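The in-corpus/out-of-corpus protocol described above amounts to a train-on-one, evaluate-on-all grid over corpora. A small sketch of that bookkeeping (hypothetical helper names; the scoring callback stands in for actually training and evaluating a model):

```python
def score_matrix(train_eval_score, corpora):
    """Score every (train corpus, eval corpus) pair.
    Diagonal entries are 'in-corpus' scores; off-diagonal
    entries are 'out-of-corpus' scores."""
    return {tr: {ev: train_eval_score(tr, ev) for ev in corpora}
            for tr in corpora}

def summarize(scores):
    """Average the diagonal (in-corpus) and the off-diagonal
    (out-of-corpus) entries separately."""
    in_c = [scores[c][c] for c in scores]
    out_c = [s for tr, row in scores.items()
             for ev, s in row.items() if ev != tr]
    return sum(in_c) / len(in_c), sum(out_c) / len(out_c)
```

For example, a toy scorer that returns 1.0 when train and eval corpora match and 0.5 otherwise yields an in-corpus average of 1.0 and an out-of-corpus average of 0.5.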

Availability And Implementation: Source code for our biomedical IE tool is available at https://github.com/BaderLab/saber. Corpora and other resources used in this study are available at https://github.com/BaderLab/Towards-reliable-BioNER.

Supplementary Information: Supplementary data are available at Bioinformatics online.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6956779
DOI: http://dx.doi.org/10.1093/bioinformatics/btz504

Publication Analysis

Top Keywords

variational dropout (12)
named entity (8)
entity recognition (8)
transfer learning (8)
out-of-corpus performance (8)
multi-task learning (8)
bioner (6)
learning (6)
performance (5)
reliable named (4)
