Evaluating sentence representations for biomedical text: Methods and experimental results.

J Biomed Inform

Department of Information and Computing Sciences, Utrecht University, 3584 CC Utrecht, the Netherlands. Electronic address:

Published: April 2020

Text representations are one of the main inputs to various Natural Language Processing (NLP) methods. Given the fast pace at which new sentence embedding methods are developed, we argue that there is a need for a unified methodology to assess these different techniques in the biomedical domain. This work introduces a comprehensive evaluation of novel methods across ten medical classification tasks. The tasks cover a variety of BioNLP problems such as semantic similarity, question answering, citation sentiment analysis and others, with binary and multi-class datasets. Our goal is to assess the transferability of different sentence representation schemes to the medical and clinical domain. Our analysis shows that embeddings based on Language Models, which account for the context-dependent nature of words, usually outperform others. Nonetheless, there is no single embedding model that perfectly represents biomedical and clinical texts with consistent performance across all tasks. This illustrates the need for a more suitable bio-encoder. Our MedSentEval source code, pre-trained embeddings and examples have been made available on GitHub.

Source
http://dx.doi.org/10.1016/j.jbi.2020.103396
