Phenotype Extraction Based on Word Embedding to Sentence Embedding Cascaded Approach.

Wenhui Xing, Xiaohui Yuan, Lin Li, Lun Hu, Jing Peng

IEEE Trans Nanobioscience

Published: July 2018

Phenotypic descriptions, a significant factor in the development of named entity recognition, appear in many different forms in the biomedical literature and often involve complicated semantics. This paper proposes a novel approach that identifies plant phenotypes by cascading a word embedding method into a sentence embedding method. A word embedding method first finds high-frequency phenotypes, and the original sentences are then used as input to a sentence embedding method. In this way, a variety of complicated phenotypic expressions can be recognized accurately. We also compared state-of-the-art word representation models and selected skip-gram with negative sampling, which performed best. To evaluate the approach, we applied it to a dataset of 56 748 PubMed abstracts on the model organism Arabidopsis thaliana. The experimental results showed that our approach yielded the best performance, achieving a 2.588-fold increase in the number of new phenotypic descriptions compared with the original phenotype ontology.
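The cascade described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: windowed co-occurrence counts stand in for skip-gram with negative sampling, summed word vectors stand in for the sentence embedding method, and all function names, seed terms, and example sentences are hypothetical and invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace after dropping simple punctuation."""
    for ch in ".,;:()":
        text = text.replace(ch, " ")
    return text.lower().split()


def cooccurrence_vectors(tokenized, window=2):
    """Stage 1 stand-in (assumption): sparse word vectors from co-occurrence counts."""
    vectors = defaultdict(Counter)
    for tokens in tokenized:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def sentence_vector(tokens, vectors):
    """Stage 2 stand-in (assumption): sentence vector as the sum of its word vectors."""
    total = Counter()
    for t in tokens:
        total.update(vectors.get(t, {}))
    return total


def cascade(sentences, seed_terms, top_words=3, top_sentences=2):
    """Expand seed terms at the word level, then rank sentences at the sentence level."""
    tokenized = [tokenize(s) for s in sentences]
    vectors = cooccurrence_vectors(tokenized)
    known_seeds = [s for s in seed_terms if s in vectors]

    # Word level: score every non-seed word by its best similarity to a seed term.
    scores = {}
    for word in list(vectors):
        if word in seed_terms:
            continue
        scores[word] = max(
            (cosine(vectors[word], vectors[s]) for s in known_seeds), default=0.0
        )
    expanded = sorted(scores, key=scores.get, reverse=True)[:top_words]

    # Sentence level: rank the original sentences against a query built from
    # the seed terms plus the expanded words.
    query = sentence_vector(known_seeds + expanded, vectors)
    order = sorted(
        range(len(sentences)),
        key=lambda i: cosine(sentence_vector(tokenized[i], vectors), query),
        reverse=True,
    )
    ranked = [sentences[i] for i in order[:top_sentences]]
    return expanded, ranked


# Invented example sentences and seed terms, not taken from the paper's dataset.
sentences = [
    "Mutant plants show delayed flowering under long days.",
    "The mutant displays reduced root length and delayed flowering.",
    "RNA was extracted using a commercial kit.",
    "Seedlings exhibit reduced root length on sucrose medium.",
]
seeds = {"flowering", "root"}
expanded, ranked = cascade(sentences, seeds)
print(expanded)
print(ranked)
```

The point of the two stages is that word-level similarity only surfaces individual terms, while the sentence-level pass keeps the full original sentence, which is where longer and more varied phenotypic expressions live. A realistic version would swap in trained embeddings for both stages.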

DOI: http://dx.doi.org/10.1109/TNB.2018.2838137

