As biomedical experts, researchers, public health agencies, and healthcare professionals make increasing use of information technologies, the volume of scientific literature, clinical notes, and other structured and unstructured text resources is growing rapidly, and these resources are stored in data sources such as PubMed. These massive text resources can be leveraged to extract valuable knowledge and insights using machine learning techniques. Neural network-based classification models, which have recently gained popularity, take numeric vectors of the training data as input. The better the input vectors, the more accurate the classification. Word representations are learned as a distribution of words in an embedding space, in which each word has its own vector and words that are semantically similar in context lie close to each other. However, such distributional word representations cannot capture relational semantics between distant words. In the biomedical domain, extracting relational words that associate distant entities, which generally represent the subject and object of a sentence, is a well-studied problem. Our goal is to capture relational semantic information between distant words from a large corpus in order to learn enhanced word representations, and to employ the learned representations in natural language processing tasks such as text classification. In this article, we propose an application of biomedical relation triplets to learn word representations by incorporating relational semantic information into the distributional representation of words. In other words, the proposed approach aims to capture both the distributional and the relational contexts of words in order to learn their numeric vectors from a text corpus. We also propose an application of the learned word representations to text classification.
The proposed approach is evaluated over multiple benchmark datasets, and the efficacy of the learned word representations is tested in terms of and tasks. Our proposed approach provides better performance in comparison to the state-of-the-art GloVe model. Furthermore, we have applied the learned word representations to classify biomedical texts using four neural network-based classification models, and the classification accuracy further confirms the effectiveness of the learned word representations by our proposed approach.
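
The abstract does not say how the relation triplets are combined with the distributional contexts, so the following is only a hypothetical sketch of one way this could be done, not the authors' method. It assumes a GloVe-style co-occurrence table in which window-based counts (weighted by inverse distance) are supplemented with counts for the members of each (subject, relation, object) triplet, so that distant but related words still share context. The function name `cooccurrence_counts`, the `relation_weight` parameter, and the example sentence and triplet are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(sentences, triplets, window=2, relation_weight=1.0):
    """Build a symmetric co-occurrence table from two kinds of context.

    sentences: list of token lists (distributional context).
    triplets:  list of (subject, relation, object) tuples (relational context).
    The weighting scheme here is an assumption, not taken from the paper.
    """
    counts = defaultdict(float)

    # Distributional context: words at most `window` positions apart,
    # weighted by the inverse of their distance (as GloVe does).
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                weight = 1.0 / (j - i)
                counts[(word, tokens[j])] += weight
                counts[(tokens[j], word)] += weight

    # Relational context: every pair of words inside a triplet is counted
    # as co-occurring, regardless of how far apart they are in the text.
    for triplet in triplets:
        for a, b in combinations(triplet, 2):
            counts[(a, b)] += relation_weight
            counts[(b, a)] += relation_weight

    return dict(counts)


# Hypothetical example: subject and object are five tokens apart,
# so a window of 2 alone would never pair them.
sentences = [["aspirin", "strongly", "and", "reliably", "inhibits", "cyclooxygenase"]]
triplets = [("aspirin", "inhibits", "cyclooxygenase")]
counts = cooccurrence_counts(sentences, triplets)
```

In this toy example the pair `("aspirin", "cyclooxygenase")` receives a count only through the triplet, while `("inhibits", "cyclooxygenase")` accumulates both its window count and its relational count. A table like this could then be passed to any co-occurrence-based embedding trainer.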

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11390191
DOI: http://dx.doi.org/10.1155/2023/2989791
