State-of-the-art systems for most natural language engineering tasks employ machine learning methods. Despite the improved performance of these systems, there is a lack of established methods for assessing the quality of their predictions. This work introduces a method for explaining the predictions of any sequence-based natural language processing (NLP) task implemented with any model, neural or non-neural. Our method, named EXSEQREG, introduces the concept of a region, which links a prediction to the features that are potentially important for the model. A region is a list of positions in the input sentence associated with a single prediction. Many NLP tasks are compatible with the proposed explanation method, as regions can be formed according to the nature of the task. The method models the differences in prediction probability that are induced by the careful removal of features used by the model. The output of the method is a list of importance values, each of which signifies the impact of the corresponding feature on the prediction. The proposed method is demonstrated with a neural network-based named entity recognition (NER) tagger using Turkish and Finnish datasets. A qualitative analysis of the explanations is presented. The results are validated with a procedure based on the mutual information score of each feature. We show that this method produces reasonable explanations and may be used for i) assessing the degree to which each feature contributes to a specific prediction of the model, and ii) exploring the features that played a significant role for a trained model when analyzed across the corpus.
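The core idea in the abstract, measuring how the predicted probability for a region changes when a feature is removed, can be illustrated with a minimal sketch. This is not the EXSEQREG implementation: the abstract says the method models these probability differences, whereas the sketch below only computes the raw difference for one feature at a time. The model interface, the toy tagger, the feature names (`is_capitalized`, `suffix_nen`, `is_digit`), and the example sentence are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence

# Hypothetical model interface (an assumption, not from the paper): given the
# tokens and the set of enabled feature names, return one probability
# distribution over labels per token position.
Model = Callable[[Sequence[str], FrozenSet[str]], List[Dict[str, float]]]


def region_probability(model: Model, tokens: Sequence[str], features,
                       region: Sequence[int], label: str) -> float:
    """Mean probability of `label` over the positions listed in `region`."""
    dists = model(tokens, frozenset(features))
    return sum(dists[i][label] for i in region) / len(region)


def feature_importances(model: Model, tokens: Sequence[str],
                        features: Sequence[str], region: Sequence[int],
                        label: str) -> Dict[str, float]:
    """Drop in the region's probability for `label` when each feature is removed in turn."""
    base = region_probability(model, tokens, features, region, label)
    return {
        f: base - region_probability(model, tokens, set(features) - {f}, region, label)
        for f in features
    }


def toy_model(tokens, enabled):
    """Stand-in tagger: capitalization and a suffix cue raise P(PER)."""
    dists = []
    for tok in tokens:
        score = 0.1
        if "is_capitalized" in enabled and tok[:1].isupper():
            score += 0.5
        if "suffix_nen" in enabled and tok.endswith("nen"):
            score += 0.3
        dists.append({"PER": score, "O": 1.0 - score})
    return dists


tokens = ["Matti", "Virtanen", "asuu", "Helsingissä"]
region = [0, 1]  # positions in the sentence tied to a single prediction
importances = feature_importances(
    toy_model, tokens, ["is_capitalized", "suffix_nen", "is_digit"], region, "PER"
)
```

In this toy setting, a feature the tagger never consults (`is_digit`) receives an importance of zero, while features the tagger relies on receive positive values in proportion to how much their removal lowers the region's probability.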

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7773252
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0244179

Similar Publications

Background: Primary progressive aphasia (PPA) is a language-based dementia linked with underlying Alzheimer's disease (AD) or frontotemporal dementia. Clinicians often report difficulty differentiating between the logopenic (lv) and nonfluent/agrammatic (nfv) subtypes, as both variants present with disruptions to "fluency" yet for different underlying reasons. In English, acoustic and linguistic markers from connected speech samples have shown promise in machine learning (ML)-based differentiation of nfv from lv.

Background: Natural language processing (NLP) enables the extraction of information embedded within unstructured texts, such as clinical case reports and trial eligibility criteria. By identifying relevant medical concepts, NLP facilitates the generation of structured and actionable data, supporting complex tasks like cohort identification and the analysis of clinical records. To accomplish those tasks, we introduce a deep learning-based and lexicon-based named entity recognition (NER) tool for texts in Spanish.

A Topology-Enhanced Multi-Viewed Contrastive Approach for Molecular Graph Representation Learning and Classification.

Mol Inform

January 2025

Faculty of Information Technology, HUTECH University, 700000, Ho Chi Minh City, Vietnam.

Graph representation learning has recently become an active research topic that has attracted considerable attention. Graph embeddings have diverse applications across fields such as information and social network analysis, bioinformatics and cheminformatics, natural language processing (NLP), and recommendation systems. Among the advanced deep learning (DL) based architectures used in graph representation learning, graph neural networks (GNNs) have emerged as the dominant and highly effective framework.

BetaAlign: a deep learning approach for multiple sequence alignment.

Bioinformatics

January 2025

The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel.

Article Synopsis
  • The study explores a novel method for multiple sequence alignments in bioinformatics using natural language processing (NLP) techniques.
  • Researchers developed BetaAlign, a deep learning aligner that outperforms traditional alignment algorithms and offers highly accurate results by leveraging transformer models.
  • The findings highlight the potential of AI-based approaches to improve alignment tasks and advance phylogenomics, with training data and tools made available through Hugging Face.