EXSEQREG: Explaining sequence-based NLP tasks with regions with a case study using morphological features for named entity recognition.

Onur Güngör, Tunga Güngör, Suzan Uskudarli

PLoS One

Computer Engineering Department, Boğaziçi University, Istanbul, Turkey.

Published: March 2021

State-of-the-art systems for most natural language engineering tasks employ machine learning methods. Despite the improved performance of these systems, there is a lack of established methods for assessing the quality of their predictions. This work introduces a method for explaining the predictions of any sequence-based natural language processing (NLP) task implemented with any model, neural or non-neural. Our method, named EXSEQREG, introduces the concept of a region, which links a prediction to the features that are potentially important for the model. A region is a list of positions in the input sentence associated with a single prediction. Many NLP tasks are compatible with the proposed explanation method because regions can be formed according to the nature of the task. The method models the differences in prediction probability induced by carefully removing features used by the model. The output of the method is a list of importance values, each signifying the impact of the corresponding feature on the prediction. The proposed method is demonstrated with a neural-network-based named entity recognition (NER) tagger using Turkish and Finnish datasets. A qualitative analysis of the explanations is presented, and the results are validated with a procedure based on the mutual information score of each feature. We show that this method produces reasonable explanations and may be used for i) assessing the degree to which features contribute to a specific prediction of the model, and ii) exploring the features that played a significant role for a trained model when analyzed across the corpus.
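The feature-removal idea above can be illustrated with a minimal occlusion-style sketch. It assumes a hypothetical model interface `model(tokens, target)` that returns a label distribution for one position, represents each token as a dictionary of feature values (for example a root and a morphological case tag), and scores a feature by how much removing it from every position in the region lowers the probability of the chosen label. This is a simplification for illustration, not the paper's exact formulation; the names `remove_feature`, `region_importance`, and `toy_model` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical representation: each token maps feature names to values,
# e.g. {"root": "Ankara", "case": "Loc"}.
Token = Dict[str, str]
# Hypothetical model interface: given all tokens and a target position,
# return a probability distribution over labels for that position.
Model = Callable[[Sequence[Token], int], Dict[str, float]]


def remove_feature(tokens: Sequence[Token], region: List[int], feature: str) -> List[Token]:
    """Copy `tokens`, dropping `feature` at every position listed in `region`."""
    masked = [dict(token) for token in tokens]
    for position in region:
        masked[position].pop(feature, None)
    return masked


def region_importance(model: Model, tokens: Sequence[Token], region: List[int],
                      target: int, label: str) -> Dict[str, float]:
    """Score each feature seen in `region` by the drop in P(label) at `target`
    when that feature is removed from the whole region."""
    base = model(tokens, target)[label]
    features = sorted({name for position in region for name in tokens[position]})
    return {
        name: base - model(remove_feature(tokens, region, name), target)[label]
        for name in features
    }


def toy_model(tokens: Sequence[Token], target: int) -> Dict[str, float]:
    # Toy scorer for illustration only: a locative case tag raises P(B-LOC).
    p_loc = 0.2 + (0.6 if tokens[target].get("case") == "Loc" else 0.0)
    return {"B-LOC": p_loc, "O": 1.0 - p_loc}


sentence = [{"root": "Ankara", "case": "Loc"}, {"root": "yaşa"}]
scores = region_importance(toy_model, sentence, region=[0], target=0, label="B-LOC")
print({name: round(value, 3) for name, value in scores.items()})
# → {'case': 0.6, 'root': 0.0}
```

A feature whose removal barely changes the probability (here `root`) receives an importance near zero. Aggregating such per-prediction scores over every region in a corpus would be one way to approximate the corpus-level analysis described in the abstract.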

Full text:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7773252 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0244179 (PLOS)

Similar Publications

Biomarkers.

Alzheimers Dement

December 2024

University of Texas, Austin, TX, USA.

Lokesha Pugalenthi Núria Montagut Sonia Karin Marques-Kiderle Camille Wagner Rodriguez Ami Iyer

Background: Primary progressive aphasia (PPA) is a language-based dementia linked with underlying Alzheimer's disease (AD) or frontotemporal dementia. Clinicians often report difficulty differentiating between the logopenic (lv) and nonfluent/agrammatic (nfv) subtypes, as both variants present with disruptions to "fluency" yet for different underlying reasons. In English, acoustic and linguistic markers from connected speech samples have shown promise in machine learning (ML)-based differentiation of nfv from lv.

Technology and Dementia Preconference.

Alzheimers Dement

December 2024

University of Texas, Austin, TX, USA.

Lokesha Pugalenthi Núria Montagut Sonia Karin Marques-Kiderle Camille Wagner Rodriguez Ami Iyer

Hybrid natural language processing tool for semantic annotation of medical texts in Spanish.

BMC Bioinformatics

January 2025

Centro de Salud Retiro, Hospital Universitario Gregorio Marañon, C/Lope de Rueda, 43, 28009, Madrid, Spain.

Leonardo Campillos-Llanos, Ana Valverde-Mateos, Adrián Capllonch-Carrión

Background: Natural language processing (NLP) enables the extraction of information embedded within unstructured texts, such as clinical case reports and trial eligibility criteria. By identifying relevant medical concepts, NLP facilitates the generation of structured and actionable data, supporting complex tasks like cohort identification and the analysis of clinical records. To accomplish those tasks, we introduce a deep learning-based and lexicon-based named entity recognition (NER) tool for texts in Spanish.

A Topology-Enhanced Multi-Viewed Contrastive Approach for Molecular Graph Representation Learning and Classification.

Mol Inform

January 2025

Faculty of Information Technology, HUTECH University, 700000, Ho Chi Minh City, Vietnam.

Phu Pham

In recent years, graph representation learning has become a popular research topic that has attracted considerable attention from researchers. Graph embeddings have diverse applications across fields such as information and social network analysis, bioinformatics and cheminformatics, natural language processing (NLP), and recommendation systems. Among the advanced deep learning (DL) based architectures used in graph representation learning, graph neural networks (GNNs) have emerged as the dominant and highly effective framework.

BetaAlign: a deep learning approach for multiple sequence alignment.

Bioinformatics

January 2025

The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel.

Edo Dotan, Elya Wygoda, Noa Ecker, Michael Alburquerque, Oren Avram

Article Synopsis

The study explores a novel method for multiple sequence alignments in bioinformatics using natural language processing (NLP) techniques.
Researchers developed BetaAlign, a deep learning aligner that outperforms traditional alignment algorithms and offers highly accurate results by leveraging transformer models.
The findings highlight the potential of AI-based approaches to improve alignment tasks and advance phylogenomics, with training data and tools made available through Hugging Face.