Knowledge-Augmented Visual Question Answering With Natural Language Explanation.

Jiayuan Xie, Yi Cai, Jiali Chen, Ruohang Xu, Jiexin Wang, Qing Li

IEEE Trans Image Process

Published: April 2024

Visual question answering with natural language explanation (VQA-NLE) is a challenging task that requires models to not only generate accurate answers but also to provide explanations that justify the relevant decision-making processes. This task is accomplished by generating natural language sentences based on the given question-image pair. However, existing methods often struggle to ensure consistency between the answers and explanations due to their disregard of the crucial interactions between these factors. Moreover, existing methods overlook the potential benefits of incorporating additional knowledge, which hinders their ability to effectively bridge the semantic gap between questions and images, leading to less accurate explanations. In this paper, we present a novel approach denoted the knowledge-based iterative consensus VQA-NLE (KICNLE) model to address these limitations. To maintain consistency, our model incorporates an iterative consensus generator that adopts a multi-iteration generative method, enabling multiple iterations of the answer and explanation in each generation. In each iteration, the current answer is utilized to generate an explanation, which in turn guides the generation of a new answer. Additionally, a knowledge retrieval module is introduced to provide potentially valid candidate knowledge, guide the generation process, effectively bridge the gap between questions and images, and enable the production of high-quality answer-explanation pairs. Extensive experiments conducted on three different datasets demonstrate the superiority of our proposed KICNLE model over competing state-of-the-art approaches. Our code is available at https://github.com/Gary-code/KICNLE.
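The alternation described in the abstract (the current answer produces an explanation, which then guides a new answer) can be sketched as a simple loop. This is only an illustrative sketch, not the authors' implementation: the function `iterative_consensus`, its parameters, the fixed iteration count, and the toy generators are all hypothetical stand-ins, and the retrieved knowledge is assumed to arrive as a plain list of strings. The actual model is in the linked repository.

```python
from typing import Callable, List, Tuple

# Hypothetical generator signature:
# (question, image_features, knowledge, conditioning_text) -> generated text
Generator = Callable[[str, List[float], List[str], str], str]


def iterative_consensus(
    question: str,
    image_features: List[float],
    knowledge: List[str],
    generate_answer: Generator,
    generate_explanation: Generator,
    num_iterations: int = 3,  # assumed value; not taken from the paper
) -> Tuple[str, str]:
    """Alternate between explanation and answer generation."""
    # Initial answer, produced before any explanation exists.
    answer = generate_answer(question, image_features, knowledge, "")
    explanation = ""
    for _ in range(num_iterations):
        # The current answer is used to generate an explanation ...
        explanation = generate_explanation(
            question, image_features, knowledge, answer
        )
        # ... which in turn guides the generation of a new answer.
        answer = generate_answer(
            question, image_features, knowledge, explanation
        )
    return answer, explanation


# Toy stand-ins for the real generators (purely illustrative).
def toy_answer(question, image_features, knowledge, explanation):
    return "red" if "red" in explanation else "unknown"


def toy_explanation(question, image_features, knowledge, answer):
    return "because " + knowledge[0]


result = iterative_consensus(
    "What color is the sign?",
    [0.0, 1.0],
    ["stop signs are red"],
    toy_answer,
    toy_explanation,
)
print(result)
```

In this toy run the first answer is made without an explanation; once an explanation grounded in the candidate knowledge is available, the answer is revised to agree with it, which is the consistency effect the abstract attributes to the iterative generator.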

DOI: http://dx.doi.org/10.1109/TIP.2024.3379900

Similar Publications

Unraveling the impact of congenital deafness on individual brain organization.

Elife

March 2025

Department of Neuroscience, Georgetown University Medical Center, Washington DC, United States.

Lenia Amaral, Xiaosha Wang, Yanchao Bi, Ella Striem-Amit

Research on brain plasticity, particularly in the context of deafness, consistently emphasizes the reorganization of the auditory cortex. But to what extent do all individuals with deafness show the same level of reorganization? To address this question, we examined the individual differences in functional connectivity (FC) from the deprived auditory cortex. Our findings demonstrate remarkable differentiation between individuals deriving from the absence of shared auditory experiences, resulting in heightened FC variability among deaf individuals, compared to more consistent FC in the hearing group.

Classifying Continuous Glucose Monitoring Documents From Electronic Health Records.

J Diabetes Sci Technol

March 2025

Department of Population Health, Grossman School of Medicine, New York University, New York, NY, USA.

Yaguang Zheng, Eduardo Iturrate, Lehan Li, Bei Wu, William R Small

Background: Clinical use of continuous glucose monitoring (CGM) is increasing storage of CGM-related documents in electronic health records (EHR); however, the standardization of CGM storage is lacking. We aimed to evaluate the sensitivity and specificity of CGM Ambulatory Glucose Profile (AGP) classification criteria.

Methods: We randomly chose 2244 (18.

Empowering pediatric, adolescent, and young adult patients with cancer utilizing generative AI chatbots to reduce psychological burden and enhance treatment engagement: a pilot study.

Front Digit Health

February 2025

Science of Functional Recovery and Reconstruction, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan.

Joe Hasei, Mana Hanzawa, Akihito Nagano, Naoko Maeda, Shinichirou Yoshida

Background: Pediatric and adolescent/young adult (AYA) cancer patients face profound psychological challenges, exacerbated by limited access to continuous mental health support. While conventional therapeutic interventions often follow structured protocols, the potential of generative artificial intelligence (AI) chatbots to provide continuous conversational support remains unexplored. This study evaluates the feasibility and impact of AI chatbots in alleviating psychological distress and enhancing treatment engagement in this vulnerable population.

Consideration of evidence-based training content to strengthen coach recognition of concussion during youth sports activities.

Brain Inj

March 2025

Interdisciplinary Health Sciences & Sociology, Oakland University, Rochester, Michigan, USA.

Ann Guernon, Paul M Wright, Beverly W Henry, Kendra Jorgensen-Wagers, Jennifer A Weaver

Objective: To synthesize requirements and recommendations addressing sport-related concussion (SRC).

Design: Qualitative study.

Setting: Scholastic and non-scholastic athletic programs.

The Development of Voice Quality in Second Language Speech Acquisition: A Case Study of the Parallel Speech Corpus of Chinese Natives and Arabic Natives.

J Voice

March 2025

School of Liberal Arts, Nankai University, Tianjin, China.

Jia Guo, Wei Huang, Muhannad Alkhattabi, Jiachun Liu, Qibin Ran

Objectives: The primary objective of this study is to investigate whether the voice quality in second language (L2) speech production changes over time as learners progress in their L2 acquisition.

Methods: A total of 83 Arabic native speakers learning Chinese (59 males and 24 females) and 62 Chinese native speakers learning Arabic (23 males and 39 females) participated in the study. The participants had varying durations of L2 learning (DOL) experience.
