Visual question answering with natural language explanation (VQA-NLE) is a challenging task that requires a model not only to generate an accurate answer but also to provide an explanation that justifies the decision-making process behind it. Both outputs are generated as natural language sentences from a given question-image pair. However, existing methods often struggle to keep answers and explanations consistent because they disregard the crucial interactions between the two. Existing methods also overlook the potential benefits of incorporating additional knowledge, which hinders their ability to bridge the semantic gap between questions and images and leads to less accurate explanations. In this paper, we present a novel approach, the knowledge-based iterative consensus VQA-NLE (KICNLE) model, to address these limitations. To maintain consistency, our model incorporates an iterative consensus generator that adopts a multi-iteration generative method, refining the answer and the explanation over several iterations within each generation. In each iteration, the current answer is used to generate an explanation, which in turn guides the generation of a new answer. Additionally, a knowledge retrieval module is introduced to provide potentially valid candidate knowledge that guides the generation process, helps bridge the gap between questions and images, and enables the production of high-quality answer-explanation pairs. Extensive experiments conducted on three different datasets demonstrate the superiority of our proposed KICNLE model over competing state-of-the-art approaches. Our code is available at https://github.com/Gary-code/KICNLE.
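
The alternating answer-explanation loop described in the abstract can be illustrated with a minimal Python sketch. This is not the KICNLE implementation: the function names (`iterative_consensus`, `toy_answer`, `toy_explain`), the fixed iteration count, and the rule-based stand-ins for the generator and the retrieved knowledge are all assumptions made for illustration. Only the control flow (answer, then explanation, then new answer, with retrieved knowledge available at every step) follows the description above.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures: (question, image, knowledge, conditioning text) -> text.
AnswerFn = Callable[[str, str, List[str], Optional[str]], str]
ExplainFn = Callable[[str, str, List[str], str], str]


def iterative_consensus(
    question: str,
    image: str,
    knowledge: List[str],
    answer_fn: AnswerFn,
    explain_fn: ExplainFn,
    num_iterations: int = 3,
) -> Tuple[str, str]:
    """Alternate between explanation and answer generation.

    The fixed number of iterations is an assumption of this sketch.
    """
    if num_iterations < 1:
        raise ValueError("num_iterations must be at least 1")

    # Initial answer, produced before any explanation exists.
    answer = answer_fn(question, image, knowledge, None)
    explanation = ""
    for _ in range(num_iterations):
        # The current answer is used to generate an explanation ...
        explanation = explain_fn(question, image, knowledge, answer)
        # ... which in turn guides the generation of a new answer.
        answer = answer_fn(question, image, knowledge, explanation)
    return answer, explanation


# Toy stand-ins for the learned generator (purely illustrative).
def toy_answer(question, image, knowledge, explanation):
    if explanation is None:
        return "no"  # uninformed first guess
    return "yes" if "umbrella" in explanation else "no"


def toy_explain(question, image, knowledge, answer):
    # Use the first retrieved fact, if any, as supporting evidence.
    fact = knowledge[0] if knowledge else "nothing relevant was retrieved"
    return f"the answer is {answer} because {fact}"


answer, explanation = iterative_consensus(
    "Is it raining?",
    "street.jpg",  # placeholder for image features
    ["people are holding umbrellas"],  # placeholder for retrieved knowledge
    toy_answer,
    toy_explain,
    num_iterations=2,
)
print(answer, "|", explanation)
```

In this toy run the first guess ("no") is revised once the explanation surfaces the retrieved fact, and the second iteration yields an explanation that agrees with the revised answer, which is the kind of answer-explanation consistency the abstract aims for.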

Source
http://dx.doi.org/10.1109/TIP.2024.3379900
