Visual question answering with natural language explanation (VQA-NLE) is a challenging task that requires models to not only generate accurate answers but also to provide explanations that justify the relevant decision-making processes. This task is accomplished by generating natural language sentences based on the given question-image pair. However, existing methods often struggle to ensure consistency between the answers and explanations due to their disregard of the crucial interactions between these factors. Moreover, existing methods overlook the potential benefits of incorporating additional knowledge, which hinders their ability to effectively bridge the semantic gap between questions and images, leading to less accurate explanations. In this paper, we present a novel approach denoted the knowledge-based iterative consensus VQA-NLE (KICNLE) model to address these limitations. To maintain consistency, our model incorporates an iterative consensus generator that adopts a multi-iteration generative method, enabling multiple iterations of the answer and explanation in each generation. In each iteration, the current answer is utilized to generate an explanation, which in turn guides the generation of a new answer. Additionally, a knowledge retrieval module is introduced to provide potentially valid candidate knowledge, guide the generation process, effectively bridge the gap between questions and images, and enable the production of high-quality answer-explanation pairs. Extensive experiments conducted on three different datasets demonstrate the superiority of our proposed KICNLE model over competing state-of-the-art approaches. Our code is available at https://github.com/Gary-code/KICNLE.
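The alternating update described in the abstract (the current answer produces an explanation, which then guides a new answer) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `iterative_consensus`, the two callables `explain` and `answer`, the `max_iters` parameter, and the early stop when the answer stabilizes are all assumptions made for illustration. The image is assumed to be captured inside the callables, and retrieved knowledge is passed as a plain list of strings.

```python
from typing import Callable, List, Tuple

# Hypothetical callable types (not the KICNLE API). Each callable is assumed
# to have access to the image internally, e.g. via a closure.
ExplainFn = Callable[[str, str, List[str]], str]  # (question, answer, knowledge) -> explanation
AnswerFn = Callable[[str, str, List[str]], str]   # (question, explanation, knowledge) -> answer


def iterative_consensus(
    question: str,
    initial_answer: str,
    knowledge: List[str],
    explain: ExplainFn,
    answer: AnswerFn,
    max_iters: int = 3,
) -> Tuple[str, str]:
    """Alternate between explanation and answer generation.

    Stops early once the regenerated answer matches the answer that the
    explanation was built from (an assumed stopping rule). If max_iters is
    reached first, the returned explanation was generated from the
    previous answer.
    """
    current_answer = initial_answer
    explanation = ""
    for _ in range(max_iters):
        # The current answer conditions the explanation...
        explanation = explain(question, current_answer, knowledge)
        # ...and the explanation in turn guides a new answer.
        new_answer = answer(question, explanation, knowledge)
        if new_answer == current_answer:
            break  # answer and explanation agree
        current_answer = new_answer
    return current_answer, explanation
```

In a real system the two callables would be generative models conditioned on image features; here they are left abstract so that only the control flow is shown.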
DOI: http://dx.doi.org/10.1109/TIP.2024.3379900
Elife
March 2025
Department of Neuroscience, Georgetown University Medical Center, Washington DC, United States.
Research on brain plasticity, particularly in the context of deafness, consistently emphasizes the reorganization of the auditory cortex. But to what extent do all individuals with deafness show the same level of reorganization? To address this question, we examined the individual differences in functional connectivity (FC) from the deprived auditory cortex. Our findings demonstrate remarkable differentiation between individuals deriving from the absence of shared auditory experiences, resulting in heightened FC variability among deaf individuals, compared to more consistent FC in the hearing group.
J Diabetes Sci Technol
March 2025
Department of Population Health, Grossman School of Medicine, New York University, New York, NY, USA.
Background: Clinical use of continuous glucose monitoring (CGM) is increasing storage of CGM-related documents in electronic health records (EHR); however, the standardization of CGM storage is lacking. We aimed to evaluate the sensitivity and specificity of CGM Ambulatory Glucose Profile (AGP) classification criteria.
Methods: We randomly chose 2244 (18.
Front Digit Health
February 2025
Science of Functional Recovery and Reconstruction, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama, Japan.
Background: Pediatric and adolescent/young adult (AYA) cancer patients face profound psychological challenges, exacerbated by limited access to continuous mental health support. While conventional therapeutic interventions often follow structured protocols, the potential of generative artificial intelligence (AI) chatbots to provide continuous conversational support remains unexplored. This study evaluates the feasibility and impact of AI chatbots in alleviating psychological distress and enhancing treatment engagement in this vulnerable population.
Brain Inj
March 2025
Interdisciplinary Health Sciences & Sociology, Oakland University, Rochester, Michigan, USA.
Objective: To synthesize requirements and recommendations addressing sport-related concussion (SRC).
Design: Qualitative study.
Setting: Scholastic and non-scholastic athletic programs.
J Voice
March 2025
School of Liberal Arts, Nankai University, Tianjin, China.
Objectives: The primary objective of this study is to investigate whether the voice quality in second language (L2) speech production changes over time as learners progress in their L2 acquisition.
Methods: A total of 83 Arabic native speakers learning Chinese (59 males and 24 females) and 62 Chinese native speakers learning Arabic (23 males and 39 females) participated in the study. The participants had varying durations of L2 learning (DOL) experience.