Existing visual question answering methods often suffer from cross-modal spurious correlations and oversimplified event-level reasoning processes that fail to capture the temporality, causality, and dynamics of events across a video. In this work, to address the task of event-level visual question answering, we propose a framework for cross-modal causal relational reasoning. In particular, a set of causal intervention operations is introduced to discover the underlying causal structures across visual and linguistic modalities. Our framework, named Cross-Modal Causal RelatIonal Reasoning (CMCIR), involves three modules: i) a Causality-aware Visual-Linguistic Reasoning (CVLR) module for collaboratively disentangling the visual and linguistic spurious correlations via front-door and back-door causal interventions; ii) a Spatial-Temporal Transformer (STT) module for capturing the fine-grained interactions between visual and linguistic semantics; and iii) a Visual-Linguistic Feature Fusion (VLFF) module for adaptively learning global semantic-aware visual-linguistic representations. Extensive experiments on four event-level datasets demonstrate the superiority of CMCIR in discovering visual-linguistic causal structures and achieving robust event-level visual question answering.
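
For readers unfamiliar with the two interventions named above, the standard textbook adjustment formulas can be written as follows. This is general background from causal inference, not the paper's specific formulation: the symbols X (input), Y (answer), Z (an observed confounder), and M (a mediator) are illustrative assumptions, and the abstract does not say how CMCIR instantiates them.

```latex
% Back-door adjustment: Z is an observed confounder of X and Y.
P(Y \mid \mathrm{do}(X = x)) = \sum_{z} P(Y \mid X = x, Z = z)\, P(Z = z)

% Front-door adjustment: M mediates the effect of X on Y,
% and the confounder of X and Y is unobserved.
P(Y \mid \mathrm{do}(X = x)) = \sum_{m} P(M = m \mid X = x) \sum_{x'} P(Y \mid M = m, X = x')\, P(X = x')
```

The back-door form applies when the confounder can be observed and stratified over, while the front-door form is typically used when the confounder is hidden but a mediator is available.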

Source
http://dx.doi.org/10.1109/TPAMI.2023.3284038
