In this paper, we investigate the problem of abductive visual reasoning (AVR), which requires vision systems to infer the most plausible explanation for visual observations. Unlike previous work, which performs visual reasoning on static images or synthesized scenes, we exploit long-term reasoning from instructional videos that contain a wealth of detailed information about the physical world. We conceptualize two tasks for this emerging and challenging topic. In the primary task, AVR, the model is given the initial configuration and the desired goal from an instructional video and is expected to infer the most plausible sequence of steps to achieve the goal. To avoid trivial solutions based on appearance information rather than reasoning, we construct a second task, AVR++, which requires the model to answer why the unselected options are less plausible. We introduce a new dataset called VideoABC, which consists of 46,354 unique steps derived from 11,827 instructional videos, formulated as 13,526 abductive reasoning questions with an average reasoning duration of 51 seconds. Through an adversarial hard hypothesis mining algorithm, non-trivial and high-quality problems are generated efficiently and effectively. To move toward human-level reasoning, we propose a Hierarchical Dual Reasoning Network (HDRNet) to capture the long-term dependencies among steps and observations. We establish a benchmark for abductive visual reasoning, on which our method sets the state of the art for AVR (∼74%) and AVR++ (∼45%), whereas humans easily achieve over 90% accuracy on both tasks. This large performance gap reveals the limitations of current video understanding models in temporal reasoning and leaves substantial room for future research on this challenging problem. Our dataset and code are available at https://github.com/wl-zhao/VideoABC.
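
To make the AVR task format concrete, the following is a minimal Python sketch of how one multiple-choice question and its accuracy metric could be represented. It is an illustration under stated assumptions, not the VideoABC data format or the HDRNet implementation: the class `AVRQuestion`, its field names, the helper `first_option_baseline`, and the toy questions are all hypothetical, and plain strings stand in for video features.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class AVRQuestion:
    """Hypothetical container for one multiple-choice AVR question."""
    initial_observation: str     # stand-in for the initial configuration
    goal_observation: str        # stand-in for the desired goal
    candidates: List[List[str]]  # each candidate hypothesis is a sequence of steps
    answer: int                  # index of the most plausible candidate


def accuracy(questions: Sequence[AVRQuestion],
             predict: Callable[[AVRQuestion], int]) -> float:
    """Fraction of questions for which predict() returns the correct index."""
    if not questions:
        raise ValueError("need at least one question")
    correct = sum(1 for q in questions if predict(q) == q.answer)
    return correct / len(questions)


def first_option_baseline(question: AVRQuestion) -> int:
    """Trivial baseline (an assumption for illustration): always pick option 0."""
    return 0


# Toy questions invented for this sketch; they are not drawn from VideoABC.
toy_questions = [
    AVRQuestion("raw eggs", "omelette on a plate",
                [["crack eggs", "whisk", "fry"],
                 ["fry", "whisk", "crack eggs"]], 0),
    AVRQuestion("flat tire", "inflated tire",
                [["inflate", "remove tire"],
                 ["remove tire", "patch tube", "inflate"]], 1),
]

print(accuracy(toy_questions, first_option_baseline))  # 0.5
```

A real model would replace `first_option_baseline` with a function that scores each candidate step sequence against the two observations; the accuracy figures quoted in the abstract correspond to this kind of per-question correctness averaged over the benchmark.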

Source
DOI: http://dx.doi.org/10.1109/TIP.2022.3205207
