Human-like scene interpretation by a guided counterstream processing.

Proc Natl Acad Sci U S A

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.

Published: October 2023

In modeling vision, there has been remarkable progress in recognizing a range of scene components, but the problem of analyzing full scenes, an ultimate goal of visual perception, is still largely open. To deal with complete scenes, recent work has focused on training models to extract the full graph-like structure of a scene. In contrast with scene graphs, human scene perception focuses on selected structures in the scene, starting with a limited interpretation and evolving sequentially in a goal-directed manner [G. L. Malcolm, I. I. A. Groen, C. I. Baker, , 843-856 (2016)]. Guidance is crucial throughout scene interpretation, since extracting a full scene representation is often infeasible. Here, we present a model that performs human-like guided scene interpretation, using iterative bottom-up and top-down processing in a "counterstream" structure motivated by cortical circuitry. The process proceeds through the sequential application of top-down instructions that guide the interpretation. The results show how scene structures of interest to the viewer are extracted by an automatically selected sequence of top-down instructions. The model shows two further benefits. One is an inherent capability to deal well with the problem of combinatorial generalization (generalizing broadly to unseen scene configurations), which is limited in current network models [B. Lake, M. Baroni, (2018)]. The second is the ability to combine visual with nonvisual information at each cycle of the interpretation process, which is a key aspect for modeling human perception as well as advancing AI vision systems.
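The guided, cycle-by-cycle interpretation described in the abstract can be illustrated with a toy sketch. This is not the authors' model: the scene is a hand-written dictionary rather than an image, and the names `SCENE`, `bottom_up`, `next_instructions`, and `interpret` are all hypothetical stand-ins. It only shows the control flow, in which each cycle applies one top-down instruction, and the result of that cycle determines which instructions come next, so only goal-relevant structure is extracted.

```python
# Toy scene (hypothetical data, not from the paper): objects and relations.
SCENE = {
    "objects": {"person_1": "person", "cup_1": "cup",
                "table_1": "table", "dog_1": "dog"},
    "relations": [("person_1", "holding", "cup_1"),
                  ("cup_1", "above", "table_1"),
                  ("dog_1", "under", "table_1")],
}


def bottom_up(scene, instruction):
    """Stand-in for one bottom-up pass conditioned on a top-down instruction."""
    kind, arg = instruction
    if kind == "find_class":
        return [("is_a", name, cls)
                for name, cls in scene["objects"].items() if cls == arg]
    if kind == "relations_of":
        return [("rel", s, r, o)
                for s, r, o in scene["relations"] if s == arg]
    return []


def next_instructions(new_facts):
    """Stand-in for top-down selection: follow up only on what was just found."""
    follow_ups = []
    for fact in new_facts:
        if fact[0] == "is_a":
            follow_ups.append(("relations_of", fact[1]))
        elif fact[0] == "rel":
            follow_ups.append(("relations_of", fact[3]))
    return follow_ups


def interpret(scene, goal_class, max_cycles=10):
    """Run instruction-guided cycles starting from a goal of interest."""
    facts, done = [], set()
    queue = [("find_class", goal_class)]
    cycles = 0
    while queue and cycles < max_cycles:
        instruction = queue.pop(0)
        if instruction in done:
            continue
        done.add(instruction)
        cycles += 1
        new_facts = bottom_up(scene, instruction)
        facts.extend(new_facts)
        queue.extend(next_instructions(new_facts))
    return facts


if __name__ == "__main__":
    for fact in interpret(SCENE, "person"):
        print(fact)
```

Starting from the goal "person", the sketch follows the person to the cup and the cup to the table, and never touches the dog, which is the contrast with extracting a full scene graph that the abstract draws.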

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10556630
DOI: http://dx.doi.org/10.1073/pnas.2211179120

Publication Analysis

Top Keywords

scene interpretation: 12
scene: 10
top-down instructions: 8
interpretation process: 8
interpretation: 6
human-like scene: 4
interpretation guided: 4
guided counterstream: 4
counterstream processing: 4
processing modeling: 4

Similar Publications

Enhancing doctor-patient communication using large language models for pathology report interpretation.

BMC Med Inform Decis Mak

January 2025

Department of Thoracic Surgery, Guizhou Provincial People's Hospital, No. 83, Zhongshan East Road, Guiyang, Guizhou, 550000, China.

Background: Large language models (LLMs) are increasingly utilized in healthcare settings. Postoperative pathology reports, which are essential for diagnosing and determining treatment strategies for surgical patients, frequently include complex data that can be challenging for patients to comprehend. This complexity can adversely affect the quality of communication between doctors and patients about their diagnosis and treatment options, potentially impacting patient outcomes such as understanding of their condition, treatment adherence, and overall satisfaction.

Historically, electrophysiological correlates of scene processing have been studied with experiments using static stimuli presented for discrete timescales where participants maintain a fixed eye position. Gaps remain in generalizing these findings to real-world conditions where eye movements are made to select new visual information and where the environment remains stable but changes with our position and orientation in space, driving dynamic visual stimulation. Co-recording of eye movements and electroencephalography (EEG) is an approach to leverage fixations as time-locking events in the EEG recording under free-viewing conditions to create fixation-related potentials (FRPs), providing a neural snapshot in which to study visual processing under naturalistic conditions.

Shaping the Space: A Role for the Hippocampus in Mental Imagery Formation.

Vision (Basel)

January 2025

Centre for the Study of Perceptual Experience, Department of Philosophy, University of Glasgow, Glasgow G12 8QQ, UK.

Mental imagery is claimed to underlie a host of abilities, such as episodic memory, working memory, and decision-making. A popular view holds that mental imagery relies on the perceptual system and that it can be said to be 'vision in reverse'. Whereas vision exploits the bottom-up neural pathways of the visual system, mental imagery exploits the top-down neural pathways.

Indian mythology is a treasure trove of divine tales, yet a gap in understanding still exists between foreign tourists and the rich cultural heritage of Indian deities. To address this problem, this paper presents a deep learning-driven mobile application named "MythicVision", designed to help foreign tourists better understand India's rich cultural heritage by recognizing and interpreting images of Indian mythological deities. First, four state-of-the-art deep models were trained and evaluated on a custom in-house dataset consisting of 10,970 images of various Indian deities sourced from both natural-scene and web images.
