Human-like scene interpretation by a guided counterstream processing.

Shimon Ullman, Liav Assif, Alona Strugatski, Ben-Zion Vatashsky, Hila Levi, Aviv Netanyahu, Adam Yaari

Proc Natl Acad Sci U S A

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.

Published: October 2023

In modeling vision, there has been remarkable progress in recognizing a range of scene components, but the problem of analyzing full scenes, an ultimate goal of visual perception, is still largely open. To deal with complete scenes, recent work has focused on training models to extract the full graph-like structure of a scene. In contrast with scene graphs, human scene perception focuses on selected structures in the scene, starting with a limited interpretation and evolving sequentially in a goal-directed manner [G. L. Malcolm, I. I. A. Groen, C. I. Baker, , 843-856 (2016)]. Guidance is crucial throughout scene interpretation, since extracting a full scene representation is often infeasible. Here, we present a model that performs human-like guided scene interpretation, using iterative bottom-up and top-down processing in a "counterstream" structure motivated by cortical circuitry. The process proceeds by the sequential application of top-down instructions that guide the interpretation. The results show how scene structures of interest to the viewer are extracted by an automatically selected sequence of top-down instructions. The model shows two further benefits. One is an inherent capability to deal well with the problem of combinatorial generalization, that is, generalizing broadly to unseen scene configurations, which is limited in current network models [B. Lake, M. Baroni, (2018)]. The second is the ability to combine visual with nonvisual information at each cycle of the interpretation process, which is a key aspect for modeling human perception as well as for advancing AI vision systems.
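The guided, cyclic control flow described in the abstract can be illustrated with a toy sketch. This is not the authors' model: the scene is a hand-written dictionary rather than an image, and every name here (`SCENE`, `bottom_up`, `next_instructions`, `interpret`, and the `find_class` / `relations_of` instruction types) is a hypothetical stand-in chosen only to show how a sequence of top-down instructions might be selected automatically from the partial interpretation built so far.

```python
from dataclasses import dataclass, field

# Toy "scene": facts a real model would have to infer from pixels (assumption).
SCENE = {
    "objects": {"person_1": "person", "cup_1": "cup",
                "table_1": "table", "dog_1": "dog"},
    "relations": [("person_1", "holds", "cup_1"),
                  ("cup_1", "on", "table_1")],
}


@dataclass
class Interpretation:
    """Partial scene structure accumulated across cycles."""
    objects: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)


def bottom_up(scene, instruction):
    """Stand-in for one bottom-up pass conditioned on a top-down instruction."""
    kind, arg = instruction
    if kind == "find_class":
        found = {k: v for k, v in scene["objects"].items() if v == arg}
        return {"objects": found, "relations": []}
    if kind == "relations_of":
        rels = [r for r in scene["relations"] if r[0] == arg]
        objs = {r[2]: scene["objects"][r[2]] for r in rels}
        return {"objects": objs, "relations": rels}
    raise ValueError(f"unknown instruction: {kind}")


def next_instructions(interp, done):
    """Top-down step: propose follow-ups based on what has been found so far."""
    return [("relations_of", obj) for obj in interp.objects
            if ("relations_of", obj) not in done]


def interpret(scene, goal_class, max_cycles=10):
    """Alternate top-down instruction selection and bottom-up extraction."""
    interp = Interpretation()
    queue = [("find_class", goal_class)]  # the viewer's goal seeds the process
    done = []
    while queue and len(done) < max_cycles:
        instruction = queue.pop(0)
        done.append(instruction)
        result = bottom_up(scene, instruction)
        interp.objects.update(result["objects"])
        interp.relations.extend(result["relations"])
        queue.extend(i for i in next_instructions(interp, done)
                     if i not in queue)
    return interp, done


if __name__ == "__main__":
    interp, steps = interpret(SCENE, "person")
    print(steps)
    print(interp.relations)
```

Starting from the goal class `person`, the loop follows only the structures connected to that goal (the held cup and the table under it) and never visits the unrelated dog, which mirrors the contrast the abstract draws between guided interpretation and extracting a full scene graph.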

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10556630
DOI: http://dx.doi.org/10.1073/pnas.2211179120


