In modeling vision, there has been remarkable progress in recognizing a range of scene components, but the problem of analyzing full scenes, an ultimate goal of visual perception, is still largely open. To deal with complete scenes, recent work has focused on training models to extract the full graph-like structure of a scene. In contrast with scene graphs, human scene perception focuses on selected structures in the scene, starting with a limited interpretation and evolving sequentially in a goal-directed manner [G. L. Malcolm, I. I. A. Groen, C. I. Baker, , 843-856 (2016)]. Guidance is crucial throughout scene interpretation, since extracting a full scene representation is often infeasible. Here, we present a model that performs human-like guided scene interpretation, using iterative bottom-up and top-down processing in a "counterstream" structure motivated by cortical circuitry. The process proceeds by the sequential application of top-down instructions that guide the interpretation. The results show how scene structures of interest to the viewer are extracted by an automatically selected sequence of top-down instructions. The model shows two further benefits. One is an inherent capability to deal well with the problem of combinatorial generalization, that is, generalizing broadly to unseen scene configurations, which is limited in current network models [B. Lake, M. Baroni, (2018)]. The second is the ability to combine visual with nonvisual information at each cycle of the interpretation process, which is a key aspect for modeling human perception as well as for advancing AI vision systems.
Download full-text PDF | Source
---|---
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10556630 | PMC
http://dx.doi.org/10.1073/pnas.2211179120 | DOI Listing
BMC Med Inform Decis Mak
January 2025
Department of Thoracic Surgery, Guizhou Provincial People's Hospital, No. 83, Zhongshan East Road, Guiyang, Guizhou, 550000, China.
Background: Large language models (LLMs) are increasingly utilized in healthcare settings. Postoperative pathology reports, which are essential for diagnosing and determining treatment strategies for surgical patients, frequently include complex data that can be challenging for patients to comprehend. This complexity can adversely affect the quality of communication between doctors and patients about their diagnosis and treatment options, potentially impacting patient outcomes such as understanding of their condition, treatment adherence, and overall satisfaction.
Atten Percept Psychophys
January 2025
U.S. DEVCOM Army Research Laboratory, Humans in Complex Systems, Aberdeen Proving Ground, MD, USA.
Historically, electrophysiological correlates of scene processing have been studied with experiments using static stimuli presented for discrete timescales where participants maintain a fixed eye position. Gaps remain in generalizing these findings to real-world conditions where eye movements are made to select new visual information and where the environment remains stable but changes with our position and orientation in space, driving dynamic visual stimulation. Co-recording of eye movements and electroencephalography (EEG) is an approach to leverage fixations as time-locking events in the EEG recording under free-viewing conditions to create fixation-related potentials (FRPs), providing a neural snapshot in which to study visual processing under naturalistic conditions.
Vision (Basel)
January 2025
Centre for the Study of Perceptual Experience, Department of Philosophy, University of Glasgow, Glasgow G12 8QQ, UK.
Mental imagery is claimed to underlie a host of abilities, such as episodic memory, working memory, and decision-making. A popular view holds that mental imagery relies on the perceptual system and that it can be said to be 'vision in reverse'. Whereas vision exploits the bottom-up neural pathways of the visual system, mental imagery exploits the top-down neural pathways.
Sci Rep
January 2025
School of Computer Science and Engineering (SCOPE), VIT-AP University, Amaravati, Andhra Pradesh, 522237, India.
Indian mythology is a treasure trove of divine tales, yet a gap in understanding still exists between foreign tourists and the rich cultural heritage of Indian deities. To address this problem, this paper presents a deep learning-driven mobile application named "MythicVision", designed to help foreign tourists better understand India's rich cultural heritage by recognizing and interpreting images of Indian mythological deities. First, four state-of-the-art deep models were trained and evaluated on a custom in-house dataset consisting of 10,970 images of various Indian deities sourced from both natural scene and web images.