We introduce the Visual Experience Dataset (VEDB), a compilation of more than 240 hours of egocentric video combined with gaze- and head-tracking data that offer an unprecedented view of the visual world as experienced by human observers. The dataset consists of 717 sessions, recorded by 56 observers ranging from 7 to 46 years of age. This article outlines the data collection, processing, and labeling protocols undertaken to ensure a representative sample and discusses the potential sources of error or bias within the dataset.
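As a rough illustration of how such a session might be explored, the sketch below overlays gaze samples on the egocentric video. The file names, column names, and coordinate conventions are assumptions for illustration only; the abstract does not specify the VEDB's on-disk format.

```python
import cv2
import pandas as pd

# Hypothetical session layout; actual VEDB file names and columns may differ.
gaze = pd.read_csv("session_001/gaze.csv")        # assumed columns: timestamp, norm_x, norm_y
video = cv2.VideoCapture("session_001/world.mp4")
fps = video.get(cv2.CAP_PROP_FPS)

frame_idx = 0
ok, frame = video.read()
while ok:
    t = frame_idx / fps
    # nearest gaze sample to this frame's timestamp
    row = gaze.iloc[(gaze["timestamp"] - t).abs().argmin()]
    x = int(row["norm_x"] * frame.shape[1])
    y = int((1.0 - row["norm_y"]) * frame.shape[0])  # assumed bottom-left origin
    cv2.circle(frame, (x, y), 12, (0, 255, 0), 2)    # draw the gaze point
    # ... display or write the annotated frame here ...
    ok, frame = video.read()
    frame_idx += 1
```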
Material depictions in artwork are useful tools for revealing image features that support material categorization. For example, artistic recipes for drawing specific materials make explicit the critical information leading to recognizable material properties (Di Cicco, Wijntjes, & Pont, 2020), and investigating the recognizability of material renderings as a function of their visual features supports conclusions about the vocabulary of material perception. Here, we examined how the recognition of materials from photographs and drawings was affected by the application of the Portilla-Simoncelli texture synthesis model.
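The full Portilla-Simoncelli model matches a large set of joint wavelet statistics. As a minimal stand-in for that statistic-matching idea, the toy sketch below alternately imposes a target image's Fourier amplitude spectrum and pixel histogram on a noise image; it is not the published model, only an illustration of synthesis by constraint projection.

```python
import numpy as np
from skimage import data, exposure

rng = np.random.default_rng(0)
# placeholder image; substitute an actual texture photograph
target = data.camera().astype(float) / 255.0

synth = rng.standard_normal(target.shape)
target_amp = np.abs(np.fft.fft2(target))
for _ in range(25):
    # impose the target's Fourier amplitude spectrum, keeping synth's phases
    phases = np.angle(np.fft.fft2(synth))
    synth = np.real(np.fft.ifft2(target_amp * np.exp(1j * phases)))
    # impose the target's pixel histogram
    synth = exposure.match_histograms(synth, target)
```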
Scene memory has known spatial biases. Boundary extension is a well-known bias whereby observers remember visual information beyond an image's boundaries. While recent studies demonstrate that boundary contraction also reliably occurs based on intrinsic image properties, the specific properties that drive the effect are unknown.
A number of neuroimaging techniques have been employed to understand how visual information is transformed along the visual pathway. Although each technique has spatial and temporal limitations, each provides important insights into the visual code. While the BOLD signal of fMRI can be quite informative, the visual code is not static, and its dynamics can be obscured by fMRI's poor temporal resolution.
Human scene categorization is characterized by its remarkable speed. While many visual and conceptual features have been linked to this ability, significant correlations exist between feature spaces, impeding our ability to determine their relative contributions to scene categorization. Here, we used a whitening transformation to decorrelate a variety of visual and conceptual features and assess the time course of their unique contributions to scene categorization.
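The abstract does not say which whitening variant was used; ZCA whitening, sketched below, is one standard way to decorrelate a set of feature columns before assessing their unique contributions.

```python
import numpy as np

def zca_whiten(X, eps=1e-8):
    """Decorrelate the columns of X (observations x features) via ZCA whitening."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return Xc @ W

X = np.random.default_rng(0).standard_normal((500, 10))
Xw = zca_whiten(X)
# after whitening, the feature covariance is (approximately) the identity
assert np.allclose(np.cov(Xw, rowvar=False), np.eye(10), atol=1e-6)
```

ZCA is chosen here over plain PCA whitening because it keeps the whitened features as close as possible to the originals, which makes per-feature contributions easier to interpret.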
Our understanding of information processing by the mammalian visual system has come through a variety of techniques ranging from psychophysics and fMRI to single unit recording and EEG. Each technique provides unique insights into the processing framework of the early visual system. Here, we focus on the nature of the information that is carried by steady state visual evoked potentials (SSVEPs).
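As an illustration of the kind of signal an SSVEP provides, the snippet below estimates the response at an assumed flicker frequency from one EEG channel. The sampling rate, stimulation frequency, and SNR definition are all placeholder choices, not parameters from the study.

```python
import numpy as np

fs = 1000.0                               # sampling rate in Hz (assumed)
f_stim = 7.5                              # flicker frequency in Hz (assumed)
eeg = np.random.default_rng(0).standard_normal(int(10 * fs))  # placeholder channel

freqs = np.fft.rfftfreq(eeg.size, d=1.0 / fs)
amp = np.abs(np.fft.rfft(eeg)) / eeg.size
k = int(np.argmin(np.abs(freqs - f_stim)))
# SSVEP "SNR": amplitude at the tagged frequency relative to nearby bins
neighbors = np.r_[amp[k - 11:k - 1], amp[k + 2:k + 12]]
snr = amp[k] / neighbors.mean()
```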
Visual scene category representations emerge very rapidly, yet the computational transformations that enable such invariant categorizations remain elusive. Deep convolutional neural networks (CNNs) perform visual categorization at near human-level accuracy using a feedforward architecture, providing neuroscientists with the opportunity to assess one successful series of representational transformations that enable categorization in silico. The goal of the current study is to assess the extent to which sequential scene category representations built by a CNN map onto those built in the human brain as assessed by high-density, time-resolved event-related potentials (ERPs).
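One common way to compare CNN stages with time-resolved brain data is representational similarity analysis: build a dissimilarity matrix from each CNN layer and correlate it with ERP dissimilarities at each time point. The sketch below is schematic, with random tensors standing in for real images and ERP data, and the layer choices are assumptions.

```python
import numpy as np
import torch
from torchvision.models import alexnet
from torchvision.models.feature_extraction import create_feature_extractor
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

imgs = torch.randn(16, 3, 224, 224)       # placeholder preprocessed scene images
net = alexnet(weights=None).eval()        # untrained here; use trained weights in practice
extractor = create_feature_extractor(
    net, return_nodes={"features.2": "early", "features.12": "late"})
with torch.no_grad():
    acts = extractor(imgs)

# one condensed representational dissimilarity matrix (RDM) per layer
rdms = {k: pdist(v.flatten(1).numpy(), metric="correlation") for k, v in acts.items()}
# placeholder ERP RDMs: one condensed RDM per time point
erp_rdms = np.random.default_rng(0).standard_normal((100, rdms["late"].size))
time_course = [spearmanr(rdms["late"], erp_rdms[t])[0] for t in range(100)]
```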
Inherent correlations between visual and semantic features in real-world scenes make it difficult to determine how different scene properties contribute to neural representations. Here, we assessed the contributions of multiple properties to scene representation by partitioning the variance explained in human behavioral and brain measurements by three feature models whose inter-correlations were minimized through stimulus preselection. Behavioral assessments of scene similarity reflected unique contributions from a functional feature model indicating potential actions in scenes as well as high-level visual features from a deep neural network (DNN).
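Variance partitioning of this kind can be sketched with nested regressions: fit the full model and each leave-one-out model, then read off a feature space's unique contribution as the drop in R². The data below are random placeholders, not the study's features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
# placeholder feature models, e.g. functional, DNN, and semantic features
F, D, S = (rng.standard_normal((n, 5)) for _ in range(3))
y = F @ rng.standard_normal(5) + 0.5 * rng.standard_normal(n)  # toy behavior

def r2(*blocks):
    """R-squared of a linear model built from the given feature blocks."""
    X = np.hstack(blocks)
    return LinearRegression().fit(X, y).score(X, y)

full = r2(F, D, S)
unique = {"functional": full - r2(D, S),   # variance only this model explains
          "dnn":        full - r2(F, S),
          "semantic":   full - r2(F, D)}
```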
An L-vertex, the point at which two contours coterminate, provides highly reliable evidence that a surface terminates at that vertex, thus providing the strongest constraint on the extraction of shape from images (Guzman, 1968). Such vertices are pervasive in our visual world, but the importance of a statistical regularity about them has been underappreciated: the contours defining the vertex are (almost) always of the same direction of contrast with respect to the background (i.e., both darker or both lighter than the background).
The purpose of categorization is to identify generalizable classes of objects whose members can be treated equivalently. Within a category, however, some exemplars are more representative of that concept than others. Despite long-standing behavioral effects, little is known about how typicality influences the neural representation of real-world objects from the same category.
Real-world scenes are complex but lawful: blenders are more likely to be found in kitchens than on beaches, and elephants are not generally found inside homes. Research over the past 40 years has demonstrated that contextual associations influence object recognition, change eye movement distributions, and modulate brain activity. However, the majority of these studies chose object-scene pairs based on experimenters' intuitions because the statistical relationships between objects and scenes had yet to be systematically quantified.
How do we know that a kitchen is a kitchen by looking? Traditional models posit that scene categorization is achieved through recognizing necessary and sufficient features and objects, yet there is little consensus about what these may be. However, scene categories should reflect how we use visual information. Therefore, we test the hypothesis that scene categories reflect functions, or the possibilities for actions within a scene.
Objects can be simultaneously categorized at multiple levels of specificity ranging from very broad ("natural object") to very distinct ("Mr. Woof"), with a mid-level of generality (basic level: "dog") often providing the most cognitively useful distinction between categories. It is unknown, however, how this hierarchical representation is achieved in the brain.
Although we are able to rapidly understand novel scene images, little is known about the mechanisms that support this ability. Theories of optimal coding assert that prior visual experience can be used to ease the computational burden of visual processing. A consequence of this idea is that more probable visual inputs should be facilitated relative to more unlikely stimuli.
Human observers categorize visual stimuli with remarkable efficiency, a result that has led to the suggestion that object and scene categorization may be automatic processes. We tested this hypothesis by presenting observers with a modified Stroop paradigm in which object or scene words were presented over images of objects or scenes. Terms were either congruent or incongruent with the images.
Context is critical for recognizing environments and for searching for objects within them: contextual associations have been shown to modulate reaction time and object recognition accuracy, as well as to influence the distribution of eye movements and patterns of brain activations. However, we have not yet systematically quantified the relationships between objects and their scene environments. Here I seek to fill this gap by providing descriptive statistics of object-scene relationships.
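One illustrative statistic for such object-scene relationships is pointwise mutual information computed over labeled (scene, object) pairs. The toy counts below are invented, and the paper's own measures may differ.

```python
import numpy as np
from collections import Counter

# toy (scene category, object label) pairs; real data would come from
# densely annotated scene databases
pairs = [("kitchen", "blender"), ("kitchen", "sink"), ("kitchen", "blender"),
         ("beach", "umbrella"), ("beach", "towel"), ("kitchen", "towel")]

n = len(pairs)
scene_counts = Counter(s for s, _ in pairs)
obj_counts = Counter(o for _, o in pairs)
joint_counts = Counter(pairs)

def pmi(scene, obj):
    """How much more often the object occurs in the scene than chance predicts."""
    return np.log2((joint_counts[(scene, obj)] / n) /
                   ((scene_counts[scene] / n) * (obj_counts[obj] / n)))

print(pmi("kitchen", "blender"))   # positive: blenders favor kitchens
```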
In 1967, Yarbus presented qualitative data from one observer showing that patterns of eye movements were dramatically affected by the observer's task, suggesting that complex mental states could be inferred from scan paths. The strong claim of this very influential finding has never been rigorously tested. Our observers viewed photographs for 10 s each.
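A rigorous test of the Yarbus claim can be framed as decoding: summarize each trial's scan path as a feature vector and ask whether a cross-validated classifier predicts the viewing task above chance. The features and data below are placeholders, not those of the study.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# hypothetical per-trial features: fixation count, mean fixation duration,
# mean saccade amplitude, scan-path length, fixation dispersion, etc.
X = rng.standard_normal((120, 6))
y = rng.integers(0, 4, size=120)        # one of four viewing tasks

acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
print(acc)   # chance is 0.25 for four tasks
```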
While basic visual features such as color, motion, and orientation can guide attention, it is likely that additional features guide search for objects in real-world scenes. Recent work has shown that human observers efficiently extract global scene properties such as mean depth or navigability from a brief glance at a single scene (M. R. Greene & Oliva, 2009).
Behavioral and computational studies suggest that visual scene analysis rapidly produces a rich description of both the objects and the spatial layout of surfaces in a scene. However, there is still a large gap in our understanding of how the human brain accomplishes these diverse functions of scene understanding. Here we probe the nature of real-world scene representations using multivoxel functional magnetic resonance imaging pattern analysis.
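Multivoxel pattern analysis of this sort is often implemented as cross-validated decoding of condition labels from voxel activity patterns. The sketch below uses random placeholder data for a single region of interest; the real analysis pipeline is not specified in the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
patterns = rng.standard_normal((160, 500))   # 160 scene trials x 500 ROI voxels
labels = np.repeat(np.arange(8), 20)         # e.g., 8 scene conditions

clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, patterns, labels, cv=5).mean()
print(acc)   # chance is 1/8 for eight conditions
```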
How does one find objects in scenes? For decades, visual search models have been built on experiments in which observers search for targets, presented among distractor items, isolated and randomly arranged on blank backgrounds. Are these models relevant to search in continuous scenes? This article argues that the mechanisms that govern artificial, laboratory search tasks do play a role in visual search in scenes. However, scene-based information is used to guide search in ways that had no place in earlier models.
Adaptation is ubiquitous in the human visual system, allowing recalibration to the statistical regularities of its input. Previous work has shown that global scene properties such as openness and mean depth are informative dimensions of natural scene variation useful for human and machine scene categorization (Greene & Oliva, 2009b; Oliva & Torralba, 2001). A visual system that rapidly categorizes scenes using such statistical regularities should be continuously updated, and therefore is prone to adaptation along these dimensions.
What information is available from a brief glance at a novel scene? Although previous efforts to answer this question have focused on scene categorization or object detection, real-world scenes contain a wealth of information whose perceptual availability has yet to be explored. We compared image exposure thresholds in several tasks involving basic-level categorization or global-property classification. All thresholds were remarkably short: observers achieved 75%-correct performance with presentations ranging from 19 to 67 ms, reaching maximum performance at about 100 ms.
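A 75%-correct exposure threshold like those reported here is typically read off a fitted psychometric function. Below is a sketch using a Weibull fit to invented duration/accuracy data; the guess and lapse rates are assumed values.

```python
import numpy as np
from scipy.optimize import curve_fit

# invented data: exposure duration (ms) vs. proportion correct
dur = np.array([13.0, 27.0, 40.0, 53.0, 67.0, 80.0, 107.0])
pc = np.array([0.52, 0.61, 0.72, 0.81, 0.88, 0.92, 0.95])

GUESS, LAPSE = 0.5, 0.01   # assumed guess and lapse rates

def weibull(x, thresh, slope):
    """Weibull psychometric function scaled between the guess and lapse rates."""
    return GUESS + (1 - GUESS - LAPSE) * (1 - np.exp(-(x / thresh) ** slope))

(thresh, slope), _ = curve_fit(weibull, dur, pc, p0=[50.0, 2.0])
# duration at which the fitted curve crosses 75% correct
d75 = thresh * (-np.log(1 - (0.75 - GUESS) / (1 - GUESS - LAPSE))) ** (1 / slope)
```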
Human observers are able to rapidly and accurately categorize natural scenes, but the representation mediating this feat is still unknown. Here we propose a framework of rapid scene categorization that does not segment a scene into objects and instead uses a vocabulary of global, ecological properties that describe spatial and functional aspects of scene space (such as navigability or mean depth). In Experiment 1, we obtained ground truth rankings on global properties for use in Experiments 2-4.