Publications by authors named "Gregory Zelinsky"

The brain routes and integrates information from many sources during behavior. A number of models explain this phenomenon within the framework of mixed selectivity theory, yet it is difficult to compare their predictions to understand how neurons and circuits integrate information. In this work, we apply time-series partial information decomposition (PID) to compare models of integration on a dataset of superior colliculus (SC) recordings collected during a multi-target visual search task.

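The decomposition itself can be illustrated compactly. The sketch below implements the Williams-Beer redundancy (I_min) decomposition for two discrete sources and one target; it is a minimal, static two-source example, not the time-series PID or the SC models compared in the paper, and the toy XOR distribution is purely illustrative.

```python
import numpy as np

def mutual_info(p_xt):
    """I(X;T) in bits from a joint table p_xt[x, t]."""
    px = p_xt.sum(axis=1, keepdims=True)
    pt = p_xt.sum(axis=0, keepdims=True)
    nz = p_xt > 0
    return float((p_xt[nz] * np.log2(p_xt[nz] / (px @ pt)[nz])).sum())

def specific_info(p_xt, t):
    """Williams-Beer specific information I_spec(X; T=t) in bits."""
    pt = p_xt[:, t].sum()
    p_x_given_t = p_xt[:, t] / pt
    px = p_xt.sum(axis=1)
    p_t_given_x = np.divide(p_xt[:, t], px, out=np.zeros_like(px), where=px > 0)
    nz = p_x_given_t > 0
    return float((p_x_given_t[nz] * np.log2(p_t_given_x[nz] / pt)).sum())

def pid_two_sources(p):
    """Redundant/unique/synergistic terms for a joint table p[s1, s2, t]."""
    p_s1t = p.sum(axis=1)                    # p(s1, t)
    p_s2t = p.sum(axis=0)                    # p(s2, t)
    p_joint_t = p.reshape(-1, p.shape[2])    # treat (s1, s2) as one source
    pt = p.sum(axis=(0, 1))
    redundancy = sum(pt[t] * min(specific_info(p_s1t, t), specific_info(p_s2t, t))
                     for t in range(len(pt)) if pt[t] > 0)
    i1, i2, i12 = mutual_info(p_s1t), mutual_info(p_s2t), mutual_info(p_joint_t)
    return {"redundant": redundancy,
            "unique_s1": i1 - redundancy,
            "unique_s2": i2 - redundancy,
            "synergistic": i12 - i1 - i2 + redundancy}

# Toy example: T = XOR(S1, S2) with uniform inputs -> purely synergistic.
p = np.zeros((2, 2, 2))
for s1 in (0, 1):
    for s2 in (0, 1):
        p[s1, s2, s1 ^ s2] = 0.25
print(pid_two_sources(p))   # expect ~1 bit of synergy, ~0 elsewhere
```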

Humans are remarkably robust at perceiving and recognizing objects: we see faces in tea stains and can recognize friends on dark streets. Yet, neurocomputational models of primate object recognition have focused on the initial feed-forward pass of processing through the ventral stream and less on the top-down feedback that likely underlies robust object perception and recognition. Aligned with the generative approach, we propose that the visual system actively facilitates recognition by reconstructing the object hypothesized to be in the image.

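As a rough illustration of the generative idea (not the authors' model), the sketch below pairs a feed-forward classifier with a hypothetical class-conditioned generator and re-ranks the top candidate categories by how well their reconstructions match the input; all module names, layer sizes, and the MSE re-scoring rule are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Illustrative feed-forward encoder/classifier."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        z = self.features(x)
        return z, self.classifier(z)

class Generator(nn.Module):
    """Reconstructs an image from a latent code plus a one-hot class hypothesis."""
    def __init__(self, n_classes=10, img_size=32):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(32 + n_classes, 256), nn.ReLU(),
            nn.Linear(256, 3 * img_size * img_size), nn.Sigmoid())

    def forward(self, z, class_onehot):
        out = self.net(torch.cat([z, class_onehot], dim=1))
        return out.view(-1, 3, self.img_size, self.img_size)

def rescore_by_reconstruction(x, encoder, generator, top_k=3):
    """Re-rank the top-k class hypotheses by how well a class-conditioned
    reconstruction matches the input (lower error -> preferred hypothesis)."""
    z, logits = encoder(x)
    probs = F.softmax(logits, dim=1)
    topk = probs.topk(top_k, dim=1).indices          # candidate hypotheses
    errors = []
    for k in range(top_k):
        onehot = F.one_hot(topk[:, k], logits.shape[1]).float()
        recon = generator(z, onehot)
        errors.append(F.mse_loss(recon, x, reduction="none").mean(dim=(1, 2, 3)))
    errors = torch.stack(errors, dim=1)              # (batch, top_k)
    best = errors.argmin(dim=1)
    return topk.gather(1, best.unsqueeze(1)).squeeze(1)

# Usage with random weights and a random image, just to show the flow:
enc, gen = Encoder(), Generator()
x = torch.rand(2, 3, 32, 32)
print(rescore_by_reconstruction(x, enc, gen))
```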

The visual system uses sequences of selective glimpses of objects to support goal-directed behavior, but how is this attention control learned? Here we present an encoder-decoder model inspired by the interacting bottom-up and top-down visual pathways that make up the recognition-attention system in the brain. At every iteration, a new glimpse is taken from the image and processed through the "what" encoder, a hierarchy of feedforward, recurrent, and capsule layers, to obtain an object-centric (object-file) representation. This representation feeds the "where" decoder, where the evolving recurrent representation provides top-down attentional modulation to plan subsequent glimpses and to influence routing in the encoder.

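A minimal sketch of such a glimpse loop is shown below, assuming a plain convolutional "what" encoder and a GRU-based "where" state that emits the next glimpse location; the capsule layers, object-file representation, and top-down modulation of routing described above are omitted, and all sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlimpseModel(nn.Module):
    """Iterative 'what'/'where' loop: encode a glimpse, update a recurrent
    state, and emit the next glimpse location."""
    def __init__(self, glimpse_size=16, hidden=64):
        super().__init__()
        self.glimpse_size = glimpse_size
        self.what = nn.Sequential(                 # "what" encoder
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, hidden))
        self.where = nn.GRUCell(hidden, hidden)    # "where" decoder state
        self.next_loc = nn.Linear(hidden, 2)       # (x, y) in [-1, 1]

    def extract_glimpse(self, img, loc):
        # Crop a fixed-size window around loc with an affine grid sample.
        b = img.shape[0]
        scale = self.glimpse_size / img.shape[-1]
        theta = torch.zeros(b, 2, 3, device=img.device)
        theta[:, 0, 0] = scale
        theta[:, 1, 1] = scale
        theta[:, :, 2] = loc
        grid = F.affine_grid(theta, (b, 3, self.glimpse_size, self.glimpse_size),
                             align_corners=False)
        return F.grid_sample(img, grid, align_corners=False)

    def forward(self, img, n_glimpses=4):
        b = img.shape[0]
        h = torch.zeros(b, self.where.hidden_size, device=img.device)
        loc = torch.zeros(b, 2, device=img.device)   # start at the image center
        locations = []
        for _ in range(n_glimpses):
            g = self.extract_glimpse(img, loc)
            h = self.where(self.what(g), h)           # update recurrent state
            loc = torch.tanh(self.next_loc(h))        # plan the next glimpse
            locations.append(loc)
        return torch.stack(locations, dim=1)          # (batch, n_glimpses, 2)

model = GlimpseModel()
print(model(torch.rand(1, 3, 64, 64)).shape)   # torch.Size([1, 4, 2])
```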

The prediction of human gaze behavior is important for building human-computer interaction systems that can anticipate the user's attention. Computer vision models have been developed to predict the fixations made by people as they search for target objects. But what about when the target is not in the image? It is equally important to know how people search when they cannot find a target, and when they decide to stop searching.


The factors determining how attention is allocated during visual tasks have been studied for decades, but few studies have attempted to model the weighting of several of these factors within and across tasks to better understand their relative contributions. Here we consider the roles of saliency, center bias, target features, and object recognition uncertainty in predicting the first nine changes in fixation made during free viewing and visual search tasks in the OSIE and COCO-Search18 datasets, respectively. We focus on the last and least familiar of these factors by proposing a new method of quantifying uncertainty in an image, one based on object recognition.

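The weighting scheme can be sketched as a simple linear combination of normalized maps read out with a greedy, inhibition-of-return rule; the Gaussian center-bias map, the weights, and the readout below are illustrative assumptions, not the fitted model.

```python
import numpy as np

def normalize(m):
    """Scale a map to [0, 1] so the factors are on comparable scales."""
    m = m - m.min()
    return m / m.max() if m.max() > 0 else m

def priority_map(saliency, target_map, uncertainty, weights, sigma_frac=0.25):
    """Weighted combination of saliency, center bias, target features, and
    recognition uncertainty into a single fixation-priority map.
    `weights` = (w_saliency, w_center, w_target, w_uncertainty)."""
    h, w = saliency.shape
    yy, xx = np.mgrid[0:h, 0:w]
    center_bias = np.exp(-(((xx - w / 2) ** 2 + (yy - h / 2) ** 2)
                           / (2 * (sigma_frac * min(h, w)) ** 2)))
    maps = [saliency, center_bias, target_map, uncertainty]
    return sum(wt * normalize(m) for wt, m in zip(weights, maps))

def predict_fixations(priority, n_fix=9, ior_radius=8):
    """Greedy readout: repeatedly fixate the peak, then suppress a disk
    around it (a simple inhibition-of-return stand-in)."""
    pri = priority.copy()
    h, w = pri.shape
    yy, xx = np.mgrid[0:h, 0:w]
    fixations = []
    for _ in range(n_fix):
        y, x = np.unravel_index(np.argmax(pri), pri.shape)
        fixations.append((y, x))
        pri[(yy - y) ** 2 + (xx - x) ** 2 <= ior_radius ** 2] = -np.inf
    return fixations

# Toy usage with random maps; the weights would normally be fit to fixation data.
rng = np.random.default_rng(0)
maps = [rng.random((48, 64)) for _ in range(3)]
print(predict_fixations(priority_map(*maps, weights=(0.3, 0.2, 0.3, 0.2))))
```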

Are all real-world objects created equal? Visual search difficulty increases with the number of targets and as target-related visual working memory (VWM) load increases. Our goal was to investigate the load imposed by individual real-world objects held in VWM in the context of search. Measures of visual clutter attempt to quantify real-world set-size in the context of scenes.


Human visual recognition is outstandingly robust. People can recognize thousands of object classes in the blink of an eye (50-200 ms) even when the objects vary in position, scale, viewpoint, and illumination. What aspects of human category learning facilitate the extraction of invariant visual features for object recognition? Here, we explore the possibility that a contributing factor to learning such robust visual representations may be a taxonomic hierarchy communicated in part by common labels to which people are exposed as part of natural language.


Understanding how goals control behavior is a question ripe for interrogation by new methods from machine learning. These methods require large and labeled datasets to train models. To annotate a large-scale image dataset with observed search fixations, we collected 16,184 fixations from people searching for either microwaves or clocks in a dataset of 4,366 images (MS-COCO).


Human gaze behavior prediction is important for behavioral vision and for computer vision applications. Most models focus on predicting free-viewing behavior using saliency maps, but these do not generalize to goal-directed behavior, such as when a person searches for a visual target object. We propose the first inverse reinforcement learning (IRL) model to learn the internal reward function and policy used by humans during visual search.

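The flavor of the approach can be conveyed with a drastically simplified, one-step version: treat each human fixation as a softmax choice over grid cells with reward w·f(cell) and fit w by matching feature expectations (the maximum-entropy IRL gradient). The features, grid, and learning rate below are illustrative assumptions; the actual model learns a sequential policy over scanpaths.

```python
import numpy as np

def fit_reward_weights(features, fixated_cells, lr=0.5, n_iter=200):
    """One-step MaxEnt-style IRL: each recorded fixation is modeled as a
    softmax choice over grid cells with reward r(cell) = w . f(cell).
    `features`: (n_cells, n_feat) per-cell feature vectors for one image;
    `fixated_cells`: indices of cells that humans actually fixated."""
    n_cells, n_feat = features.shape
    w = np.zeros(n_feat)
    expert = features[fixated_cells].mean(axis=0)      # expert feature expectation
    for _ in range(n_iter):
        logits = features @ w
        p = np.exp(logits - logits.max())
        p /= p.sum()                                   # policy over cells
        model = p @ features                           # model feature expectation
        w += lr * (expert - model)                     # gradient of the log-likelihood
    return w

def policy(features, w):
    """Fixation-probability map implied by the learned reward."""
    logits = features @ w
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Toy usage: 2 features per cell (e.g., "target-like" and "salient"); humans
# mostly fixate target-like cells, so that weight should come out larger.
rng = np.random.default_rng(1)
feats = rng.random((100, 2))
human_fix = np.argsort(feats[:, 0])[-10:]              # fixations on target-like cells
w = fit_reward_weights(feats, human_fix)
print(w, policy(feats, w).argmax())
```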

Attention control is a basic behavioral process that has been studied for decades. The best current models of attention control are deep networks trained on free-viewing behavior to predict bottom-up attention control (saliency). We introduce COCO-Search18, the first dataset of laboratory-quality goal-directed behavior large enough to train deep-network models.


Visual search is the task of finding things with uncertain locations. Despite decades of research, the features that guide visual search remain poorly specified, especially in realistic contexts. This study tested the role of two features, shape and orientation, both in the presence and absence of hue information.


Attention controls the selective routing of visual inputs for classification. This "spotlight" of attention has been assumed to be Gaussian, but here we propose that this routing occurs in the form of a shape. We show that a model of attention control that spatially averages saliency values over proto-objects (POs), fragments of feature-similar visual space, better predicts the fixation density maps and scanpaths made during the free viewing of 384 natural scenes by 12 participants than comparable saliency models that do not consider shape.

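The core operation, averaging saliency within proto-objects so that shapes rather than pixels compete for fixation, can be sketched in a few lines; the block segmentation used in the toy example stands in for a real proto-object segmentation.

```python
import numpy as np

def proto_object_priority(saliency, labels):
    """Replace each pixel's saliency with the mean saliency of the
    proto-object (segment) it belongs to, so that whole shape fragments,
    rather than single pixels, compete for fixation.
    `saliency`: (H, W) float map; `labels`: (H, W) integer segment ids."""
    out = np.zeros_like(saliency, dtype=float)
    for seg_id in np.unique(labels):
        mask = labels == seg_id
        out[mask] = saliency[mask].mean()
    return out

# Toy usage with a random saliency map and a coarse 4x4 block segmentation;
# in practice the segments would come from a proto-object segmentation.
rng = np.random.default_rng(0)
sal = rng.random((64, 64))
blocks = (np.arange(64)[:, None] // 16) * 4 + (np.arange(64)[None, :] // 16)
prio = proto_object_priority(sal, blocks)
print(prio.shape, len(np.unique(prio)))   # (64, 64) with 16 distinct values
```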

Objects often appear with some amount of occlusion. We fill in missing information using local shape features even before attending to those objects, a process called amodal completion. Here we explore the possibility that knowledge about common realistic objects can be used to "restore" missing information even in cases where amodal completion is not expected.


Saccades quite systematically undershoot a peripheral visual target by about 10% of its eccentricity, while becoming more variable, mainly in amplitude, as the target becomes more peripheral. This undershoot phenomenon has been interpreted as a strategic adjustment of saccadic gain downstream of the superior colliculus (SC), where saccades are programmed. Here, we investigated whether the eccentricity-related increase in saccadic hypometria and imprecision might instead result from the overrepresentation of space closer to the fovea in the SC and visual-cortical areas.

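The logic of this account can be illustrated numerically: assume a compressive (logarithmic) mapping of eccentricity onto the SC surface and a stimulus-evoked response spread that is symmetric in visual space, then read out the saccade as a center of gravity on the collicular map. The constants, spread, and readout below are illustrative assumptions rather than the paper's model, but they show how foveal overrepresentation alone can yield hypometric and increasingly variable saccades.

```python
import numpy as np

# Illustrative constants for a compressive log mapping of eccentricity (deg)
# onto collicular distance from the foveal pole (mm); not fitted values.
A, B = 3.0, 1.4

def to_sc(R):       # visual eccentricity -> collicular coordinate
    return B * np.log(1 + R / A)

def from_sc(u):     # collicular coordinate -> visual eccentricity
    return A * (np.exp(u / B) - 1)

rng = np.random.default_rng(0)
for target in (4.0, 8.0, 16.0, 32.0):
    landings = []
    for _ in range(2000):
        # Each saccade reads out the center of gravity, on the SC map, of a
        # small active population whose visual-space spread grows with
        # eccentricity; foveal space occupies more mm per degree, so the
        # center of gravity is pulled toward the foveal pole.
        sample = np.clip(rng.normal(target, 0.35 * target, 50), 0, None)
        landings.append(from_sc(to_sc(sample).mean()))
    landings = np.array(landings)
    print(f"target {target:5.1f} deg -> mean landing {landings.mean():5.1f} deg, "
          f"sd {landings.std():4.2f} deg "
          f"({100 * (target - landings.mean()) / target:.0f}% undershoot)")
```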

We investigated how expected search difficulty affects the attentional template by having participants search for a teddy bear target among either other teddy bears (difficult search, high target-distractor similarity) or random nonbear objects (easy search, low target-distractor similarity). Target previews were identical in these two blocked conditions, and target-related visual working memory (VWM) load was measured using contralateral delay activity (CDA), an event-related potential indicating VWM load. CDA was assessed after target designation but before search display onset.


Modern computational models of attention predict fixations using saliency maps and target maps, which prioritize locations for fixation based on feature contrast and target goals, respectively. But whereas many such models are biologically plausible, none have looked to the oculomotor system for design constraints or parameter specification. Conversely, although most models of saccade programming are tightly coupled to underlying neurophysiology, none have been tested using real-world stimuli and tasks.


This article introduces a generative model of category representation that uses computer vision methods to extract category-consistent features (CCFs) directly from images of category exemplars. The model was trained on 4,800 images of common objects, and CCFs were obtained for 68 categories spanning subordinate, basic, and superordinate levels in a category hierarchy. When participants searched for these same categories, targets cued at the subordinate level were preferentially fixated, but fixated targets were verified faster when they followed a basic-level cue.

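The CCF idea, keeping features that are both frequent and consistent across a category's exemplars, can be sketched as below; the bag-of-features histograms, the mean/std score, and the selection threshold are illustrative assumptions rather than the model's exact pipeline.

```python
import numpy as np

def category_consistent_features(exemplar_hists, snr_quantile=0.75):
    """Select features that are both frequent and consistent across the
    exemplars of a category: rank features by mean/std of their counts
    (a signal-to-noise-like score) and keep the top quantile.
    `exemplar_hists`: (n_exemplars, n_features) bag-of-features counts."""
    mean = exemplar_hists.mean(axis=0)
    std = exemplar_hists.std(axis=0) + 1e-9          # avoid division by zero
    score = mean / std
    keep = score >= np.quantile(score, snr_quantile)
    return np.flatnonzero(keep), score

# Toy usage: 20 exemplars, 50 visual-word features; features 0-9 are made
# common and consistent, so they should dominate the selected set.
rng = np.random.default_rng(0)
hists = rng.poisson(1.0, size=(20, 50)).astype(float)
hists[:, :10] += rng.normal(8.0, 0.5, size=(20, 10))
ccf_idx, _ = category_consistent_features(hists)
print(ccf_idx)
```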

Two experiments evaluated the effect of retinal image size on the proto-object model of visual clutter perception. Experiment 1 had 20 participants order 90 small images of random-category real-world scenes from least to most cluttered. Aggregating these individual rankings into a single median clutter ranking and comparing it to a previously reported clutter ranking of larger versions of the identical scenes yielded a Spearman's ρ=.


Priority maps are winner-take-all neural mechanisms thought to guide the allocation of covert and overt attention. Here, we go beyond this standard definition and argue that priority maps play a much broader role in controlling goal-directed behavior. We start by defining what priority maps are and where they might be found in the brain; we then ask why they exist, that is, what function they serve.


The role of target typicality in a categorical visual search task was investigated by cueing observers with a target name, followed by a five-item target present/absent search array in which the target images were rated in a pretest to be high, medium, or low in typicality with respect to the basic-level target cue. Contrary to previous work, we found that search guidance was better for high-typicality targets compared to low-typicality targets, as measured by both the proportion of immediate target fixations and the time to fixate the target. Consistent with previous work, we also found an effect of typicality on target verification times, the time between target fixation and the search judgment; as target typicality decreased, verification times increased.


We introduce the proto-object model of visual clutter perception. This unsupervised model segments an image into superpixels, then merges neighboring superpixels that share a common color cluster to obtain proto-objects, defined here as spatially extended regions of coherent features. Clutter is estimated by simply counting the number of proto-objects.

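A rough sketch of this pipeline, under the assumptions of SLIC superpixels, mean Lab color per superpixel, and a fixed color-distance merging threshold (details that differ from the published model), is given below.

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.segmentation import slic

def count_proto_objects(image_rgb, n_segments=300, color_thresh=12.0):
    """Clutter estimate: SLIC superpixels, merge neighbors whose mean Lab
    colors are close, then count the merged regions (proto-objects)."""
    labels = slic(image_rgb, n_segments=n_segments, start_label=0)
    lab = rgb2lab(image_rgb)
    n = labels.max() + 1
    # Mean Lab color per superpixel.
    means = np.array([lab[labels == i].mean(axis=0) for i in range(n)])
    # Adjacent superpixel pairs (4-connectivity along rows and columns).
    pairs = set()
    for a, b in zip(labels[:, :-1].ravel(), labels[:, 1:].ravel()):
        if a != b:
            pairs.add((min(a, b), max(a, b)))
    for a, b in zip(labels[:-1, :].ravel(), labels[1:, :].ravel()):
        if a != b:
            pairs.add((min(a, b), max(a, b)))
    # Union-find merge of similar-colored neighbors.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for a, b in pairs:
        if np.linalg.norm(means[a] - means[b]) < color_thresh:
            parent[find(a)] = find(b)
    return len({find(i) for i in range(n)})

# Toy usage on a synthetic two-region image (left half red, right half blue):
img = np.zeros((100, 100, 3))
img[:, :50] = [1.0, 0.1, 0.1]
img[:, 50:] = [0.1, 0.1, 1.0]
print(count_proto_objects(img))   # expected to be close to 2
```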

Peripheral vision outside the focus of attention may rely on summary statistics. We used a gaze-contingent paradigm to directly test this assumption by asking whether search performance differed between targets and statistically-matched visualizations of the same targets. Four-object search displays included one statistically-matched object that was replaced by an unaltered version of the object during the first eye movement.


The visual-search literature has assumed that the top-down target representation used to guide search resides in visual working memory (VWM). We directly tested this assumption using contralateral delay activity (CDA) to estimate the VWM load imposed by the target representation. In Experiment 1, observers previewed four photorealistic objects and were cued to remember the two objects appearing to the left or right of central fixation; Experiment 2 was identical except that observers previewed two photorealistic objects and were cued to remember one.


We posit that a person's gaze behavior while freely viewing a scene contains an abundance of information, not only about their intent and what they consider to be important in the scene, but also about the scene's content. Experiments are reported, using two popular image datasets from computer vision, that explore the relationship between the fixations that people make during scene viewing, how they describe the scene, and automatic detection predictions of object categories in the scene. From these exploratory analyses, we then combine human behavior with the outputs of current visual recognition methods to build prototype human-in-the-loop applications for gaze-enabled object detection and scene annotation.
