Publications by authors named "James J Dicarlo"

Inferences made about objects via vision, such as rapid and accurate categorization, are core to primate cognition despite the algorithmic challenge posed by varying viewpoints and scenes. Until recently, the brain mechanisms that support these capabilities were deeply mysterious. However, over the past decade, this scientific mystery has been illuminated by the discovery and development of brain-inspired, image-computable, artificial neural network (ANN) systems that rival primates in these behavioral feats.

A key feature of cortical systems is functional organization: the arrangement of functionally distinct neurons in characteristic spatial patterns. However, the principles underlying the emergence of functional organization in the cortex are poorly understood. Here, we develop the topographic deep artificial neural network (TDANN), the first model to predict several aspects of the functional organization of multiple cortical areas in the primate visual system.

Vision is widely understood as an inference problem. However, two contrasting conceptions of the inference process have each been influential in research on biological vision as well as the engineering of machine vision. The first emphasizes bottom-up signal flow, describing vision as a largely feedforward, discriminative inference process that filters and transforms the visual information to remove irrelevant variation and represent behaviorally relevant information in a format suitable for downstream functions of cognition and behavioral control.

A core problem in visual object learning is using a finite number of images of a new object to accurately identify that object in future, novel images. One longstanding, conceptual hypothesis asserts that this core problem is solved by adult brains through two connected mechanisms: 1) the re-representation of incoming retinal images as points in a fixed, multidimensional neural space, and 2) the optimization of linear decision boundaries in that space, via simple plasticity rules applied to a single downstream layer. Though this scheme is biologically plausible, the extent to which it explains learning behavior in humans has been unclear, in part because of a historical lack of image-computable models of the putative neural space, and in part because of a lack of measurements of human learning behaviors in difficult, naturalistic settings.
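
The two-mechanism hypothesis in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the frozen random projection standing in for the "fixed neural space", the perceptron update standing in for the "simple plasticity rule", and the names `make_fixed_encoder`, `train_readout`, and `predict` are all assumptions chosen for illustration.

```python
import random


def make_fixed_encoder(n_inputs, n_features, seed=0):
    """Return a frozen encoder (hypothetical stand-in for a fixed neural space)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
               for _ in range(n_features)]

    def encode(image):
        # Rectified linear projection; the weights never change after creation.
        return [max(0.0, sum(w * x for w, x in zip(row, image)))
                for row in weights]

    return encode


def train_readout(features, labels, epochs=50, lr=0.1):
    """Fit one linear decision boundary with the perceptron rule.

    Labels are +1 or -1. Only this single downstream layer is learned.
    """
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            score = sum(wi * fi for wi, fi in zip(w, f)) + b
            if y * score <= 0:  # update only when the example is misclassified
                w = [wi + lr * y * fi for wi, fi in zip(w, f)]
                b += lr * y
    return w, b


def predict(w, b, feature):
    """Classify a point in the feature space as +1 or -1."""
    return 1 if sum(wi * fi for wi, fi in zip(w, feature)) + b > 0 else -1
```

In use, each image (here, any flat list of numbers) would first be passed through `encode`, and `train_readout` would then be fit on the resulting feature vectors; the encoder itself is never updated, which is the sense in which the representation is "fixed".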

In the target article, Bowers et al. dispute deep artificial neural network (ANN) models as the currently leading models of human vision without producing alternatives. They eschew the use of public benchmarking platforms to compare vision models with the brain and behavior, and they advocate for a fragmented, phenomenon-specific modeling approach.

A key feature of many cortical systems is functional organization: the arrangement of neurons with specific functional properties in characteristic spatial patterns across the cortical surface. However, the principles underlying the emergence and utility of functional organization are poorly understood. Here we develop the Topographic Deep Artificial Neural Network (TDANN), the first unified model to accurately predict the functional organization of multiple cortical areas in the primate visual system.

The ventral visual stream enables humans and nonhuman primates to effortlessly recognize objects across a multitude of viewing conditions, yet the computational role of its abundant feedback connections is unclear. Prior studies have augmented feedforward convolutional neural networks (CNNs) with recurrent connections to study their role in visual processing; however, these recurrent networks are often optimized directly on neural data, or the comparative metrics used are undefined for standard feedforward networks that lack these connections. In this work, we develop task-optimized convolutional recurrent (ConvRNN) network models that more closely mimic the timing and gross neuroanatomy of the ventral pathway.

Humans learn from visual inputs at multiple timescales, both rapidly and flexibly acquiring visual knowledge over short periods, and robustly accumulating online learning progress over longer periods. Modeling these powerful learning capabilities is an important problem for computational visual cognitive science, and models that could replicate them would be of substantial utility in real-world computer vision settings. In this work, we establish benchmarks for both real-time and life-long continual visual learning.

Cortical regions apparently selective to faces, places, and bodies have provided important evidence for domain-specific theories of human cognition, development, and evolution. But claims of category selectivity are not quantitatively precise and remain vulnerable to empirical refutation. Here we develop artificial neural network-based encoding models that accurately predict the response to novel images in the fusiform face area, parahippocampal place area, and extrastriate body area, outperforming descriptive models and experts.

Article Synopsis
  • Optogenetic techniques are mainly used in rodent brains but are not as advanced for nonhuman primates like rhesus macaques, which have more complex brain structures and behaviors.
  • A new tool called Opto-Array, consisting of implantable light-emitting diodes, has been developed to enhance optogenetic studies in these larger brains.
  • Testing showed that using the Opto-Array to silence neurons in the macaque's primary visual cortex led to noticeable visual deficits, confirming its effectiveness for behavioral applications without causing tissue heating.

Temporal continuity of object identity is a feature of natural visual input and is potentially exploited - in an unsupervised manner - by the ventral visual stream to build the neural representation in inferior temporal (IT) cortex. Here, we investigated whether plasticity of individual IT neurons underlies human core object recognition behavioral changes induced with unsupervised visual experience. We built a single-neuron plasticity model combined with a previously established IT population-to-recognition-behavior-linking model to predict human learning effects.

Deep neural networks currently provide the best quantitative models of the response patterns of neurons throughout the primate ventral visual stream. However, such networks have remained implausible as a model of the development of the ventral stream, in part because they are trained with supervised methods requiring many more labels than are accessible to infants during development. Here, we report that recent rapid progress in unsupervised learning has largely closed this gap.

Article Synopsis
  • Optogenetics has transformed neuroscience research in small animals, but its effectiveness in non-human primates (NHPs) has shown mixed results.
  • A centralized database has been created to help researchers track both successful and unsuccessful optogenetic experiments in primates, with contributions from 45 laboratories worldwide.
  • The database, available on the Open Science Framework, aims to enhance research by sharing over 1,000 injection experiments and offers insights to improve optogenetic methods in NHPs.

Distributed neural population spiking patterns in macaque inferior temporal (IT) cortex that support core object recognition require additional time to develop for specific, "late-solved" images. This suggests the necessity of recurrent processing in these computations. Which brain circuits are responsible for computing and transmitting these putative recurrent signals to IT? To test whether the ventrolateral prefrontal cortex (vlPFC) is a critical recurrent node in this system, here, we pharmacologically inactivated parts of vlPFC and simultaneously measured IT activity while monkeys performed object discrimination tasks.

A potentially organizing goal of the brain and cognitive sciences is to accurately explain domains of human intelligence as executable, neurally mechanistic models. Years of research have led to models that capture experimental results in individual behavioral tasks and individual brain regions. We here advocate for taking the next step: integrating experimental results from many laboratories into suites of benchmarks that, when considered together, push mechanistic models toward explaining entire domains of intelligence, such as vision, language, and motor control.

The ability to recognize written letter strings is foundational to human reading, but the underlying neuronal mechanisms remain largely unknown. Recent behavioral research in baboons suggests that non-human primates may provide an opportunity to investigate this question. We recorded the activity of hundreds of neurons in V4 and the inferior temporal cortex (IT) while naïve macaque monkeys passively viewed images of letters, English words and non-word strings, and tested the capacity of those neuronal representations to support a battery of orthographic processing tasks.

Particular deep artificial neural networks (ANNs) are today's most accurate models of the primate brain's ventral visual stream. Using an ANN-driven image synthesis method, we found that luminous power patterns (i.e.

Non-recurrent deep convolutional neural networks (CNNs) are currently the best at modeling core object recognition, a behavior that is supported by the densely recurrent primate ventral stream, culminating in the inferior temporal (IT) cortex. If recurrence is critical to this behavior, then primates should outperform feedforward-only deep CNNs for images that require additional recurrent processing beyond the feedforward IT response. Here we first used behavioral methods to discover hundreds of these 'challenge' images.

Extensive research suggests that the inferior temporal (IT) population supports visual object recognition behavior. However, causal evidence for this hypothesis has been equivocal, particularly beyond the specific case of face-selective subregions of IT. Here, we directly tested this hypothesis by pharmacologically inactivating individual, millimeter-scale subregions of IT while monkeys performed several core object recognition subtasks, interleaved trial-by-trial.

Ventral visual stream neural responses are dynamic, even for static image presentations. However, dynamical neural models of visual cortex are lacking as most progress has been made modeling static, time-averaged responses. Here, we studied population neural dynamics during face detection across three cortical processing stages.

Primates, including humans, can typically recognize objects in visual images at a glance despite naturally occurring identity-preserving image transformations (e.g., changes in viewpoint).

A major open challenge in neuroscience is the ability to measure and perturb neural activity in vivo from well-defined neural sub-populations at cellular resolution anywhere in the brain. However, limitations posed by scattering and absorption prohibit non-invasive multi-photon approaches for deep (>2 mm) structures, while gradient refractive index (GRIN) endoscopes are relatively thick and can cause significant damage upon insertion. Here, we present a novel micro-endoscope design to image neural activity at arbitrary depths via an ultra-thin multi-mode optical fiber (MMF) probe whose diameter is 5-10x smaller than that of commercially available micro-endoscopes.

While early cortical visual areas contain fine-scale spatial organization of neuronal properties, such as orientation preference, the spatial organization of higher-level visual areas is less well understood. The fMRI demonstration of face-preferring regions in human ventral cortex and monkey inferior temporal cortex ("face patches") raises the question of how neural selectivity for faces is organized. Here, we targeted hundreds of spatially registered neural recordings to the largest fMRI-identified face-preferring region in monkeys, the middle face patch (MFP), and show that the MFP contains a graded enrichment of face-preferring neurons.
