Publications by authors named "Andrew Saxe"

Humans sometimes have an insight that leads to a sudden and drastic performance improvement on the task they are working on. Sudden strategy adaptations are often linked to insights, considered to be a unique aspect of human cognition tied to complex processes such as creativity or meta-cognitive reasoning. Here, we take a learning perspective and ask whether insight-like behaviour can occur in simple artificial neural networks, even when the models only learn to form input-output associations through gradual gradient descent.


Biological and artificial learning agents face numerous choices about how to learn, ranging from hyperparameter selection to aspects of task distributions like curricula. Understanding how to make these meta-learning choices could offer normative accounts of cognitive control functions in biological learners and improve engineered systems. Yet optimal strategies remain challenging to compute in modern deep networks due to the complexity of optimizing through the entire learning process.


Injury to the recurrent laryngeal nerve (RLN) can be a devastating complication of thyroid and parathyroid surgery. Intraoperative neuromonitoring (IONM) has been proposed as a method to reduce the number of RLN injuries but the data are inconsistent. We performed a meta-analysis to critically assess the data.


Learning in deep neural networks is known to depend critically on the knowledge embedded in the initial network weights. However, few theoretical results have precisely linked prior knowledge to learning dynamics. Here we derive exact solutions to the dynamics of learning with rich prior knowledge in deep linear networks by generalising Fukumizu's matrix Riccati solution (Fukumizu 1998).


In animals and humans, curriculum learning (presenting data in a curated order) is critical to rapid learning and effective pedagogy. A long history of experiments has demonstrated the impact of curricula in a variety of animals but, despite its ubiquitous presence, a theoretical understanding of the phenomenon is still lacking. Surprisingly, in contrast to animal learning, curriculum strategies are not widely used in machine learning, and recent simulation studies conclude that curricula are moderately effective or even ineffective in most cases.


Memorization and generalization are complementary cognitive processes that jointly promote adaptive behavior. For example, animals should memorize safe routes to specific water sources and generalize from these memories to discover environmental features that predict new ones. These functions depend on systems consolidation mechanisms that construct neocortical memory traces from hippocampal precursors, but why systems consolidation only applies to a subset of hippocampal memories is unclear.


Mammals form mental maps of their environments by exploring their surroundings. Here, we investigate which elements of exploration are important for this process. We studied mouse escape behavior, in which mice are known to memorize subgoal locations (obstacle edges) to execute efficient escape routes to shelter.


Human understanding of the world can change rapidly when new information comes to light, such as when a plot twist occurs in a work of fiction. This flexible "knowledge assembly" requires few-shot reorganization of neural codes for relations among objects and events. However, existing computational theories are largely silent about how this could occur.


Making optimal decisions in the face of noise requires balancing short-term speed and accuracy. But a theory of optimality should account for the fact that short-term speed can influence long-term accuracy through learning. Here, we demonstrate that long-term learning is an important dynamical dimension of the speed-accuracy trade-off.


How do humans and other animals learn new tasks? A wave of brain recording studies has investigated how neural representations change during task learning, with a focus on how tasks can be acquired and coded in ways that minimise mutual interference. We review recent work that has explored the geometry and dimensionality of neural task representations in neocortex, and computational models that have exploited these findings to understand how the brain may partition knowledge between tasks. We discuss how ideas from machine learning, including those that combine supervised and unsupervised learning, are helping neuroscientists understand how natural tasks are learned and coded in biological brains.

Article Synopsis
  • Humans can learn multiple tasks sequentially with little interference, whereas deep neural networks struggle to do so; the proposed computational model addresses this issue by mimicking how the prefrontal cortex manages task switching.
  • The model incorporates "sluggish" task units and a Hebbian training mechanism to reduce interference and create clear representations for different tasks.
  • Validation against human behavioral data shows that the model effectively simulates performance differences in task learning, highlighting the impact of training methods on understanding category boundaries.
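
The synopsis does not say how the "sluggish" task units are implemented. One plausible reading, assumed here purely for illustration, is that the task signal fed to the network is an exponentially smoothed copy of the one-hot task cue, so that information from earlier trials leaks into the current one. The function name and the `alpha` parameter below are hypothetical.

```python
def sluggish_task_signal(cues, alpha=0.5):
    """Exponentially smooth one-hot task cues across trials.

    alpha is a hypothetical "sluggishness" parameter: 0 passes the raw cue
    through unchanged, values near 1 let earlier trials dominate.
    """
    state = [0.0] * len(cues[0])
    history = []
    for cue in cues:
        # Blend the carried-over signal with the cue for the current trial.
        state = [alpha * s + (1.0 - alpha) * c for s, c in zip(state, cue)]
        history.append(state)
    return history


TASK_A, TASK_B = [1.0, 0.0], [0.0, 1.0]

# Blocked training: the same task for ten consecutive trials.
blocked = sluggish_task_signal([TASK_A] * 10)

# Interleaved training: the task alternates on every trial.
interleaved = sluggish_task_signal([TASK_A, TASK_B] * 5)
```

Under this assumption, blocked training leaves the task signal close to a clean one-hot code, while interleaved training leaves it a mixture of both tasks, which is one way a training regime could affect how well task representations are separated.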

How do neural populations code for multiple, potentially conflicting tasks? Here we used computational simulations involving neural networks to define "lazy" and "rich" coding solutions to this context-dependent decision-making problem, which trade off learning speed for robustness. During lazy learning the input dimensionality is expanded by random projections to the network hidden layer, whereas in rich learning hidden units acquire structured representations that privilege relevant over irrelevant features. For context-dependent decision-making, one rich solution is to project task representations onto low-dimensional and orthogonal manifolds.


Deep neural networks achieve stellar generalisation even when they have enough parameters to easily fit all their training data. We study this phenomenon by analysing the dynamics and the performance of over-parameterised two-layer neural networks in the teacher-student setup, where one network, the student, is trained on data generated by another network, called the teacher. We show how the dynamics of stochastic gradient descent (SGD) is captured by a set of differential equations and prove that this description is asymptotically exact in the limit of large inputs.
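
The teacher-student setup described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's code: the tanh activation, the fixed unit second-layer weights, the network sizes, the learning rate and all function names are assumptions chosen to keep the example small.

```python
import math
import random


def make_weights(rng, n_hidden, n_inputs, scale):
    """Draw an n_hidden x n_inputs first-layer weight matrix with Gaussian entries."""
    return [[rng.gauss(0.0, scale) for _ in range(n_inputs)] for _ in range(n_hidden)]


def forward(weights, x):
    """Two-layer network: tanh hidden units, second-layer weights fixed to 1 (assumed)."""
    norm = math.sqrt(len(x))
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) / norm) for row in weights]
    return sum(hidden), hidden


def generalisation_error(student, teacher, inputs):
    """Average of half the squared student-teacher output difference."""
    total = 0.0
    for x in inputs:
        total += 0.5 * (forward(student, x)[0] - forward(teacher, x)[0]) ** 2
    return total / len(inputs)


def train_student(n_inputs=10, teacher_hidden=2, student_hidden=4,
                  steps=5000, lr=0.1, seed=0):
    """Train an over-parameterised student on teacher labels with online SGD."""
    rng = random.Random(seed)
    teacher = make_weights(rng, teacher_hidden, n_inputs, 1.0)
    student = make_weights(rng, student_hidden, n_inputs, 0.1)
    test_inputs = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(200)]
    before = generalisation_error(student, teacher, test_inputs)
    norm = math.sqrt(n_inputs)
    for _ in range(steps):
        # Online learning: every step uses a fresh input labelled by the teacher.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
        target, _ = forward(teacher, x)
        output, hidden = forward(student, x)
        err = target - output
        for k, row in enumerate(student):
            # Gradient step on 0.5 * err**2 with respect to the k-th hidden unit's weights.
            step = lr * err * (1.0 - hidden[k] ** 2) / norm
            for i in range(n_inputs):
                row[i] += step * x[i]
    after = generalisation_error(student, teacher, test_inputs)
    return before, after


before, after = train_student()
```

Here the student has more hidden units than the teacher, mirroring the over-parameterised case, and `before` and `after` give the held-out error at initialisation and after training.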


Neuroscience research is undergoing a minor revolution. Recent advances in machine learning and artificial intelligence research have opened up new ways of thinking about neural computation. Many researchers are excited by the possibility that deep neural networks may offer theories of perception, cognition and action for biological brains.


We perform an analysis of the average generalization dynamics of large neural networks trained using gradient descent. We study the practically-relevant "high-dimensional" regime where the number of free parameters in the network is on the order of or even larger than the number of examples in the dataset. Using random matrix theory and exact solutions in linear models, we derive the generalization error and training error dynamics of learning and analyze how they depend on the dimensionality of data and signal to noise ratio of the learning problem.


Systems neuroscience seeks explanations for how the brain implements a wide variety of perceptual, cognitive and motor tasks. Conversely, artificial intelligence attempts to design computational systems based on the tasks they will have to solve. In artificial neural networks, the three components specified by design are the objective functions, the learning rules and the architectures.


An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: What are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences? We address this question by mathematically analyzing the nonlinear dynamics of learning in deep linear networks. We find exact solutions to this learning dynamics that yield a conceptual explanation for the prevalence of many disparate phenomena in semantic cognition, including the hierarchical differentiation of concepts through rapid developmental transitions, the ubiquity of semantic illusions between such transitions, the emergence of item typicality and category coherence as factors controlling the speed of semantic processing, changing patterns of inductive projection over development, and the conservation of semantic similarity in neural representations across species. Thus, surprisingly, our simple neural model qualitatively recapitulates many diverse regularities underlying semantic development, while providing analytic insight into how the statistical structure of an environment can interact with nonlinear deep-learning dynamics to give rise to these regularities.


Introduction: We operationalized the taxonomy developed by Hauer and colleagues describing common clinical performance problems. Faculty raters pilot tested the resulting worksheet by observing recordings of problematic simulated clinical encounters involving third-year medical students. This approach provided a framework for structured feedback to guide learner improvement and curricular enhancement.


Speed-accuracy trade-offs strongly influence the rate of reward that can be earned in many decision-making tasks. Previous reports suggest that human participants often adopt suboptimal speed-accuracy trade-offs in single session, two-alternative forced-choice tasks. We investigated whether humans acquired optimal speed-accuracy trade-offs when extensively trained with multiple signal qualities.
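
To make the link between the speed-accuracy trade-off and reward rate concrete, the sketch below assumes a drift-diffusion model of a two-alternative forced-choice task; the abstract itself does not name a model. It uses the standard closed-form error rate and mean decision time for an unbiased diffusion with symmetric thresholds. The parameter values, the single lumped `non_decision_time` term and the function names are illustrative assumptions.

```python
import math


def error_rate(threshold, drift=1.0, noise=1.0):
    """Probability of hitting the wrong bound in an unbiased drift-diffusion model."""
    return 1.0 / (1.0 + math.exp(2.0 * drift * threshold / noise ** 2))


def decision_time(threshold, drift=1.0, noise=1.0):
    """Mean time to reach either bound."""
    return (threshold / drift) * math.tanh(drift * threshold / noise ** 2)


def reward_rate(threshold, drift=1.0, noise=1.0, non_decision_time=1.0):
    """Correct responses per unit time; non_decision_time lumps together all delays."""
    accuracy = 1.0 - error_rate(threshold, drift, noise)
    return accuracy / (decision_time(threshold, drift, noise) + non_decision_time)


def best_threshold(drift=1.0, noise=1.0, non_decision_time=1.0):
    """Grid search for the threshold that maximises reward rate."""
    grid = [0.01 * k for k in range(1, 301)]
    return max(grid, key=lambda z: reward_rate(z, drift, noise, non_decision_time))


# Signal quality is modelled here as the drift rate (an assumption).
optimal = {drift: best_threshold(drift=drift) for drift in (0.5, 1.0, 2.0)}
```

A very low threshold gives fast but near-chance responses and a very high one gives accurate but slow responses, so under these assumptions the reward-maximising threshold lies in between and depends on signal quality.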


Background: Two uncommon but serious complications after subclavian central venous port (SCVP) placement are pneumothorax (PNX) and malposition of the catheter. Chest x-rays (CXR) are commonly obtained after SCVP placement to identify these complications, but their use is controversial.

Study Design: We performed a retrospective review of SCVP placements to establish the incidence of PNX or catheter malposition identified exclusively by postprocedure CXR.


Adequate lymph node harvest among patients undergoing colectomy for cancer is critical for staging and therapy. Obesity is prevalent in the American population. We investigated whether lymph node harvest was compromised in obese patients undergoing colectomy for cancer.


Butyric acid and trichostatin A (TSA) are anti-cancer compounds that cause the upregulation of genes involved in differentiation and cell cycle regulation by inhibiting histone deacetylase (HDAC) activity. In this study we have synthesized and evaluated compounds that combine the bioavailability of short-chain fatty acids, like butyric acid, with the bidentate binding ability of TSA. A series of analogs were made to examine the effects of chain length, simple aromatic cap groups, and substituted hydroxamates on the compounds' ability to inhibit rat-liver HDAC using a fluorometric assay.


Background: Cervical spine fractures in the elderly carry a mortality as high as 26%. We reviewed our experience to define the level of injury, prevalence of neurologic deficits, treatments employed, and the correlation between patients' pre- and posthospital residences. Also, we correlated the prevalence of advance directives with length of stay.
