Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors: discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the event-related brain potential (ERP) technique to demonstrate that rewards not only elicit a neural response akin to a prediction error but also that, with learning, this signal rapidly diminishes and propagates back to the time of choice presentation. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component with a timing and topography similar to those of the feedback error-related negativity, and one that increased in amplitude with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further evidence that the computations that underlie human learning and decision-making follow reinforcement learning principles.
DOI: http://dx.doi.org/10.1162/jocn_a_00509
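The learning dynamics summarized in the abstract can be made concrete with a small simulation. The sketch below is a minimal, hypothetical Rescorla-Wagner style value learner for a two-option gambling task, written in Python; it is not the authors' computational model, and the learning rate, reward probabilities, and exploration rate are illustrative assumptions. With learning, the prediction error computed at reward delivery shrinks while the value signal already available at choice presentation grows, which is the qualitative pattern reported for the feedback error-related negativity and the choice-locked reward positivity.

import random

random.seed(0)

ALPHA = 0.3                          # assumed learning rate (not from the study)
REWARD_PROB = {"A": 0.8, "B": 0.2}   # assumed reward probabilities per option
EPSILON = 0.1                        # assumed exploration rate
values = {"A": 0.0, "B": 0.0}        # learned expected value of each option

for trial in range(1, 101):
    # Mostly exploit the currently higher-valued option, occasionally explore.
    if random.random() < EPSILON:
        choice = random.choice(["A", "B"])
    else:
        choice = max(values, key=values.get)
    reward = 1.0 if random.random() < REWARD_PROB[choice] else 0.0

    # Prediction error at reward delivery: its magnitude shrinks as the
    # chosen option's value is learned (cf. the declining feedback signal).
    feedback_pe = reward - values[choice]

    # Value signal available when the options appear: it grows with learning
    # (cf. the reported positivity at choice presentation).
    choice_signal = values[choice]

    values[choice] += ALPHA * feedback_pe

    if trial % 20 == 0:
        print(f"trial {trial:3d}: choice={choice}  "
              f"choice_signal={choice_signal:+.2f}  feedback_PE={feedback_pe:+.2f}")

Averaged over trials, the feedback prediction error in this sketch decays toward zero for the better option, while that option's learned value, the quantity available at choice onset, climbs toward its reward probability.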
Similar publications:

Constraints, November 2024. Polytechnique Montréal, Montreal, Canada.
Constraint programming is known for being an efficient approach to solving combinatorial problems. Important design choices in a solver are the heuristics, designed to lead the search to the best solutions in a minimum amount of time. However, developing these heuristics is a time-consuming process that requires problem-specific expertise.
Front Neurorobot, January 2025. Hebi Institute of Engineering and Technology, Henan Polytechnic University, Hebi, Henan, China.
Introduction: Path planning in complex and dynamic environments poses a significant challenge in the field of mobile robotics. Traditional path planning methods such as genetic algorithms, Dijkstra's algorithm, and Floyd's algorithm typically rely on deterministic search strategies, which can lead to local optima and lack global search capabilities in dynamic settings. These methods have high computational costs and are not efficient for real-time applications.
Front Comput Neurosci, January 2025. Center for Synaptic Brain Dysfunctions, Institute for Basic Science, Daejeon, Republic of Korea.
Memory consolidation refers to the process of converting temporary memories into long-lasting ones. It is widely accepted that new experiences are initially stored in the hippocampus as rapid associative memories, which then undergo a consolidation process to establish more permanent traces in other regions of the brain. Over the past two decades, studies in humans and animals have demonstrated that the hippocampus is crucial not only for memory but also for imagination and future planning, with the CA3 region playing a pivotal role in generating novel activity patterns.
Eur J Neurosci, January 2025. Department of Psychology and Cognitive Sciences (DiPSCo), University of Trento, Trento, Italy.
The Dark Triad (DT), encompassing narcissism, Machiavellianism and psychopathy traits, poses significant societal challenges. Understanding the neural underpinnings of these traits is crucial for developing effective interventions and preventive strategies. Our study aimed to unveil the neural substrates of the DT by examining brain scans from 201 individuals (mean age: 32 ...).
BMC Neurosci, January 2025. National Brain Research Centre, Manesar, Gurugram, 122052, Haryana, India.
Delta-opioid receptors (δ-ORs) are known to be involved in associative learning and modulating motivational states. We wanted to study if they were also involved in naturally-occurring reinforcement learning behaviors such as vocal learning, using the zebra finch model system. Zebra finches learn to vocalize early in development and song learning in males is affected by factors such as the social environment and internal reward, both of which are modulated by endogenous opioids.