Offline replay supports planning in human reinforcement learning.

eLife

Princeton Neuroscience Institute, Princeton University, New Jersey, United States.

Published: December 2018

Making decisions in sequentially structured tasks requires integrating distally acquired information. The extensive computational cost of such integration challenges planning methods that integrate online, at decision time. Furthermore, it remains unclear whether 'offline' integration during replay supports planning, and if so which memories should be replayed. Inspired by machine learning, we propose that (a) offline replay of trajectories facilitates integrating representations that guide decisions, and (b) unsigned prediction errors (uncertainty) trigger such integrative replay. We designed a 2-step revaluation task for fMRI, whereby participants needed to integrate changes in rewards with past knowledge to optimally replan decisions. As predicted, we found that (a) multi-voxel pattern evidence for off-task replay predicts subsequent replanning; (b) neural sensitivity to uncertainty predicts subsequent replay and replanning; (c) off-task hippocampus and anterior cingulate activity increase when revaluation is required. These findings elucidate how the brain leverages offline mechanisms in planning and goal-directed behavior under uncertainty.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6303108
DOI: http://dx.doi.org/10.7554/eLife.32548


Similar Publications

Replay as a Basis for Backpropagation Through Time in the Brain.

Neural Comput

January 2025

Department of Psychological and Brain Sciences, Indiana University Bloomington, Bloomington, IN 47405, U.S.A.

How episodic memories are formed in the brain is a continuing puzzle for the neuroscience community. The brain areas that are critical for episodic learning (e.g.

View Article and Find Full Text PDF

No sooner is an experience over than its neural representation begins to be transformed through memory reactivation during offline periods. The lion's share of prior research has focused on understanding offline reactivation within the hippocampus. However, it is hypothesized that consolidation processes involve offline reactivation in cortical regions as well as coordinated reactivation in the hippocampus and cortex.

View Article and Find Full Text PDF

A Higher Performance Data Backup Scheme Based on Multi-Factor Authentication.

Entropy (Basel)

August 2024

School of Computer Science and Technology, Donghua University, Shanghai 201620, China.

Remote data backup technology avoids the risk of data loss and tampering, and has higher security compared to local data backup solutions. However, the data transmission channel for remote data backup is not secure, and the backup server cannot be fully trusted, so users usually encrypt the data before uploading it to the remote server. As a result, how to protect this encryption key is crucial.

View Article and Find Full Text PDF

In this article, a novel model-free policy gradient reinforcement learning algorithm is proposed to solve the H∞ tracking problem for discrete-time heterogeneous multiagent systems with external disturbances over switching topology. The dynamics of the followers and the leader are unknown, and the leader's information is missing for each agent due to the switching topology. Therefore, a distributed adaptive observer is introduced to learn the leader's dynamic model and estimate its state for each agent.

View Article and Find Full Text PDF

Actor-Critic Alignment for Offline-to-Online Reinforcement Learning.

Proc Mach Learn Res

July 2023

Department of Computer Science, University of Illinois Chicago, Chicago, IL 60607, USA.

Deep offline reinforcement learning has recently demonstrated considerable promise in leveraging offline datasets, providing high-quality models that significantly reduce the online interactions required for fine-tuning. However, such a benefit is often diminished due to the marked state-action distribution shift, which causes significant bootstrap error and wipes out the good initial policy. Existing solutions resort to constraining the policy shift or balancing the sample replay based on their online-ness. However, they require online estimation of distribution divergence or density ratio.

View Article and Find Full Text PDF
