Model-based planners reflect on their model-free propensities.

PLoS Comput Biol

Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, United Kingdom.

Published: January 2021

Dual-reinforcement learning theory proposes that behaviour is under the tutelage of a retrospective, value-caching, model-free (MF) system and a prospective-planning, model-based (MB) system. This architecture raises a question as to the degree to which, when devising a plan, a MB controller takes account of influences from its MF counterpart. We present evidence that such a sophisticated, self-reflective MB planner incorporates an anticipation of the influences its own MF proclivities exert on the execution of its planned future actions. Using a novel bandit task, wherein subjects were periodically allowed to design their environment, we show that reward assignments were constructed in a manner consistent with a MB system taking account of its MF propensities. Thus, in the task, participants assigned higher rewards to bandits that were momentarily associated with stronger MF tendencies. Our findings have implications for a range of decision-making domains that includes drug abuse, pre-commitment, and the tension between short- and long-term decision horizons in economics.
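The dual-system architecture described in the abstract can be sketched in code. The snippet below is a minimal illustration, not the authors' actual computational model: the learning rate, the mixing weight `w`, and the reward values are all assumed for the example. A value-caching MF system updates cached values retrospectively via a delta rule, while a self-reflective MB planner anticipates the MF pull on its future actions by blending its plan with the current MF values.

```python
import numpy as np

# Illustrative sketch of the dual-system idea; all parameter values
# (learning rate, mixing weight, rewards) are assumptions, not the
# authors' fitted model.

n_bandits = 3

# Model-free (MF) system: cached values, updated retrospectively
# by a delta rule after each observed reward.
mf_values = np.zeros(n_bandits)
alpha = 0.5  # assumed learning rate

def mf_update(action, reward):
    """Retrospective value-caching update."""
    mf_values[action] += alpha * (reward - mf_values[action])

def mb_plan(designed_rewards, w=0.7):
    """Prospective MB choice over a reward assignment.

    A self-reflective planner (w < 1) blends its plan with its own
    MF values, anticipating that MF propensities will influence how
    the planned actions are actually executed.
    """
    blended = w * designed_rewards + (1 - w) * mf_values
    return int(np.argmax(blended))

# Repeated rewards on bandit 1 build up a strong MF tendency there.
for _ in range(5):
    mf_update(action=1, reward=1.0)

designed = np.array([0.6, 0.5, 0.4])
print(mb_plan(designed, w=1.0))  # non-reflective planner: bandit 0
print(mb_plan(designed, w=0.7))  # self-reflective: MF pull wins, bandit 1
```

This mirrors the paper's behavioural signature: an agent that designs its environment while anticipating its MF tendencies will weight rewards toward the bandits its MF system currently favours.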

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7817042
DOI: http://dx.doi.org/10.1371/journal.pcbi.1008552
