A unified analysis of value-function-based reinforcement-learning algorithms.

Neural Comput

Mindmaker, Ltd., Budapest 1121, Konkoly Thege M. U. 29-33, Hungary.

Published: November 1999

Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
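As a rough illustration of the synchronous versus asynchronous distinction drawn in the abstract, the following minimal sketch compares the two update styles on a toy problem. It is not the paper's construction or proof. The two-state deterministic MDP in `MODEL`, the discount `GAMMA`, the fixed step size `alpha`, and all function names are assumptions made for this example. Because the toy model is deterministic, a fixed step size is enough here, whereas Q-learning in a stochastic environment would need sampled transitions and decaying step sizes.

```python
import random

GAMMA = 0.5          # discount factor (assumed for this example)
ACTIONS = [0, 1]

# Hypothetical deterministic toy MDP: (state, action) -> (reward, next_state)
MODEL = {
    (0, 0): (0.0, 0),
    (0, 1): (1.0, 1),
    (1, 0): (2.0, 0),
    (1, 1): (0.0, 1),
}

def backup(q, s, a):
    """One-step Bellman optimality backup for the pair (s, a)."""
    r, s_next = MODEL[(s, a)]
    return r + GAMMA * max(q[(s_next, b)] for b in ACTIONS)

def synchronous_q_iteration(sweeps=200):
    """Update every (s, a) pair at once from the previous estimate."""
    q = {sa: 0.0 for sa in MODEL}
    for _ in range(sweeps):
        q = {(s, a): backup(q, s, a) for (s, a) in MODEL}
    return q

def asynchronous_q_learning(updates=5000, alpha=0.5, seed=0):
    """Update one randomly chosen (s, a) pair at a time, in place."""
    rng = random.Random(seed)
    q = {sa: 0.0 for sa in MODEL}
    pairs = sorted(MODEL)
    for _ in range(updates):
        s, a = rng.choice(pairs)
        q[(s, a)] += alpha * (backup(q, s, a) - q[(s, a)])
    return q

def max_gap(q1, q2):
    """Largest absolute difference between two Q-tables."""
    return max(abs(q1[sa] - q2[sa]) for sa in q1)

if __name__ == "__main__":
    q_sync = synchronous_q_iteration()
    q_async = asynchronous_q_learning()
    print("max gap between the two estimates:", max_gap(q_sync, q_async))
```

In this toy setting both procedures settle on the same Q-values, which is the kind of agreement the theorem is meant to establish in far greater generality.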

DOI: http://dx.doi.org/10.1162/089976699300016070


