A unified analysis of value-function-based reinforcement-learning algorithms.

Neural Comput

Mindmaker, Ltd., Budapest 1121, Konkoly Thege M. U. 29-33, Hungary.

Published: November 1999

Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
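As a rough illustration of the synchronous/asynchronous distinction drawn in the abstract, the sketch below compares a synchronous Q-value iteration (every state-action pair updated at once from a known model) with asynchronous, sample-based Q-learning (one pair updated per step) on a tiny two-state problem. The toy MDP, the discount factor, the step-size schedule, and all function names are assumptions made for illustration; none of them come from the paper, and the code demonstrates the two update styles rather than the theorem itself.

```python
import random

# Hypothetical 2-state, 2-action MDP (an assumption, not from the paper).
# P[s][a] is a list of (probability, next_state, reward) outcomes.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 2.0)], 1: [(1.0, 1, 0.5)]},
}
GAMMA = 0.5          # discount factor (assumed)
STATES = (0, 1)
ACTIONS = (0, 1)


def synchronous_q_iteration(sweeps=200):
    """Apply the Bellman optimality operator to all pairs in each sweep."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(sweeps):
        q = {
            (s, a): sum(
                p * (r + GAMMA * max(q[(s2, b)] for b in ACTIONS))
                for p, s2, r in P[s][a]
            )
            for s in STATES
            for a in ACTIONS
        }
    return q


def sample_transition(s, a, rng):
    """Draw (next_state, reward) from the model for the pair (s, a)."""
    u = rng.random()
    cumulative = 0.0
    for p, s2, r in P[s][a]:
        cumulative += p
        if u < cumulative:
            return s2, r
    _, s2, r = P[s][a][-1]  # guard against floating-point rounding
    return s2, r


def q_learning(steps=200_000, seed=0):
    """Asynchronous Q-learning: update only the pair visited at each step."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    visits = {(s, a): 0 for s in STATES for a in ACTIONS}
    s = 0
    for _ in range(steps):
        a = rng.choice(ACTIONS)  # uniform exploration visits every pair
        s2, r = sample_transition(s, a, rng)
        visits[(s, a)] += 1
        # Decaying step size n^-0.7: sums to infinity, squares sum finitely.
        alpha = visits[(s, a)] ** -0.7
        target = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])
        s = s2
    return q


if __name__ == "__main__":
    q_sync = synchronous_q_iteration()
    q_async = q_learning()
    for key in sorted(q_sync):
        print(key, round(q_sync[key], 3), round(q_async[key], 3))
```

Under these assumptions both procedures should approach the same fixed point of the Bellman optimality operator; the asynchronous estimates carry some sampling noise that shrinks as the step size decays.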

Source
http://dx.doi.org/10.1162/089976699300016070
