Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
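The contrast between a synchronous and an asynchronous value-function algorithm can be illustrated with a minimal sketch. This is not the paper's theorem or proof. It only shows the two kinds of algorithm the abstract refers to, under several assumptions that are not in the source: the toy two-state MDP, the discount factor `GAMMA`, the step-size schedule `n ** -0.7`, and the function names `synchronous_q_iteration`, `q_learning`, and `max_gap` are all hypothetical. The synchronous version updates every state-action pair at once using the full model, while Q-learning updates one pair at a time from sampled transitions.

```python
import random

GAMMA = 0.5  # assumed discount factor, chosen small so the sketch converges quickly

# Hypothetical toy MDP: MDP[state][action] -> list of (probability, next_state, reward)
MDP = {
    0: {0: [(0.8, 0, 1.0), (0.2, 1, 0.0)], 1: [(1.0, 1, 0.5)]},
    1: {0: [(0.5, 0, 0.0), (0.5, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}


def synchronous_q_iteration(mdp, gamma, sweeps=100):
    """Update every (state, action) pair in each sweep using exact expectations."""
    q = {(s, a): 0.0 for s in mdp for a in mdp[s]}
    for _ in range(sweeps):
        # The new table is built entirely from the old one, then swapped in.
        q = {
            (s, a): sum(
                p * (r + gamma * max(q[(s2, b)] for b in mdp[s2]))
                for p, s2, r in mdp[s][a]
            )
            for s in mdp
            for a in mdp[s]
        }
    return q


def q_learning(mdp, gamma, steps=50000, seed=0):
    """Update one (state, action) pair per step from a single sampled transition."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in mdp for a in mdp[s]}
    visits = {key: 0 for key in q}
    state = 0
    for _ in range(steps):
        action = rng.choice(list(mdp[state]))  # uniform random exploration
        outcomes = mdp[state][action]
        weights = [p for p, _, _ in outcomes]
        _, next_state, reward = rng.choices(outcomes, weights=weights)[0]
        visits[(state, action)] += 1
        alpha = visits[(state, action)] ** -0.7  # assumed decaying step size
        target = reward + gamma * max(q[(next_state, b)] for b in mdp[next_state])
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = next_state
    return q


def max_gap(q1, q2):
    """Largest absolute difference between two Q-tables over all pairs."""
    return max(abs(q1[k] - q2[k]) for k in q1)


if __name__ == "__main__":
    q_sync = synchronous_q_iteration(MDP, GAMMA)
    q_async = q_learning(MDP, GAMMA)
    print("max gap between the two estimates:", max_gap(q_sync, q_async))
```

On this toy problem the sampled, one-pair-at-a-time estimates should end up close to the synchronous fixed point, which is the kind of relationship the abstract's theorem is meant to establish in general.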
DOI: http://dx.doi.org/10.1162/089976699300016070
Psychol Res
January 2025
School of Psychology, Shenzhen University, Shenzhen, China.
Extrinsic motivation can foster effortful cognitive control. Moreover, selectively coupling extrinsic motivation with low- versus high-control-demand tasks would exert an additional impact. However, the extent to which these influences are further modulated by the level of Need for Cognition (NFC) remains unclear.
Zhongguo Dang Dai Er Ke Za Zhi
January 2025
Department of Child and Adolescent Psychiatry, Shandong Daizhuang Hospital, Jining, Shandong 272051, China.
Adolescence is a critical period for the development of the reward circuit, and reward positivity (RewP) is one of the electrophysiological indicators reflecting reward processing. Many studies have shown that abnormalities in RewP are closely associated with internalizing and externalizing problems in children and adolescents. In addition, factors such as stressful life events and sleep disorders can affect reward-related brain activity and increase the risk of various psychopathological problems in this population.
Public health emergencies are critical to people's lives and health, economic development and social stability. Understanding how to respond correctly to public health emergencies is the focus of societal attention. This paper focuses on the tripartite entities of public health emergencies: local governments, pharmaceutical enterprises and the public.
Transl Psychiatry
January 2025
Department of Psychology, Goldsmiths University of London, London, UK.
Bipolar disorder (BD) involves altered reward processing and decision-making, with inconsistencies across studies. Here, we integrated hierarchical Bayesian modelling with magnetoencephalography (MEG) to characterise maladaptive belief updating in this condition. First, we determined if previously reported increased learning rates in BD stem from a heightened expectation of environmental changes.
Learn Mem
January 2025
Department of Psychology, Arizona State University, Tempe, Arizona 85287, USA
Chronic stress typically leads to deficits in fear extinction. However, when a delay occurs between the end of chronic stress and the start of fear conditioning (a "recovery"), rats show improved context-cue discrimination compared to recently stressed rats or nonstressed rats. The infralimbic cortex (IL) is important for fear extinction and undergoes neuronal remodeling after chronic stress ends, which could drive improved context-cue discrimination.