Reinforcement learning models generally assume that a stimulus is presented that allows a learner to unambiguously identify the state of nature, and the reward received is drawn from a distribution that depends on that state. However, in any natural environment, the stimulus is noisy. When there is state uncertainty, it is no longer immediately obvious how to perform reinforcement learning, since the observed reward cannot be unambiguously allocated to a state of the environment. This letter addresses the problem of incorporating state uncertainty in reinforcement learning models. We show that simply ignoring the uncertainty and allocating the reward to the most likely state of the environment results in incorrect value estimates. Furthermore, using only the information that is available before observing the reward also results in incorrect estimates. We therefore introduce a new technique, posterior weighted reinforcement learning, in which the estimates of state probabilities are updated according to the observed rewards (e.g., if a learner observes a reward usually associated with a particular state, this state becomes more likely). We show analytically that this modified algorithm can converge to correct reward estimates and confirm this with numerical experiments. The algorithm is shown to be a variant of the expectation-maximization algorithm, allowing rigorous convergence analyses to be carried out. A possible neural implementation of the algorithm in the cortico-basal-ganglia-thalamic network is presented, and experimental predictions of our model are discussed.
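For intuition, the core update can be sketched in a few lines. The following is a minimal simulation of the posterior weighted update on a hypothetical two-state task with Gaussian rewards; the state means, stimulus accuracy, and learning rate are illustrative assumptions, not values from the letter.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical task parameters (illustrative, not from the paper) ---
N_STATES = 2
TRUE_MEANS = np.array([1.0, 5.0])  # true mean reward of each hidden state
REWARD_SD = 1.0                    # rewards are Gaussian around the state mean
STIMULUS_ACC = 0.7                 # P(stimulus correctly indicates the state)
ALPHA = 0.05                       # learning rate
N_TRIALS = 20_000

V = np.zeros(N_STATES)             # estimated mean reward per state

for _ in range(N_TRIALS):
    state = rng.integers(N_STATES)
    # Noisy stimulus: indicates the true state with probability STIMULUS_ACC.
    obs = state if rng.random() < STIMULUS_ACC else 1 - state
    reward = rng.normal(TRUE_MEANS[state], REWARD_SD)

    # Prior belief over states from the stimulus alone.
    prior = np.full(N_STATES, (1 - STIMULUS_ACC) / (N_STATES - 1))
    prior[obs] = STIMULUS_ACC

    # Posterior weighting: re-weight the belief by how well each state's
    # current value estimate explains the observed reward.
    lik = np.exp(-0.5 * ((reward - V) / REWARD_SD) ** 2)
    post = prior * lik
    post /= post.sum()

    # Credit the prediction error to each state in proportion to its
    # posterior probability (the EM-flavoured soft assignment).
    V += ALPHA * post * (reward - V)

print(V)  # approaches TRUE_MEANS under the posterior weighted update
```

Replacing `post` with a one-hot vector on the most likely state reproduces the hard-assignment scheme that, as noted above, yields incorrect value estimates; using `prior` alone reproduces the biased pre-reward allocation.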

Source
http://dx.doi.org/10.1162/neco.2010.01-09-948

Publication Analysis

Top Keywords

reinforcement learning (20), state uncertainty (12), state (10), posterior weighted (8), weighted reinforcement (8), uncertainty reinforcement (8), learning models (8), state environment (8), incorrect estimates (8), reward (6)

Similar Publications

Opioid reward and deep brain stimulation of the lateral hypothalamic area.

Vitam Horm

January 2025

Department of Physiology, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran.

Opioid use disorder (OUD) is considered a global health issue that affects various aspects of patients' lives and poses a considerable burden on society. Due to the high prevalence of remissions and relapses, novel therapeutic approaches are required to manage OUD. Deep brain stimulation (DBS) is one of the most promising clinical breakthroughs in translational neuroscience.

Polysomnography (PSG) is crucial for diagnosing sleep disorders, but manual scoring of PSG is time-consuming and subjective, leading to high variability. While machine-learning models have improved PSG scoring, their clinical use is hindered by their 'black-box' nature. In this study, we present SleepXViT, an automatic sleep staging system using a Vision Transformer (ViT) that provides intuitive, consistent explanations by mimicking human 'visual scoring'.

This study introduces a novel ensemble learning technique, the Multi-Armed Bandit Ensemble (MAB-Ensemble), designed for lane detection in road images for autonomous vehicles. The proposed MAB-Ensemble technique draws on multi-armed bandit optimization to select efficiently among candidate models for lane segmentation. The TuSimple benchmark dataset is used for training, validating, and testing the proposed and existing lane detection techniques.
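As a rough illustration of bandit-driven model selection (a generic UCB1 sketch, not the authors' MAB-Ensemble), the snippet below allocates evaluations among hypothetical candidate lane-segmentation models; the model names and quality scores are invented stand-ins.

```python
import math
import random

# Hypothetical candidate models and their (unknown to the learner) quality.
MODELS = ["unet", "segnet", "enet"]
TRUE_QUALITY = {"unet": 0.78, "segnet": 0.71, "enet": 0.83}  # assumed values

counts = {m: 0 for m in MODELS}
totals = {m: 0.0 for m in MODELS}

def evaluate(model: str) -> float:
    """Stand-in for scoring one image's lane segmentation in [0, 1]."""
    return min(1.0, max(0.0, random.gauss(TRUE_QUALITY[model], 0.1)))

for t in range(1, 3001):
    untried = [m for m in MODELS if counts[m] == 0]
    if untried:
        choice = untried[0]  # play every arm once first
    else:
        # UCB1: empirical mean plus an exploration bonus.
        choice = max(MODELS, key=lambda m: totals[m] / counts[m]
                     + math.sqrt(2 * math.log(t) / counts[m]))
    score = evaluate(choice)
    counts[choice] += 1
    totals[choice] += score

print(counts)  # the best-performing model accumulates the most evaluations
```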

Dissociating social reward learning and behavior in alcohol use disorder.

Transl Psychiatry

January 2025

Division of Psychology, Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden.

Background: Alcohol use disorder (AUD) is associated with deficits in social cognition and behavior, but why these deficits are acquired is unknown. We hypothesized that a reduced association between actions and outcomes for others, i.e.

This research introduces an innovative approach to optimal control for a class of linear systems with input saturation, combining Takagi-Sugeno (T-S) fuzzy models with reinforcement learning (RL) techniques. To enhance interpretability and analytical accessibility, our approach applies T-S models to approximate the value function and generate optimal control laws while incorporating prior knowledge.
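As a generic illustration of the T-S idea (not the authors' controller), the sketch below approximates a value function as a membership-weighted blend of local quadratic models; the rule centers, widths, and coefficients are invented for illustration.

```python
import numpy as np

# Hypothetical T-S rules over a scalar state x: each rule i holds a local
# quadratic value model V_i(x) = p_i * x**2, blended by Gaussian memberships.
CENTERS = np.array([-1.0, 0.0, 1.0])  # rule centers (assumed)
WIDTH = 0.5                           # membership width (assumed)
P = np.array([0.8, 0.2, 1.1])         # local quadratic coefficients (assumed)

def memberships(x: float) -> np.ndarray:
    """Normalised firing strengths of the fuzzy rules at state x."""
    w = np.exp(-0.5 * ((x - CENTERS) / WIDTH) ** 2)
    return w / w.sum()

def value(x: float) -> float:
    """T-S blend: V(x) = sum_i mu_i(x) * V_i(x)."""
    return float(memberships(x) @ (P * x ** 2))

print(value(0.3))  # smooth interpolation between the local models
```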
