Reinforcement learning models generally assume that a stimulus is presented that allows a learner to unambiguously identify the state of nature, and the reward received is drawn from a distribution that depends on that state. However, in any natural environment, the stimulus is noisy. When there is state uncertainty, it is no longer immediately obvious how to perform reinforcement learning, since the observed reward cannot be unambiguously allocated to a state of the environment. This letter addresses the problem of incorporating state uncertainty in reinforcement learning models. We show that simply ignoring the uncertainty and allocating the reward to the most likely state of the environment results in incorrect value estimates. Furthermore, using only the information that is available before observing the reward also results in incorrect estimates. We therefore introduce a new technique, posterior weighted reinforcement learning, in which the estimates of state probabilities are updated according to the observed rewards (e.g., if a learner observes a reward usually associated with a particular state, this state becomes more likely). We show analytically that this modified algorithm can converge to correct reward estimates and confirm this with numerical experiments. The algorithm is shown to be a variant of the expectation-maximization algorithm, allowing rigorous convergence analyses to be carried out. A possible neural implementation of the algorithm in the cortico-basal-ganglia-thalamic network is presented, and experimental predictions of our model are discussed.
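The posterior-weighting idea described in the abstract can be sketched in a few lines of Python. The example below is a minimal illustration, not the paper's implementation: it assumes a hypothetical two-state environment where a noisy cue identifies the hidden state with probability 0.7, rewards are Gaussian around a state-dependent mean, and each observed reward is used to re-weight the state posterior before the prediction error is credited to each state in proportion to that posterior. All parameter values are illustrative.

```python
import math
import random

def gauss_pdf(r, mu, sigma):
    """Gaussian likelihood of reward r under mean mu and std sigma."""
    return math.exp(-0.5 * ((r - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def run(trials=20000, alpha=0.02, sigma=1.0, cue_acc=0.7, seed=0):
    rng = random.Random(seed)
    true_mu = [0.0, 3.0]   # hypothetical true mean reward of each hidden state
    V = [1.0, 1.0]         # learner's value (mean-reward) estimates
    for _ in range(trials):
        s = rng.randrange(2)                       # hidden state of nature
        # noisy cue: correct with probability cue_acc -> prior over states
        cue = s if rng.random() < cue_acc else 1 - s
        prior = [cue_acc if k == cue else 1 - cue_acc for k in (0, 1)]
        r = rng.gauss(true_mu[s], sigma)           # reward from hidden state
        # posterior re-weighting: a reward typical of a state makes that
        # state more likely, given the current value estimates
        like = [gauss_pdf(r, V[k], sigma) for k in (0, 1)]
        z = prior[0] * like[0] + prior[1] * like[1]
        post = [prior[k] * like[k] / z for k in (0, 1)]
        # credit the prediction error to each state weighted by its posterior
        for k in (0, 1):
            V[k] += alpha * post[k] * (r - V[k])
    return V

V = run()  # estimates approach the true means [0.0, 3.0]
```

Allocating the error by posterior weight, rather than dumping the whole reward onto the single most likely state, is what lets the estimates converge toward the true state-conditional means despite the noisy cue.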
DOI: 10.1162/neco.2010.01-09-948
Vitam Horm
January 2025
Department of Physiology, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran.
Opioid use disorder (OUD) is considered a global health issue that affects various aspects of patients' lives and poses a considerable burden on society. Given the high rates of relapse after remission, novel therapeutic approaches are required to manage OUD. Deep brain stimulation (DBS) is one of the most promising clinical breakthroughs in translational neuroscience.
NPJ Digit Med
January 2025
Graduate School of Data Science, Seoul National University, Seoul, Republic of Korea.
Polysomnography (PSG) is crucial for diagnosing sleep disorders, but manual scoring of PSG is time-consuming and subjective, leading to high variability. While machine-learning models have improved PSG scoring, their clinical use is hindered by their 'black-box' nature. In this study, we present SleepXViT, an automatic sleep staging system using a Vision Transformer (ViT) that provides intuitive, consistent explanations by mimicking human 'visual scoring'.
Sci Rep
January 2025
School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, India.
This study introduces a novel ensemble learning technique, the Multi-Armed Bandit Ensemble (MAB-Ensemble), designed for lane detection in road images intended for autonomous vehicles. The proposed MAB-Ensemble technique draws on multi-armed bandit optimization to select efficiently among candidate models for lane segmentation. The benchmark dataset TuSimple is used for training, validating, and testing the proposed and existing lane detection techniques.
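The snippet does not detail the bandit algorithm used, but the core idea of bandit-driven model selection can be sketched with a standard UCB1 policy: each candidate segmentation model is an arm, and a (hypothetical) per-frame accuracy signal serves as the reward. The model names, accuracies, and reward definition below are illustrative assumptions, not details from the paper.

```python
import math
import random

def ucb1_select(counts, means, t):
    # pick the arm maximizing estimated mean + exploration bonus
    return max(range(len(counts)),
               key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))

def run(trials=5000, seed=1):
    rng = random.Random(seed)
    # hypothetical per-frame accuracies of three candidate lane-segmentation models
    true_acc = [0.60, 0.90, 0.70]
    counts = [0] * 3      # times each model was selected
    means = [0.0] * 3     # running mean reward per model
    for t in range(1, trials + 1):
        if t <= 3:
            a = t - 1                       # play each arm once to initialize
        else:
            a = ucb1_select(counts, means, t)
        # reward: 1 if the selected model segments this frame correctly
        reward = 1.0 if rng.random() < true_acc[a] else 0.0
        counts[a] += 1
        means[a] += (reward - means[a]) / counts[a]
    return counts, means

counts, means = run()  # the strongest model accumulates most selections
```

The exploration bonus shrinks as an arm is sampled, so selection concentrates on the best-performing model while still occasionally re-checking the others, which is what makes bandit policies attractive for online model selection.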
Transl Psychiatry
January 2025
Division of Psychology, Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden.
Background: Alcohol use disorder (AUD) is associated with deficits in social cognition and behavior, but why these deficits are acquired is unknown. We hypothesized that a reduced association between actions and outcomes for others, i.e.
ISA Trans
January 2025
Toronto Metropolitan University, Toronto, Canada.
This research introduces an innovative approach to optimal control for a class of linear systems with input saturation. It leverages the synergy of Takagi-Sugeno (T-S) fuzzy models and reinforcement learning (RL) techniques. To enhance interpretability and analytical accessibility, our approach applies T-S models to approximate the value function and generate optimal control laws while incorporating prior knowledge.