Self-learning approaches, such as reinforcement learning, offer new possibilities for autonomous control of uncertain or time-varying systems. However, exploring an unknown environment with limited prediction capabilities is a challenge for a learning agent. If the environment is dangerous, free exploration can result in physical damage or otherwise unacceptable behavior. Compared with existing methods, the main contribution of this paper is a new approach that requires neither global safety functions nor specific formulations of the dynamics or the environment. Instead, it relies on interval estimation of the agent's dynamics during the exploration phase and assumes that the agent has a limited capability to perceive approaching fatal states. Two algorithms based on this approach are presented. The first is the Safety Handling Exploration with Risk Perception Algorithm (SHERPA), which provides safety by identifying temporary safety functions, called backups. SHERPA is demonstrated on a simulated, simplified quadrotor task in which dangerous states are avoided. The second algorithm, OptiSHERPA, uses safety metrics to safely handle more dynamically complex systems for which SHERPA is not sufficient. An application of OptiSHERPA is simulated on an aircraft altitude control task.
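The idea of checking actions against an interval estimate of the dynamics can be illustrated with a minimal sketch. This is not the SHERPA algorithm from the paper: the one-dimensional model `x' = x + u + d`, the `Interval` class, and the names `predict_next`, `is_action_safe`, and `safe_actions` are all assumptions made for illustration, and the search for backups is left out entirely.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] on the real line."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        # Interval sum: lower bounds add, upper bounds add.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def overlaps(self, other: "Interval") -> bool:
        # Two closed intervals intersect iff each starts before the other ends.
        return self.lo <= other.hi and other.lo <= self.hi


def predict_next(x: float, u: float, disturbance: Interval) -> Interval:
    """Bound the next state of the hypothetical model x' = x + u + d,
    where the unknown term d is only known to lie in `disturbance`."""
    return Interval(x + u, x + u) + disturbance


def is_action_safe(x: float, u: float, disturbance: Interval,
                   fatal_sets: List[Interval]) -> bool:
    """An action is accepted only if no possible next state is fatal."""
    nxt = predict_next(x, u, disturbance)
    return not any(nxt.overlaps(f) for f in fatal_sets)


def safe_actions(x: float, candidates: List[float], disturbance: Interval,
                 fatal_sets: List[Interval]) -> List[float]:
    """Filter candidate actions down to those whose predicted interval
    stays clear of every perceived fatal set."""
    return [u for u in candidates
            if is_action_safe(x, u, disturbance, fatal_sets)]


if __name__ == "__main__":
    fatal = [Interval(1.5, 3.0)]          # perceived fatal region (assumed)
    uncertainty = Interval(-0.25, 0.25)   # assumed bound on model error
    print(safe_actions(0.0, [-1.0, 0.0, 1.0, 2.0], uncertainty, fatal))
```

In this sketch, widening the uncertainty interval makes the filter more conservative: an action that is accepted under a tight bound can be rejected once the predicted interval grows enough to touch a fatal set.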
DOI: http://dx.doi.org/10.1109/TNNLS.2017.2654539
NPJ Digit Med
January 2025
Graduate School of Data Science, Seoul National University, Seoul, Republic of Korea.
Polysomnography (PSG) is crucial for diagnosing sleep disorders, but manual scoring of PSG is time-consuming and subjective, leading to high variability. While machine-learning models have improved PSG scoring, their clinical use is hindered by their 'black-box' nature. In this study, we present SleepXViT, an automatic sleep staging system using Vision Transformer (ViT) that provides intuitive, consistent explanations by mimicking human 'visual scoring'.
Sci Rep
January 2025
School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, India.
This study introduces a novel ensemble learning technique, the Multi-Armed Bandit Ensemble (MAB-Ensemble), designed for lane detection in road images for autonomous vehicles. The proposed technique draws on multi-armed bandit optimization to select models efficiently for lane segmentation. The TuSimple benchmark dataset is used for training, validating, and testing the proposed and existing lane detection techniques.
Transl Psychiatry
January 2025
Division of Psychology, Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden.
Background: Alcohol use disorder (AUD) is associated with deficits in social cognition and behavior, but why these deficits are acquired is unknown. We hypothesized that a reduced association between actions and outcomes for others, i.e.
ISA Trans
January 2025
Toronto Metropolitan University, Toronto, Canada.
This research introduces an innovative approach to optimal control for a class of linear systems with input saturation. It leverages the synergy of Takagi-Sugeno (T-S) fuzzy models and reinforcement learning (RL) techniques. To enhance interpretability and analytical accessibility, our approach applies T-S models to approximate the value function and generate optimal control laws while incorporating prior knowledge.
Brain Res
January 2025
Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada; Alberta Machine Intelligence Institute, Edmonton, Alberta, Canada; Canada Institute for Advanced Research (CIFAR) AI Chair, Canada.
Humans are excellent at modifying our behaviour depending on context. For example, humans will change how they explore when losses are possible compared to when they are not possible. However, it remains unclear what specific cognitive and neural processes are modulated when exploring in different contexts.