AI Article Synopsis

  • Multistep tasks like block stacking are challenging for autonomous robots, requiring a combination of detailed motion control and higher-level symbolic planning.
  • Current reinforcement learning (RL) methods struggle with complex tasks that involve multiple intermediate steps and varied outcomes, often facing issues with exploration efficiency and sparse rewards.
  • The Universal Option Framework (UOF) addresses these challenges by training symbolic planning and kinematic control simultaneously, using techniques like auto-adjusting exploration and abstract demonstrations, resulting in more efficient and stable task performance with reduced memory usage.

Article Abstract

Multistep tasks, such as block stacking or parts (dis)assembly, are complex for autonomous robotic manipulation. A robotic system for such tasks would need to hierarchically combine motion control at a lower level and symbolic planning at a higher level. Recently, reinforcement learning (RL)-based methods have been shown to handle robotic motion control with better flexibility and generalizability. However, these methods have limited capability to handle such complex tasks involving planning and control with many intermediate steps over a long time horizon. First, current RL systems cannot achieve varied outcomes by planning over intermediate steps (e.g., stacking blocks in different orders). Second, the exploration efficiency of learning multistep tasks is low, especially when rewards are sparse. To address these limitations, we develop a unified hierarchical reinforcement learning framework, named Universal Option Framework (UOF), to enable the agent to learn varied outcomes in multistep tasks. To improve learning efficiency, we train both symbolic planning and kinematic control policies in parallel, aided by two proposed techniques: 1) an auto-adjusting exploration strategy (AAES) at the low level to stabilize the parallel training, and 2) abstract demonstrations at the high level to accelerate convergence. To evaluate its performance, we performed experiments on various multistep block-stacking tasks with blocks of different shapes and combinations and with different degrees of freedom for robot control. The results demonstrate that our method can accomplish multistep manipulation tasks more efficiently and stably, and with significantly less memory consumption.
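To make the framework described in the abstract concrete, the following is a minimal sketch of a UOF-style training loop: a tabular high-level policy picks the next sub-goal, a low-level controller (here reduced to a per-goal success probability) practices reaching it in parallel, an AAES-style schedule shrinks exploration as low-level competence grows, and an abstract demonstration (the target stacking order) stands in for the true task success signal. Everything here is an illustrative assumption; the toy environment, `aaes_noise_scale`, `low_skill`, and `demo_order` are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "stacking" task: the high level picks which block (sub-goal)
# to place next; the low level tries to reach that sub-goal.
NUM_SUBGOALS = 3        # e.g., three blocks to stack in some order
EPISODES = 2000

# High level: tabular values over (progress step, sub-goal choice).
q_high = np.zeros((NUM_SUBGOALS, NUM_SUBGOALS))

# Low level: a success-probability proxy per sub-goal, standing in for
# a goal-conditioned control policy trained in parallel.
low_skill = np.full(NUM_SUBGOALS, 0.1)

def aaes_noise_scale(success_rate):
    """AAES-style schedule: explore less as the low-level policy
    becomes reliable at reaching its sub-goals."""
    return max(0.05, 1.0 - success_rate)

# Abstract demonstration: the target stacking order, used as a sparse
# success signal instead of dense reward shaping.
demo_order = [0, 1, 2]

for ep in range(EPISODES):
    for step in range(NUM_SUBGOALS):
        # High level: epsilon-greedy over sub-goals, with epsilon tied
        # to the low level's current competence (parallel-training aid).
        eps = aaes_noise_scale(low_skill.mean())
        if rng.random() < eps:
            goal = int(rng.integers(NUM_SUBGOALS))
        else:
            goal = int(np.argmax(q_high[step]))

        # Low level attempts the sub-goal; practice improves it slowly.
        reached = rng.random() < low_skill[goal]
        low_skill[goal] = min(0.99, low_skill[goal] + 0.002 * reached + 0.0005)

        # Sparse reward: +1 only if the reached sub-goal matches the
        # abstract demonstration for this step.
        r = 1.0 if (reached and goal == demo_order[step]) else 0.0
        q_high[step, goal] += 0.1 * (r - q_high[step, goal])

print("learned high-level order:",
      [int(np.argmax(q_high[s])) for s in range(NUM_SUBGOALS)])
```

The point of the sketch is the coupling: tying the high-level exploration rate to low-level success mirrors the paper's stated goal of stabilizing the parallel training of the planning and control policies.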

Source
http://dx.doi.org/10.1109/TNNLS.2021.3059912

Publication Analysis

Top Keywords

reinforcement learning (12)
multistep tasks (12)
hierarchical reinforcement (8)
robotic manipulation (8)
motion control (8)
symbolic planning (8)
intermediate steps (8)
varied outcomes (8)
tasks (7)
multistep (6)

Similar Publications

Polysomnography (PSG) is crucial for diagnosing sleep disorders, but manual scoring of PSG is time-consuming and subjective, leading to high variability. While machine-learning models have improved PSG scoring, their clinical use is hindered by their 'black-box' nature. In this study, we present SleepXViT, an automatic sleep staging system using a Vision Transformer (ViT) that provides intuitive, consistent explanations by mimicking human 'visual scoring'.

This study introduces a novel ensemble learning technique, the Multi-Armed Bandit Ensemble (MAB-Ensemble), designed for lane detection in road images for autonomous vehicles. The proposed MAB-Ensemble technique draws on multi-armed bandit optimization to enable efficient model selection for lane segmentation. The TuSimple benchmark dataset is used for training, validating, and testing the proposed and existing lane detection techniques.
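The bandit-driven model-selection idea can be sketched with the classic UCB1 rule; the candidate models and their accuracies below are simulated stand-ins, not the paper's actual ensemble or dataset.

```python
import math
import random

random.seed(0)

# Hypothetical stand-ins: each "arm" is a candidate lane-segmentation
# model with an unknown per-image success rate.
true_accuracy = [0.72, 0.81, 0.65]   # simulated, not from the paper
counts = [0] * len(true_accuracy)
values = [0.0] * len(true_accuracy)

def ucb1(t):
    """Pick the model maximizing mean reward plus exploration bonus."""
    for i, c in enumerate(counts):
        if c == 0:
            return i                 # try every model once first
    return max(range(len(counts)),
               key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]))

for t in range(1, 5001):
    arm = ucb1(t)
    reward = 1.0 if random.random() < true_accuracy[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # running mean

print("selections per model:", counts)   # best model dominates over time
```

UCB1 concentrates trials on the best-performing candidate while still occasionally auditing the others, which is the behaviour an ensemble selector wants.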

Dissociating social reward learning and behavior in alcohol use disorder.

Transl Psychiatry

January 2025

Division of Psychology, Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden.

Background: Alcohol use disorder (AUD) is associated with deficits in social cognition and behavior, but why these deficits are acquired is unknown. We hypothesized that a reduced association between actions and outcomes for others, i.e.

This research introduces an innovative approach to optimal control for a class of linear systems with input saturation. It leverages the synergy of Takagi-Sugeno (T-S) fuzzy models and reinforcement learning (RL) techniques. To enhance interpretability and analytical accessibility, our approach applies T-S models to approximate the value function and generate optimal control laws while incorporating prior knowledge.
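As a rough illustration of the Takagi-Sugeno idea, the sketch below approximates a value function as a membership-weighted blend of local linear consequents and fits it by gradient descent. The Gaussian memberships, rule centers, and toy target are assumptions standing in for the paper's RL-based learning, not its actual method.

```python
import numpy as np

# T-S idea: approximate a global value function V(x) as a
# membership-weighted blend of simple local linear models.
centers = np.array([-1.0, 0.0, 1.0])   # rule centers (illustrative)
width = 0.6

def memberships(x):
    """Gaussian rule activations, normalized to sum to 1."""
    m = np.exp(-((x - centers) ** 2) / (2 * width ** 2))
    return m / m.sum()

# One linear consequent (a*x + b) per fuzzy rule.
theta = np.zeros((len(centers), 2))

def value(x):
    m = memberships(x)
    return sum(mi * (a * x + b) for mi, (a, b) in zip(m, theta))

# Fit to a toy target by gradient descent, as a stand-in for the
# paper's RL update.
target = lambda x: -x ** 2          # toy "value" shape, peak at 0
lr = 0.05
for _ in range(3000):
    x = np.random.uniform(-1.5, 1.5)
    err = value(x) - target(x)
    m = memberships(x)
    theta -= lr * err * np.column_stack([m * x, m])

print("V(0) ~", round(float(value(0.0)), 3),
      " V(1) ~", round(float(value(1.0)), 3))
```

Because each rule's consequent is linear, the blended model stays analytically simple per region, which is the interpretability benefit the snippet above alludes to.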

Humans are excellent at modifying their behaviour depending on context. For example, humans change how they explore when losses are possible compared to when they are not. However, it remains unclear which specific cognitive and neural processes are modulated when exploring in different contexts.
