Meta attention for Off-Policy Actor-Critic.

Neural Networks

National University of Defense Technology, College of Computer & Hefei Interdisciplinary Center, Changsha, 410073, Hunan, China.

Published: June 2023

Off-Policy Actor-Critic methods can effectively exploit past experiences, and they have therefore achieved great success in various reinforcement learning tasks. In many image-based and multi-agent tasks, attention mechanisms have been employed in Actor-Critic methods to improve their sample efficiency. In this paper, we propose a meta-attention method for state-based reinforcement learning tasks, which combines the attention mechanism and meta-learning based on the Off-Policy Actor-Critic framework. Unlike previous attention-based work, our meta-attention method introduces attention into the Actor and the Critic of the typical Actor-Critic framework, rather than over multiple pixels of an image or multiple information sources in specific image-based control tasks or multi-agent systems. In contrast to existing meta-learning methods, the proposed meta-attention approach functions in both the gradient-based training phase and the agent's decision-making process. Experimental results demonstrate the superiority of our meta-attention method on various continuous control tasks, built on Off-Policy Actor-Critic methods including DDPG and TD3.
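The abstract describes attention applied directly to the Actor's and Critic's state inputs rather than to image pixels or agent-wise information sources. A minimal sketch of that idea, assuming a per-dimension attention gate over state features (the names, shapes, and the meta-learned training of the attention logits are illustrative assumptions, not the paper's actual architecture):

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

class MetaAttention:
    """Hypothetical sketch: a learnable attention gate over state dimensions.
    The resulting weighted state would be fed to both the Actor and the
    Critic networks; the paper's meta-learning of these weights is omitted."""
    def __init__(self, state_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.logits = rng.normal(scale=0.1, size=state_dim)  # trained in practice

    def __call__(self, state):
        alpha = softmax(self.logits)   # weights are non-negative and sum to 1
        return alpha * state           # re-weighted state features

state = np.array([1.0, -0.5, 2.0, 0.3])
weighted = MetaAttention(state_dim=4)(state)
```

Because the softmax weights sum to one, the gate re-allocates emphasis across state dimensions rather than rescaling the whole state.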


DOI: http://dx.doi.org/10.1016/j.neunet.2023.03.024

Publication Analysis

Top Keywords (frequency):
- off-policy actor-critic (16)
- meta attention (12)
- actor-critic methods (12)
- reinforcement learning (8)
- learning tasks (8)
- attention mechanism (8)
- attention method (8)
- based off-policy (8)
- actor-critic framework (8)
- control tasks (8)

Similar Publications

Making proper decisions online in a complex environment during blast furnace (BF) operation is a key factor in achieving long-term success and profitability in the steel manufacturing industry. Regulatory lags, ore-source uncertainty, and continuous decision requirements make it a challenging task. Recently, reinforcement learning (RL) has demonstrated state-of-the-art performance in various sequential decision-making problems.


Learning distributed cooperative policies for large-scale multirobot systems remains a challenging task in the multiagent reinforcement learning (MARL) context. In this work, we model the interactions among the robots as a graph and propose a novel off-policy actor-critic MARL algorithm to train distributed coordination policies on the graph by leveraging the information-extraction ability of graph neural networks (GNNs). First, a new type of Gaussian policy parameterized by GNNs is designed for distributed decision-making in continuous action spaces.
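The snippet above describes per-robot policies computed by message passing on an interaction graph. A toy sketch of one such round, with hypothetical names and shapes (mean aggregation over neighbors followed by a shared nonlinearity; the actual GNN parameterization in that work is not reproduced here):

```python
import numpy as np

def gnn_policy_layer(node_feats, adj, w_self, w_nbr):
    """One message-passing round (illustrative): each robot averages its
    neighbors' features and combines them with its own, producing a
    per-robot output such as the mean of a Gaussian policy."""
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1)  # avoid divide-by-zero
    agg = (adj @ node_feats) / deg                       # mean over neighbors
    return np.tanh(node_feats @ w_self + agg @ w_nbr)    # shared weights

# 3 robots on a line graph, 2-D features per robot
adj = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
feats = np.arange(6.0).reshape(3, 2)
out = gnn_policy_layer(feats, adj, np.eye(2), 0.5 * np.eye(2))
```

Because the weight matrices are shared across nodes, the same layer scales to any number of robots, which is what makes the policy distributed.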


In offline actor-critic (AC) algorithms, the distributional shift between the training data and target policy causes optimistic Q value estimates for out-of-distribution (OOD) actions. This leads to learned policies skewed toward OOD actions with falsely high Q values. The existing value-regularized offline AC algorithms address this issue by learning a conservative value function, leading to a performance drop.
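The value-regularization idea described above can be illustrated with a CQL-style penalty (an illustrative stand-in, not necessarily the exact scheme that snippet critiques): push Q down on broadly sampled actions, including out-of-distribution ones, via a log-sum-exp term, while pushing it up on dataset actions.

```python
import numpy as np

def conservative_penalty(q_all_actions, q_data_action, alpha=1.0):
    """CQL-style regularizer added to the critic loss (hypothetical names):
    penalizes high Q values over sampled actions relative to the Q value
    of the action actually taken in the dataset."""
    logsumexp = np.log(np.sum(np.exp(q_all_actions)))   # soft-max over actions
    return alpha * (logsumexp - q_data_action)

# A falsely high Q value on an OOD action inflates the penalty:
p_inflated = conservative_penalty(np.array([1.0, 2.0, 10.0]), q_data_action=2.0)
p_modest = conservative_penalty(np.array([1.0, 2.0, 3.0]), q_data_action=2.0)
```

The larger the optimistic OOD estimates, the larger the penalty, which is exactly the conservatism that the snippet notes can also cost performance.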

Article Synopsis
  • Flying insects maintain stability and control while flapping their wings, even when faced with strong winds and turbulence, which conventional controllers often struggle with.
  • The study introduces a new type of controller for bumblebee hovering, using deep reinforcement learning to enhance stability during large disturbances.
  • A detailed simulation environment was created to test this controller, which proved effective in quickly stabilizing the bumblebee's flight, showing potential for bio-inspired drone designs.

