Off-Policy Actor-Critic methods can effectively exploit past experiences, and they have therefore achieved great success on a variety of reinforcement learning tasks. In many image-based and multi-agent tasks, the attention mechanism has been employed in Actor-Critic methods to improve their sample efficiency. In this paper, we propose a meta-attention method for state-based reinforcement learning tasks, which combines the attention mechanism and meta-learning on top of the Off-Policy Actor-Critic framework. Unlike previous attention-based work, our meta-attention method introduces attention into the Actor and the Critic of the typical Actor-Critic framework, rather than over the pixels of an image or the information sources of specific image-based control tasks or multi-agent systems. In contrast to existing meta-learning methods, the proposed meta-attention approach operates both in the gradient-based training phase and in the agent's decision-making process. Experimental results on various continuous control tasks, built on Off-Policy Actor-Critic methods including DDPG and TD3, demonstrate the superiority of our meta-attention method.
DOI: 10.1016/j.neunet.2023.03.024
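As a hedged illustration of where such an attention module could sit inside the Actor of a DDPG/TD3-style agent described in the abstract above, the minimal PyTorch sketch below re-weights the features of a state vector before the policy network maps them to an action. The class names, layer sizes, and the feature-wise softmax are illustrative assumptions, not the paper's meta-attention implementation (which additionally meta-learns the attention and applies it to the Critic as well).

```python
# Illustrative sketch only, not the paper's code: a soft-attention block over
# state features, used by a deterministic actor of a DDPG/TD3-style agent.
import torch
import torch.nn as nn

class StateAttention(nn.Module):
    """Produces per-feature attention weights for a state vector."""
    def __init__(self, state_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.score(state), dim=-1)  # attention over state features
        return state * weights                               # re-weighted state

class AttentionActor(nn.Module):
    """Deterministic actor that attends over its state input before acting."""
    def __init__(self, state_dim: int, action_dim: int, max_action: float = 1.0):
        super().__init__()
        self.attention = StateAttention(state_dim)
        self.policy = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.max_action * self.policy(self.attention(state))
```

A symmetric block could be placed in front of the Critic's state-action input; how the attention parameters are meta-learned is specific to the paper and not shown here.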
IEEE Trans Neural Netw Learn Syst
March 2024
Making proper decisions online in a complex environment during blast furnace (BF) operation is a key factor in achieving long-term success and profitability in the steel manufacturing industry. Regulatory lags, ore-source uncertainty, and the need for continuous decision-making make this a challenging task. Recently, reinforcement learning (RL) has demonstrated state-of-the-art performance in various sequential decision-making problems.
IEEE Trans Neural Netw Learn Syst
November 2023
Learning distributed cooperative policies for large-scale multirobot systems remains a challenging task in the multiagent reinforcement learning (MARL) context. In this work, we model the interactions among the robots as a graph and propose a novel off-policy actor-critic MARL algorithm that trains distributed coordination policies on this graph by leveraging the information-extraction ability of graph neural networks (GNNs). First, a new type of Gaussian policy, parameterized by GNNs, is designed for distributed decision-making in continuous action spaces.
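The short sketch below shows one way a Gaussian policy can be parameterized by a graph network so that each robot's action distribution conditions on its neighbors' observations. It assumes a plain PyTorch message-passing layer rather than the authors' GNN architecture; the module names, layer sizes, and adjacency-based aggregation are assumptions.

```python
# Illustrative sketch only (not the cited implementation): a Gaussian policy whose
# mean and log-std come from one graph message-passing step over robot observations.
import torch
import torch.nn as nn

class GraphGaussianPolicy(nn.Module):
    def __init__(self, obs_dim: int, action_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hidden_dim)
        self.message = nn.Linear(hidden_dim, hidden_dim)
        self.mean_head = nn.Linear(hidden_dim, action_dim)
        self.log_std_head = nn.Linear(hidden_dim, action_dim)

    def forward(self, obs: torch.Tensor, adj: torch.Tensor):
        # obs: [num_robots, obs_dim]; adj: [num_robots, num_robots], row-normalized.
        h = torch.relu(self.encode(obs))
        h = torch.relu(h + adj @ self.message(h))   # aggregate neighbor messages
        mean = self.mean_head(h)
        std = self.log_std_head(h).clamp(-5, 2).exp()
        return torch.distributions.Normal(mean, std)

# Usage: dist = policy(obs, adj); actions = dist.rsample()  # one action row per robot
```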
In offline actor-critic (AC) algorithms, the distributional shift between the training data and the target policy causes optimistic Q-value estimates for out-of-distribution (OOD) actions. This leads to learned policies skewed toward OOD actions with falsely high Q values. Existing value-regularized offline AC algorithms address this issue by learning a conservative value function, which leads to a performance drop.
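For context on the value-regularization idea referenced here, the sketch below shows a CQL-style conservative critic loss that pushes down Q values of randomly sampled (likely OOD) actions relative to dataset actions. This is background illustration, not the cited article's algorithm; the function name, penalty weight `alpha`, the uniform action proposal, and the critic's call signature are all assumptions.

```python
# Hedged sketch of value regularization for offline AC (CQL-style penalty).
import torch
import torch.nn.functional as F

def conservative_critic_loss(critic, batch, target_q, alpha=1.0, num_samples=10):
    states, actions = batch["states"], batch["actions"]   # transitions from the offline dataset
    q_data = critic(states, actions)
    bellman_loss = F.mse_loss(q_data, target_q)

    # Q values of uniformly sampled actions in [-1, 1], which are likely out-of-distribution.
    rand_actions = torch.rand(num_samples, *actions.shape, device=actions.device) * 2 - 1
    q_rand = torch.stack([critic(states, a) for a in rand_actions])

    # Penalize high Q on sampled actions relative to dataset actions.
    penalty = q_rand.logsumexp(dim=0).mean() - q_data.mean()
    return bellman_loss + alpha * penalty
```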
Biomimetics (Basel)
July 2023
Shanghai Jiao Tong University and Chiba University International Cooperative Research Center (SJTU-CU ICRC), 800 Dongchuan Road, Minhang District, Shanghai 200240, China.
Neural Netw
June 2023
National University of Defense Technology, College of Computer & Hefei Interdisciplinary Center, Changsha, 410073, Hunan, China.