AI Article Synopsis

  • The text discusses the challenge of the exploration-exploitation dilemma in off-policy reinforcement learning (RL), which negatively affects policy performance and sample efficiency.
  • To address this, a new RL algorithm called Historical Decision-making Regularized Maximum Entropy (HDMRME) is introduced, aimed at better balancing exploration and exploitation while enhancing policy performance.
  • The effectiveness of HDMRME is theoretically supported and experimentally validated through various tasks, showing it outperforms other leading RL algorithms in terms of sample efficiency and overall competitiveness.

Article Abstract

The exploration-exploitation dilemma persists in off-policy reinforcement learning (RL) algorithms, impeding improvements in policy performance and sample efficiency. To tackle this challenge, a novel historical decision-making regularized maximum entropy (HDMRME) RL algorithm is developed to strike a balance between exploration and exploitation. Built upon the maximum entropy RL framework, the historical decision-making regularization method is proposed to enhance the exploitation capability of RL policies. The theoretical analysis proves the convergence of HDMRME, investigates its tradeoff between exploration and exploitation, examines the disparity between the Q-function learned through HDMRME and the classic Q-function, and analyzes the suboptimality of the trained policy. The performance of HDMRME is evaluated across various continuous-action control tasks from the MuJoCo and OpenAI Gym platforms. Comparative experiments demonstrate that HDMRME exhibits superior sample efficiency and achieves more competitive performance than other state-of-the-art RL algorithms.
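
For orientation, the maximum entropy RL framework that the abstract builds on is commonly written as the objective below. This is the standard textbook form, not a formula taken from the paper: the symbols (temperature \alpha, state-action distribution \rho_\pi, horizon T) are conventional notation and are assumptions here. The abstract does not state the form of the historical decision-making regularizer, so it is not reproduced.

```latex
% Standard maximum entropy RL objective (conventional notation, assumed here):
% the policy maximizes expected reward plus an entropy bonus weighted by \alpha.
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\, \mathbb{E}_{a \sim \pi(\cdot \mid s)} \left[ \log \pi(a \mid s) \right]
```

In this form, a larger \alpha rewards more stochastic policies and so favors exploration. According to the abstract, the proposed regularization adds a counterweight on the exploitation side.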

Source
http://dx.doi.org/10.1109/TNNLS.2024.3481887

Publication Analysis

Top Keywords

historical decision-making: 12
maximum entropy: 12
decision-making regularized: 8
regularized maximum: 8
reinforcement learning: 8
policy performance: 8
sample efficiency: 8
exploration exploitation: 8
hdmrme: 6
entropy reinforcement: 4
