IEEE Trans Neural Netw Learn Syst
January 2024
We focus on learning the zero-constraint-violation safe policy in model-free reinforcement learning (RL). Existing model-free RL studies mostly use the posterior penalty to penalize dangerous actions, which means they must experience the danger to learn from the danger. Therefore, they cannot learn a zero-violation safe policy even after convergence.
View Article and Find Full Text PDF