Publications by authors named "Sifa Zheng"

We focus on learning the zero-constraint-violation safe policy in model-free reinforcement learning (RL). Existing model-free RL studies mostly use the posterior penalty to penalize dangerous actions, which means they must experience the danger to learn from the danger. Therefore, they cannot learn a zero-violation safe policy even after convergence.

View Article and Find Full Text PDF