Publications by authors named "Luntong Li"

Deep reinforcement learning (DRL) benefits from the representation power of deep neural networks (NNs), to approximate the value function and policy in the learning process. Batch reinforcement learning (BRL) benefits from stable training and data efficiency with fixed representation and enjoys solid theoretical analysis. This work proposes least-squares deep policy gradient (LSDPG), a hybrid approach that combines least-squares reinforcement learning (RL) with online DRL to achieve the best of both worlds.

View Article and Find Full Text PDF

Actor-critic (AC) learning control architecture has been regarded as an important framework for reinforcement learning (RL) with continuous states and actions. In order to improve learning efficiency and convergence property, previous works have been mainly devoted to solve regularization and feature learning problem in the policy evaluation. In this article, we propose a novel AC learning control method with regularization and feature selection for policy gradient estimation in the actor network.

View Article and Find Full Text PDF

Actor-critic based on the policy gradient (PG-based AC) methods have been widely studied to solve learning control problems. In order to increase the data efficiency of learning prediction in the critic of PG-based AC, studies on how to use recursive least-squares temporal difference (RLS-TD) algorithms for policy evaluation have been conducted in recent years. In such contexts, the critic RLS-TD evaluates an unknown mixed policy generated by a series of different actors, but not one fixed policy generated by the current actor.

View Article and Find Full Text PDF