Publications by authors named "Tianheng Song"

The actor-critic (AC) learning control architecture has been regarded as an important framework for reinforcement learning (RL) with continuous states and actions. To improve learning efficiency and convergence properties, previous work has mainly been devoted to solving the regularization and feature learning problems in policy evaluation. In this article, we propose a novel AC learning control method with regularization and feature selection for policy gradient estimation in the actor network.
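
As a rough illustration of what regularized policy gradient estimation in the actor can look like, the sketch below pairs a TD(0) critic with a Gaussian-policy actor whose parameters are soft-thresholded after each step (an L1 penalty that drives feature selection). The feature map, step sizes, and penalty weight are placeholder assumptions for the sketch, not the article's exact method.

```python
# Minimal actor-critic step with an L1 penalty on the actor parameters
# (illustrative only; hyperparameters and features are assumptions).
import numpy as np

def phi(s, n_features=8):
    # Hypothetical state features: radial basis functions over [0, 1].
    centers = np.linspace(0.0, 1.0, n_features)
    return np.exp(-((s - centers) ** 2) / 0.05)

def ac_step(s, a, r, s_next, w, theta, sigma=0.2,
            alpha_w=0.05, alpha_theta=0.01, lam=1e-3, gamma=0.99):
    """One update with a Gaussian policy a ~ N(theta^T phi(s), sigma^2)."""
    x, x_next = phi(s), phi(s_next)
    # Critic: TD(0) update of the linear value function w^T phi(s).
    delta = r + gamma * w @ x_next - w @ x
    w = w + alpha_w * delta * x
    # Actor: policy-gradient step using the TD error as the advantage
    # estimate, followed by soft-thresholding (proximal L1 step).
    grad_log_pi = (a - theta @ x) / sigma**2 * x
    theta = theta + alpha_theta * delta * grad_log_pi
    theta = np.sign(theta) * np.maximum(np.abs(theta) - alpha_theta * lam, 0.0)
    return w, theta, delta
```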

Since the late 1980s, temporal difference (TD) learning has dominated research on policy evaluation algorithms. However, the need to avoid TD's drawbacks, such as low data efficiency and divergence in off-policy learning, has inspired a large number of novel TD-based approaches. Gradient-based and least-squares-based algorithms make up the major part of these new approaches.
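
To make the contrast concrete, the sketch below shows a gradient-style TD(0) update next to a batch least-squares (LSTD-style) solution for a linear value function v(s) = w^T phi(s). The data layout and regularizer are assumptions made for the sketch, not drawn from the article.

```python
# Gradient-style vs. least-squares-style policy evaluation (illustrative).
import numpy as np

def td0_update(w, x, r, x_next, alpha=0.05, gamma=0.99):
    # Gradient-style: one cheap stochastic update per transition.
    delta = r + gamma * w @ x_next - w @ x
    return w + alpha * delta * x

def lstd_solve(X, R, X_next, gamma=0.99, reg=1e-6):
    # Least-squares-style: solve A w = b over a batch of transitions,
    # which is more data-efficient but requires a matrix solve.
    # X, X_next: (n, d) feature matrices; R: (n,) rewards.
    A = X.T @ (X - gamma * X_next) + reg * np.eye(X.shape[1])
    b = X.T @ R
    return np.linalg.solve(A, b)
```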

Actor-critic methods based on the policy gradient (PG-based AC) have been widely studied to solve learning control problems. To increase the data efficiency of learning prediction in the critic of PG-based AC, recent years have seen studies on how to use recursive least-squares temporal difference (RLS-TD) algorithms for policy evaluation. In such settings, the critic's RLS-TD evaluates an unknown mixed policy generated by a series of different actors, rather than a single fixed policy generated by the current actor.
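
A minimal sketch of the kind of RLS-TD(0)-style critic update involved is given below: transitions are processed one at a time while retaining the data efficiency of a least-squares solution. Because the recursion keeps accumulating statistics while the actor changes, those statistics reflect a mixture of policies, which is the issue the abstract points to. Variable names and the forgetting factor are assumptions for the sketch.

```python
# Recursive least-squares TD (RLS-TD(0)-style) critic update (illustrative).
import numpy as np

def rls_td_update(w, P, x, r, x_next, gamma=0.99, beta=1.0):
    """One step; P approximates the inverse of the LSTD A matrix."""
    u = x - gamma * x_next           # temporal-difference feature vector
    k = P @ x / (beta + u @ P @ x)   # gain vector
    delta = r - u @ w                # residual of the current estimate
    w = w + k * delta
    P = (P - np.outer(k, u @ P)) / beta
    return w, P

# Typical initialization: w = zeros(d), P = (1 / epsilon) * eye(d), small epsilon.
```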

A least-squares temporal difference with gradient correction (LS-TDC) algorithm and its kernel-based version (KLS-TDC) are proposed as policy evaluation algorithms for reinforcement learning (RL). LS-TDC is derived from the TDC algorithm. Because TDC is derived by minimizing the mean-square projected Bellman error, LS-TDC inherits its good convergence performance.
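
For context, the sketch below shows the stochastic TDC update (the gradient-TD method with correction term) that LS-TDC takes as its starting point; the least-squares and kernelized derivations are the article's contribution and are not reproduced here. Step sizes are placeholder assumptions.

```python
# Stochastic TDC update with two weight vectors (illustrative sketch).
import numpy as np

def tdc_update(w, h, x, r, x_next, alpha=0.05, beta=0.25, gamma=0.99):
    """One TDC step toward minimizing the mean-square projected Bellman error."""
    delta = r + gamma * w @ x_next - w @ x                   # TD error under current w
    w = w + alpha * (delta * x - gamma * (h @ x) * x_next)   # corrected gradient step
    h = h + beta * (delta - h @ x) * x                       # auxiliary weights track E[delta | x]
    return w, h
```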
