AI Article Synopsis

  • The article introduces a model-free λ-policy iteration (λ-PI) for solving the discrete-time linear quadratic regulation (LQR) problem using novel matrix operators.
  • Unlike standard policy iteration (PI), the λ-PI algorithm does not require an admissible initial policy, and its convergence rate outperforms value iteration (VI).
  • Off-policy reinforcement learning extensions of λ-PI are developed, demonstrating robustness to probing noise, with simulations confirming the algorithm's effectiveness.

Article Abstract

This article presents a model-free λ-policy iteration (λ-PI) for the discrete-time linear quadratic regulation (LQR) problem. To iteratively solve the algebraic Riccati equation arising from the LQR problem, we define two novel matrix operators, named the weighted Bellman operator and the composite Bellman operator. The λ-PI algorithm is first designed as a recursion with the weighted Bellman operator, and its equivalent formulation as a fixed-point iteration with the composite Bellman operator is then shown. The contraction and monotonicity properties of the composite Bellman operator guarantee the convergence of the λ-PI algorithm. In contrast to the PI algorithm, λ-PI does not require an admissible initial policy, and its convergence rate outperforms that of the value iteration (VI) algorithm. A model-free extension of the λ-PI algorithm is developed using the off-policy reinforcement learning technique. It is also shown that the off-policy variants of the λ-PI algorithm are robust against probing noise. Finally, simulation examples are conducted to validate the efficacy of the λ-PI algorithm.
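For intuition, a minimal model-based sketch of the recursion described above follows (in Python). The function name lambda_pi_lqr, the truncation length inner, and the toy system are illustrative assumptions, not the article's implementation; in particular, the article's model-free, off-policy variants and exact operator definitions are not reproduced here. The weighted Bellman operator is approximated by truncating its geometric series:

    import numpy as np

    def lambda_pi_lqr(A, B, Q, R, lam=0.7, outer=30, inner=60):
        # Sketch of lambda-policy iteration for discrete-time LQR, assuming
        # V(x) = x' P x and a linear policy u = -K x. Convergence relies on
        # the contraction conditions established in the article.
        n = A.shape[0]
        P = np.zeros((n, n))  # no admissible initial policy is required
        for _ in range(outer):
            # Policy improvement: greedy gain for the current P.
            K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
            Acl = A - B @ K
            stage = Q + K.T @ R @ K
            # Weighted Bellman operator, truncated at `inner` terms:
            #   P <- (1 - lam) * sum_{t>=0} lam^t * T_K^{t+1}(P),
            # where T_K(M) = Q + K'RK + (A - BK)' M (A - BK).
            M, S = P.copy(), np.zeros_like(P)
            for t in range(inner):
                M = stage + Acl.T @ M @ Acl  # M <- T_K(M)
                S += (1.0 - lam) * lam**t * M
            P = S
        return P, K

    # Toy usage on a discretized double integrator.
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    P, K = lambda_pi_lqr(A, B, Q=np.eye(2), R=np.array([[1.0]]))

Setting lam=0 recovers a value-iteration-style update, while lam close to 1 approaches full policy evaluation, consistent with λ-PI interpolating between VI and PI as the abstract describes.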

Source
http://dx.doi.org/10.1109/TNNLS.2021.3098985

Publication Analysis

Top Keywords

bellman operator (20)
λ-pi algorithm (20)
composite bellman (12)
discrete-time linear (8)
linear quadratic (8)
quadratic regulation (8)
weighted bellman (8)
λ-pi (7)
algorithm (7)
bellman (5)

Similar Publications

Improving robustness by action correction via multi-step maximum risk estimation.

Neural Netw

December 2024

School of Computer Science and Technology, Soochow University, Suzhou, 215006, China.

Certifying robustness against external uncertainties throughout the control process is important for reducing the risk of instability. Most existing approaches based on adversarial learning use a fixed parameter to adjust the intensity of adversarial perturbations and design these perturbations greedily, without considering future implications. As a result, they often exhibit severe vulnerabilities when attack budgets vary dynamically or under foresighted attacks.


In this article, an optimal surrounding control algorithm is proposed for multiple unmanned surface vessels (USVs), in which actor-critic reinforcement learning (RL) is utilized to optimize the merging process. Specifically, the multiple-USV optimal surrounding control problem is first transformed into the Hamilton-Jacobi-Bellman (HJB) equation, which is difficult to solve due to its nonlinearity. An adaptive actor-critic RL control paradigm is then proposed to obtain the optimal surround strategy, wherein the Bellman residual error is utilized to construct the network update laws.
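As a generic illustration of the Bellman-residual idea mentioned above (a sketch, not the article's actual update laws), a linear-in-features critic V(x) ≈ w·φ(x) can be updated by gradient descent on the squared residual; the names phi_x, gamma, and alpha below are assumptions:

    import numpy as np

    def bellman_residual_step(w, phi_x, phi_xnext, cost, gamma=0.95, alpha=0.01):
        # One gradient step on 0.5 * e**2, where e is the Bellman residual
        # for a critic V(x) ~= w @ phi(x).
        e = cost + gamma * (w @ phi_xnext) - (w @ phi_x)
        grad = e * (gamma * phi_xnext - phi_x)  # d(0.5 * e**2) / dw
        return w - alpha * grad, e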

Article Synopsis
  • Offline reinforcement learning (RL) is vulnerable during training to policy deviation, which leads to suboptimal learned policies.
  • The article introduces the de-pessimism (DEP) operator for more accurate Q-value estimation, using the optimal Bellman operator and a compensation operator to address challenges with out-of-distribution actions.
  • The new method, integrated into the soft actor-critic algorithm as DoRL-VC, demonstrates significant performance improvements in various tasks, showcasing DEP's effectiveness in reducing pessimism in offline RL.
Article Synopsis
  • Nivestym, a biosimilar to Neupogen, is being evaluated for its effectiveness in mobilizing peripheral blood stem cells (PBSC) in healthy donors for allogeneic hematopoietic stem cell transplantation (allo-HSCT).
  • A retrospective study analyzed data from 541 donors who received either Nivestym or Neupogen, focusing on factors like donor age, weight, and various counts of cells before and after treatment.
  • Results showed that while Nivestym was generally as effective as Neupogen for PBSC mobilization, younger donors (under 35) had a slightly lower CD34 cell count after using Nivestym compared to those using Neupogen.

Despite the theoretical benefits of collaborative robots, disappointing outcomes are well documented in clinical studies spanning rehabilitation, prostheses, and surgery. Cognitive load theory provides a possible explanation for why humans in the real world are not realizing these benefits: high cognitive loads may be impeding human performance. Measuring cognitive availability using an electrocardiogram, we ask 25 participants to complete a virtual-reality task alongside an invisible agent that determines optimal performance by iteratively updating the Bellman equation.

