PAC Reinforcement Learning Algorithm for General-Sum Markov Games.

IEEE Trans Automat Contr

Department of Mechanical Engineering, University of Delaware, Newark, DE 19716 USA.

Published: May 2023

This paper presents a theoretical framework for probably approximately correct (PAC) multi-agent reinforcement learning (MARL) algorithms for Markov games. Using the idea of delayed Q-learning, the paper extends the well-known Nash Q-learning algorithm to build a new PAC MARL algorithm for general-sum Markov games. In addition to guiding the design of a provably PAC MARL algorithm, the framework enables checking whether an arbitrary MARL algorithm is PAC. Comparative numerical results demonstrate the algorithm's performance and robustness.
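
As a rough illustration of the delayed-update idea the paper borrows from delayed Q-learning, the sketch below only commits a value update for a state/joint-action pair after m samples have been averaged, and only when the attempted change exceeds a tolerance. The batch size m, tolerance eps1, optimistic initialization, and the user-supplied stage-game Nash solver are illustrative assumptions, not the paper's exact construction.

# Minimal sketch of a delayed-update rule in the spirit of delayed
# Q-learning, adapted to a Markov-game setting; names are illustrative.
from collections import defaultdict

class DelayedNashQ:
    def __init__(self, gamma, m, eps1, stage_nash_value, v_max):
        self.gamma = gamma                   # discount factor
        self.m = m                           # samples to accumulate per update
        self.eps1 = eps1                     # minimum improvement to commit
        self.nash_value = stage_nash_value   # maps (Q, state) -> Nash value
        self.Q = defaultdict(lambda: v_max)  # optimistic initialization
        self.acc = defaultdict(float)        # accumulated one-step targets
        self.count = defaultdict(int)        # samples seen per (s, joint_a)

    def observe(self, s, joint_a, reward, s_next):
        key = (s, joint_a)
        # Accumulate the one-step target built from the next state's Nash value.
        self.acc[key] += reward + self.gamma * self.nash_value(self.Q, s_next)
        self.count[key] += 1
        if self.count[key] == self.m:
            target = self.acc[key] / self.m
            # Commit only if the update lowers Q by at least 2 * eps1, which
            # is what bounds the number of successful updates in PAC proofs.
            if self.Q[key] - target >= 2 * self.eps1:
                self.Q[key] = target + self.eps1
            self.acc[key] = 0.0
            self.count[key] = 0

Bounding the number of committed updates this way is what drives the sample-complexity argument in the single-agent delayed Q-learning analysis; per the abstract, the paper's contribution is carrying such PAC guarantees over to general-sum Markov games.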


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10617487
DOI: http://dx.doi.org/10.1109/tac.2022.3219340

Publication Analysis

Top Keywords

markov games (12), reinforcement learning (8), algorithm general-sum (8), general-sum markov (8), marl algorithm (8), algorithm (5), pac (4), pac reinforcement (4), learning algorithm (4), games paper (4)

Similar Publications

An evolutionary game theory for event-driven ecological population dynamics.

Theory Biosci

January 2025

Faculty of Science and Engineering, Department of Biosciences, Swansea University, Singleton Park, Swansea, SA2 8PP, UK.

Despite being a powerful tool for modeling ecological interactions, traditional evolutionary game theory still leaves substantial room for improvement in the context of population dynamics. One of the current challenges is to devise a cohesive theoretical framework for ecological games with density-dependent (or concentration-dependent) evolution, especially one defined by individual-level events. In this work, I use the notation of reaction networks as a foundation to propose such a framework and show that classic two-strategy games are a particular case of the theory.
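
For orientation, the classic two-strategy dynamics that the abstract says arise as a particular case of the framework can be written as the standard replicator equation in a few lines; the payoff matrix and initial condition below are arbitrary illustrative choices, not taken from the paper.

# Two-strategy replicator dynamics, the classic special case the
# reaction-network framework is said to recover; payoffs are arbitrary.
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[3.0, 0.0],    # payoff matrix: rows = focal strategy,
              [5.0, 1.0]])   # columns = opponent strategy

def replicator(t, x):
    f = A @ x                # fitness of each strategy
    phi = x @ f              # population mean fitness
    return x * (f - phi)     # dx_i/dt = x_i * (f_i - phi)

sol = solve_ivp(replicator, (0.0, 20.0), [0.9, 0.1])
print(sol.y[:, -1])          # strategy frequencies at the final time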


This paper systematically analyzes the spatiotemporal evolution trends and macroeconomic driving factors of farmland transfer at the provincial level in China since 2005, aiming to offer a new perspective for understanding the dynamic mechanisms of China's farmland transfer. Through the integrated use of kernel density estimation, the Markov model, and panel quantile regression methods, this study finds the following: (1) Farmland transfer rates across Chinese provinces show an overall upward trend, but regional differences exhibit a "U-shaped" evolution characterized by initially narrowing and then widening; (2) although provinces have relatively stable farmland transfer levels, there is potential for dynamic transitions; (3) factors such as per capita arable land, farmers' disposable income, the social security level, the urban‒rural income gap, the urbanization rate, government intervention, and the marketization level significantly promote farmland transfer, while inclusive finance inhibits transfer, and agricultural mechanization level and population aging have heterogeneous impacts. Therefore, to achieve convergence of low farmland transfer regions to medium levels while promoting medium-level regions to higher levels, it is recommended that the government increase support for agricultural mechanization, increase farmers' income and social security levels, and optimize marketization processes and government intervention strategies.
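
As a minimal sketch of the Markov-model step in this kind of analysis, the snippet below discretizes province-level transfer rates into low/medium/high levels and estimates a level-to-level transition matrix from panel data; the thresholds, the three-level scheme, and the toy data are assumptions for illustration only.

# Sketch of the Markov-chain step: discretize provincial transfer rates
# into levels and estimate the level-to-level transition matrix.
import numpy as np

def to_level(rate, cuts=(0.2, 0.4)):
    # 0 = low, 1 = medium, 2 = high; thresholds are illustrative.
    return int(np.searchsorted(cuts, rate))

def transition_matrix(panel, n_levels=3):
    # panel: array of shape (provinces, years) with transfer rates in [0, 1].
    counts = np.zeros((n_levels, n_levels))
    levels = np.vectorize(to_level)(panel)
    for prov in levels:
        for a, b in zip(prov[:-1], prov[1:]):
            counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

rng = np.random.default_rng(0)
P = transition_matrix(rng.uniform(0, 0.6, size=(31, 18)))  # toy panel
print(P)  # row i: probability a province at level i moves to each level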


This article investigates the problem of robust decentralized load frequency control (LFC) in multiarea interconnected power systems in the presence of external disturbances and stochastic abrupt variations, such as component failures and differing load demands. To capture the different operating conditions of the load in the multiarea power systems, a Markov superposition technique is employed to model the system's component matrices. To compute the Nash equilibrium solution of the zero-sum differential game problem, an improved online adaptive dynamic programming (ADP) algorithm based on the experience replay technique (ERT) is developed, which addresses the nonlinear coupling difficulties encountered in solving the game algebraic Riccati equations (Game AREs).
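
A minimal sketch of the experience-replay idea inside an ADP-style critic update is shown below: stored transitions are reused in a batch least-squares fit of linear critic weights. The quadratic feature map and linear-critic form are common ADP choices but are assumptions here; the paper's Game-ARE iteration is not reproduced.

# Experience replay in an ADP-style critic: stored transitions are
# reused in a least-squares temporal-difference fit of critic weights.
import numpy as np
from collections import deque

def phi(x):
    # Quadratic features, so w . phi(x) can represent x' P x for symmetric P.
    return np.outer(x, x)[np.triu_indices(len(x))]

class ReplayADPCritic:
    def __init__(self, dim, gamma, capacity=500):
        self.buf = deque(maxlen=capacity)       # replay buffer of transitions
        self.gamma = gamma
        self.w = np.zeros(dim * (dim + 1) // 2)

    def store(self, x, cost, x_next):
        self.buf.append((x, cost, x_next))

    def update(self):
        # Batch least-squares fit over the whole replay buffer.
        Phi = np.array([phi(x) - self.gamma * phi(xn) for x, _, xn in self.buf])
        c = np.array([cost for _, cost, _ in self.buf])
        self.w, *_ = np.linalg.lstsq(Phi, c, rcond=None)
        return self.w   # encodes the value of the current policy pair

Reusing old transitions this way relaxes the persistence-of-excitation requirements that purely online ADP updates need, which is the usual motivation for combining ADP with experience replay.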

Article Synopsis
  • Reinforcement learning (RL) agents face vulnerabilities from adversaries that can negatively impact their performance and violate safety conditions, highlighting the difficulty of creating a policy that is both safe and robust.
  • Existing approaches typically treat safety and robustness in isolation, but this work proposes a unified framework that combines both by employing constrained two-player zero-sum Markov games for effective policy learning.
  • The authors introduce a dual policy iteration scheme that optimizes the task and safety policies simultaneously, ensuring convergence to an optimal solution while addressing constraints from adversarial interference; a toy sketch of the alternating scheme appears below.
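
A toy, tabular flavor of such a dual scheme, with a known model and illustrative names throughout: the protagonist improves task value against a worst-case adversary, while a pessimistic safety critic restricts the greedy step to actions whose cost stays within a budget.

# Toy alternating updates in a constrained two-player zero-sum Markov game.
# Tabular, known-model setting; all names here are illustrative assumptions.
import numpy as np

def dual_policy_iteration(P, R, C, gamma, budget, iters=100):
    # P[s, a, b, s']: transitions; R: task reward; C: safety cost.
    nS, nA, nB, _ = P.shape
    V = np.zeros(nS)   # task value of the current policy pair
    Vc = np.zeros(nS)  # safety-cost value
    for _ in range(iters):
        Q = R + gamma * np.einsum('sabt,t->sab', P, V)
        Qc = C + gamma * np.einsum('sabt,t->sab', P, Vc)
        worst = Q.min(axis=2)        # adversary minimizes the task value
        worst_cost = Qc.max(axis=2)  # pessimistic view of the safety cost
        feasible = worst_cost <= budget
        # Greedy over safe actions; fall back to the least-cost action.
        masked = np.where(feasible, worst, -np.inf)
        a = np.where(feasible.any(axis=1), masked.argmax(axis=1),
                     worst_cost.argmin(axis=1))
        idx = np.arange(nS)
        V, Vc = worst[idx, a], worst_cost[idx, a]
    return a, V, Vc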

Spatial evolutionary games provide a valuable framework for elucidating the emergence and maintenance of cooperative behaviors. However, most previous studies assume that individuals are profiteers and neglect to consider the effects of memory. To bridge this gap, in this paper, we propose a memory-based spatial evolutionary game with dynamic interaction between learners and profiteers.
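
A toy flavor of a memory-based spatial game, with every modeling choice (the 2x2 payoff matrix, memory length, synchronous imitation rule) assumed for illustration rather than taken from the paper: agents on a periodic lattice average their last few payoffs and copy the strategy of their best-scoring neighbor.

# Toy memory-based spatial game: lattice agents average their last `mem`
# payoffs and imitate the best such neighbor; all parameters illustrative.
import numpy as np

rng = np.random.default_rng(1)
L, mem, steps = 20, 3, 50
payoff = np.array([[1.0, 0.0],    # row/col 0 = cooperate, 1 = defect
                   [1.3, 0.1]])
strat = rng.integers(0, 2, size=(L, L))
history = np.zeros((L, L, mem))   # rolling buffer of recent payoffs

def neighbors(grid):
    # The four von Neumann neighbors under periodic boundary conditions.
    return [np.roll(grid, s, axis=ax) for ax in (0, 1) for s in (1, -1)]

for t in range(steps):
    score = sum(payoff[strat, n] for n in neighbors(strat))
    history[:, :, t % mem] = score
    avg = history.mean(axis=2)
    # Imitate the neighbor (or keep self) with the best memory-averaged payoff.
    cand_avg = np.stack([avg] + neighbors(avg))
    cand_str = np.stack([strat] + neighbors(strat))
    best = cand_avg.argmax(axis=0)
    strat = np.take_along_axis(cand_str, best[None], axis=0)[0]

print('cooperator fraction:', 1 - strat.mean())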

