The Nash equilibrium is an important concept in game theory: it describes a policy under which a player is least exploitable by any opponent. We combine game theory, dynamic programming, and recent deep reinforcement learning (DRL) techniques to learn the Nash equilibrium policy online for two-player zero-sum Markov games (TZMGs). The problem is first formulated as a Bellman minimax equation, and generalized policy iteration (GPI) provides a double-loop iterative way to find the equilibrium. Neural networks are then introduced to approximate Q functions for large-scale problems, and an online minimax Q network learning algorithm is proposed to train the network from observations. Experience replay, dueling networks, and double Q-learning are applied to improve the learning process. The contributions are twofold: 1) DRL techniques are combined with GPI to find the TZMG Nash equilibrium for the first time and 2) the convergence of the online learning algorithm with a lookup table and experience replay is proven; the proof is not only useful for TZMGs but also instructive for single-agent Markov decision problems. Experiments on different examples validate the effectiveness of the proposed algorithm on TZMG problems.
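
To make the Bellman minimax equation mentioned in the abstract concrete, the following is a minimal sketch of tabular minimax value iteration on a toy TZMG. It is not the paper's online minimax Q network algorithm: it assumes a known model, two actions per player, and a closed-form solver for 2x2 matrix games in place of a general linear program. All names (`matrix_game_value`, `minimax_value_iteration`, `REWARDS`, `TRANSITIONS`, `GAMMA`) and the toy game itself are hypothetical and chosen only for illustration.

```python
GAMMA = 0.9  # discount factor (assumed for this toy example)


def matrix_game_value(m):
    """Value of a 2x2 zero-sum matrix game for the row (maximizing) player."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best guaranteed payoff for the row player
    minimax = min(max(a, c), max(b, d))  # best guaranteed cap for the column player
    if maximin == minimax:
        # A pure-strategy saddle point exists.
        return maximin
    # No saddle point: both players mix, and the value has a closed form.
    return (a * d - b * c) / (a + d - b - c)


def minimax_value_iteration(rewards, transitions, gamma=GAMMA, sweeps=500):
    """Iterate V(s) = val_{a,o} [ r(s,a,o) + gamma * sum_s' P(s'|s,a,o) V(s') ]."""
    n_states = len(rewards)
    values = [0.0] * n_states
    for _ in range(sweeps):
        new_values = []
        for s in range(n_states):
            # Build the 2x2 Q matrix for state s from the current value estimate.
            q = [
                [
                    rewards[s][a][o]
                    + gamma * sum(p * values[t]
                                  for t, p in enumerate(transitions[s][a][o]))
                    for o in range(2)
                ]
                for a in range(2)
            ]
            new_values.append(matrix_game_value(q))
        values = new_values
    return values


# Toy game: state 0 is matching pennies, state 1 pays the maximizer 1 forever.
# REWARDS[s][a][o] is the maximizer's reward for own action a, opponent action o.
REWARDS = [
    [[1.0, -1.0], [-1.0, 1.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
# TRANSITIONS[s][a][o] is a distribution over next states; here every
# state-action pair leads to state 1 with probability 1.
TRANSITIONS = [[[[0.0, 1.0] for _ in range(2)] for _ in range(2)] for _ in range(2)]

if __name__ == "__main__":
    print(minimax_value_iteration(REWARDS, TRANSITIONS))
```

In this toy game the iterates should approach a value of about 10 for state 1 (a reward of 1 discounted at 0.9) and about 9 for state 0, where the matching-pennies stage game contributes zero in expectation. For larger action sets, the per-state matrix game would need a linear program or another general solver rather than the 2x2 closed form used here.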

Source
http://dx.doi.org/10.1109/TNNLS.2020.3041469

Publication Analysis

Top Keywords

nash equilibrium: 12
online minimax: 8
minimax network: 8
network learning: 8
two-player zero-sum: 8
zero-sum markov: 8
markov games: 8
game theory: 8
drl techniques: 8
learning algorithm: 8

Similar Publications

The negative impacts of large hydroelectric reservoirs on downstream ecosystems have attracted worldwide attention. Few attempts have been made to dynamically predict ecological benefits and support rational negotiation in the reservoir-river-lake (RRL) system. This study addresses these gaps by developing an integrated framework that combines machine learning and game theory to balance hydropower and ecological benefits.

Mathematical model of voluntary vaccination against schistosomiasis.

PeerJ

December 2024

Department of Mathematics and Applied Mathematics, Virginia Commonwealth University, Richmond, VA, United States of America.

Article Synopsis
  • Human schistosomiasis, caused by Schistosoma worms, is a neglected tropical disease prevalent in sub-Saharan Africa, currently lacking a vaccine despite ongoing development.
  • The study improves a compartmental model of schistosomiasis by adding human behavior and voluntary vaccination factors, highlighting that effective herd immunity requires specific vaccination rates.
  • Results show that unless vaccination costs are low, voluntary vaccination alone may not sufficiently lower disease prevalence below 1%, emphasizing the need for affordable vaccine access to achieve public health goals.

The non-cooperative game with heterogeneous dynamics is of both theoretical significance and practical relevance because it arises in many fields, such as adversarial games between unmanned aerial vehicles and unmanned ground vehicles or power generation systems with varied turbine assemblies. To solve such a game problem, this paper investigates distributed Nash equilibrium (NE) and generalized Nash equilibrium (GNE) seeking problems for heterogeneous multi-player systems in non-cooperative games. First, by incorporating the output regulation technique, a distributed NE seeking strategy is designed for heterogeneous multi-player games over undirected communication networks.

Driver-automation shared steering control (SSC) has emerged as a promising technology for enhancing vehicle safety, but achieving seamless collaboration between the driver and automation requires an in-depth understanding of driver steering behavior in interaction with automation. In this paper, we introduce a game-theoretic driver steering model with individual risk perception field generation. First, a driver risk perception field is developed based on a novel concept of potential injury risk (PIR) to provide a quantitative estimate of the driver's perceived driving risk.

Research on cross-provincial power trading strategy considering the medium and long-term trading plan.

Sci Rep

December 2024

State Grid Xinjiang Electric Power Co., Ltd. Extra-High Voltage Branch Company, Urumqi, 830002, Xinjiang, China.

To accommodate China's electricity market reforms integrating medium and long-term (MLT) transactions and spot transactions, and to boost renewable energy consumption through the spot market, this paper proposes an optimized cross-provincial electricity trading strategy model based on a two-layer game framework. The proposed model incorporates an MLT green certificate contract decomposition method, enabling nested optimization of green certificate contracts and scheduling plans for cross-provincial power transactions. To encourage broader participation, a bilateral green certificate trading framework is established, which globally optimizes green certificate allocation to increase benefits for market participants.
