The Nash equilibrium is an important concept in game theory. It characterizes a policy under which a player is least exploitable by any opponent. We combine game theory, dynamic programming, and recent deep reinforcement learning (DRL) techniques to learn the Nash equilibrium policy online for two-player zero-sum Markov games (TZMGs). The problem is first formulated as a Bellman minimax equation, and generalized policy iteration (GPI) provides a double-loop iterative way to find the equilibrium. Then, neural networks are introduced to approximate Q functions for large-scale problems. An online minimax Q network learning algorithm is proposed to train the network with observations. Experience replay, dueling networks, and double Q-learning are applied to improve the learning process. The contributions are twofold: 1) DRL techniques are combined with GPI to find the TZMG Nash equilibrium for the first time and 2) the convergence of the online learning algorithm with a lookup table and experience replay is proven; the proof is not only useful for TZMGs but also instructive for single-agent Markov decision problems. Experiments on different examples validate the effectiveness of the proposed algorithm on TZMG problems.
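For orientation, the Bellman minimax equation mentioned in the abstract can be sketched in its standard textbook form. The notation here is an assumption and may differ from the paper's: $s$ is the state, $a$ and $o$ are the actions of the player and the opponent, $r$ is the player's reward, $P$ is the transition kernel, $\gamma$ is the discount factor, $\alpha$ is a step size, and $\pi$ is a mixed policy over the player's actions. The last line is the classical tabular minimax-Q update, shown only as an illustration of what a lookup-table learner of this kind typically does, not as the paper's exact algorithm.

```latex
% Optimal state value: the player maximizes over mixed policies,
% the opponent minimizes over its actions.
V^*(s) = \max_{\pi \in \Delta(\mathcal{A})} \; \min_{o \in \mathcal{O}}
         \sum_{a \in \mathcal{A}} \pi(a)\, Q^*(s, a, o)

% Bellman minimax equation for the joint-action Q function.
Q^*(s, a, o) = r(s, a, o) + \gamma \sum_{s'} P(s' \mid s, a, o)\, V^*(s')

% Tabular minimax-Q update from one observed transition (s, a, o, r, s'),
% with V(s') computed from the current table Q as in the first line.
Q(s, a, o) \leftarrow (1 - \alpha)\, Q(s, a, o)
           + \alpha \left[ r + \gamma\, V(s') \right]
```

With a single opponent action, the inner minimization disappears and the equations reduce to the ordinary single-agent Bellman optimality equation and Q-learning update.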
DOI: http://dx.doi.org/10.1109/TNNLS.2020.3041469
J Environ Manage
December 2024
Department of Civil, Construction and Environmental Engineering, North Dakota State University, ND, United States.
The negative impacts of large hydroelectric reservoirs on downstream ecosystems have attracted worldwide attention. Few attempts have been made to dynamically predict ecological benefits and to negotiate rationally in the reservoir-river-lake (RRL) system. This study addresses these gaps by developing an integrated framework with machine learning and game theory to balance hydropower and ecological benefits.
PeerJ
December 2024
Department of Mathematics and Applied Mathematics, Virginia Commonwealth University, Richmond, VA, United States of America.
Chaos
December 2024
School of Mathematics, Southeast University, Nanjing 210096, China.
The non-cooperation game with heterogeneous dynamics is of both theoretical significance and practical relevance because of its extensive penetration into various fields, such as the game confrontation composed of unmanned aerial vehicles and unmanned vehicles or the power generation systems with varied turbine assemblies. To solve such a game problem, this paper investigates distributed Nash equilibrium (NE) and generalized Nash equilibrium (GNE) seeking problems for heterogeneous multi-player systems in non-cooperation games. First, by incorporating the output regulation technique, a distributed NE seeking strategy is designed for heterogeneous multi-player games over undirected communication networks.
Accid Anal Prev
December 2024
Shanghai Smart Vehicle Cooperating Innovation Center, Shanghai 201805, China. Electronic address:
Driver-automation shared steering control (SSC) has emerged as a promising technology for enhancing vehicle safety, but achieving seamless collaboration between the driver and automation requires an in-depth understanding of driver steering behavior in interaction with automation. In this paper, we introduce a game-theoretic driver steering model with individual risk perception field generation. Firstly, a driver risk perception field is developed based on a novel concept of potential injury risk (PIR) to provide a quantitative estimation of the driver's perceived driving risk.
Sci Rep
December 2024
State Grid Xinjiang Electric Power Co., Ltd. Extra-High Voltage Branch Company, Urumqi, 830002, Xinjiang, China.
To accommodate China's electricity market reforms integrating medium and long-term (MLT) transactions and spot transactions, and to boost renewable energy consumption through the spot market, this paper proposes an optimized cross-provincial electricity trading strategy model based on a two-layer game framework. The proposed model incorporates an MLT green certificate contract decomposition method, enabling nested optimization of green certificate contracts and scheduling plans for cross-provincial power transactions. To encourage broader participation, a bilateral green certificate trading framework is established, which globally optimizes green certificate allocation to increase benefits for market participants.