This article studies how to learn an approximate Nash equilibrium (NE) from static historical datasets using empirical game-theoretic analysis (EGTA), a simulation-based framework for modeling complex multiagent interactions. EGTA generally requires many interactions with the environment or a simulator to estimate a cogent and tractable game model that approximates the underlying game. However, these exploratory interactions are often data-inefficient and may not be feasible in risk-sensitive applications. To address these problems, this article investigates a new EGTA paradigm for offline settings and introduces a novel algorithm, conservative offline policy space response oracle (COPSRO), which identifies NE from fixed datasets without active data collection. COPSRO first extracts a set of strategies from the offline dataset to construct an overcomplete strategy population that approximates the policy space of the original game. It then integrates a conservative critic (CC) to counter the overestimation inherent in offline learning, and uses an offline NE solver to iteratively compute an approximate NE. Consequently, COPSRO can find equilibrium strategies without real-world interaction, which makes it markedly more useful in risk-averse settings. This article provides both theoretical analysis and empirical evaluation to demonstrate the effectiveness and superiority of COPSRO across various real-world tasks in the offline setting. Our method surpasses existing approaches in terms of convergence and exploitability, especially when the coverage ratio of the dataset is low (20% or 10%).
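To make the two evaluation notions above concrete, the following is a minimal sketch of how an approximate NE and its exploitability can be computed once an empirical payoff matrix is available. It assumes a two-player zero-sum matrix game and uses plain fictitious play as a stand-in solver; it is not the offline NE solver or the conservative critic of COPSRO, and the names `fictitious_play`, `exploitability`, and the rock-paper-scissors matrix `RPS` are illustrative assumptions that do not come from the article.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # payoff[i][j]: row player's payoff (zero-sum game)

# Hypothetical empirical game used only for illustration: rock-paper-scissors.
RPS: Matrix = [[0.0, -1.0, 1.0],
               [1.0, 0.0, -1.0],
               [-1.0, 1.0, 0.0]]


def row_payoffs(payoff: Matrix, col_mix: List[float]) -> List[float]:
    """Expected row-player payoff of each pure row against a column mixture."""
    return [sum(a * q for a, q in zip(row, col_mix)) for row in payoff]


def col_payoffs(payoff: Matrix, row_mix: List[float]) -> List[float]:
    """Expected row-player payoff for each pure column against a row mixture."""
    n_cols = len(payoff[0])
    return [sum(p * row[j] for p, row in zip(row_mix, payoff))
            for j in range(n_cols)]


def exploitability(payoff: Matrix, row_mix: List[float],
                   col_mix: List[float]) -> float:
    """Total gain both players could obtain by switching to a best response.

    In a zero-sum game the value of the current profile cancels, leaving
    (best row payoff vs. col_mix) - (worst row payoff the column can force).
    The result is 0 exactly at a Nash equilibrium.
    """
    return max(row_payoffs(payoff, col_mix)) - min(col_payoffs(payoff, row_mix))


def normalize(counts: List[int]) -> List[float]:
    total = sum(counts)
    return [c / total for c in counts]


def fictitious_play(payoff: Matrix,
                    iterations: int = 2000) -> Tuple[List[float], List[float]]:
    """Approximate an NE by repeatedly best-responding to empirical play."""
    row_counts = [1] + [0] * (len(payoff) - 1)
    col_counts = [1] + [0] * (len(payoff[0]) - 1)
    for _ in range(iterations):
        row_vals = row_payoffs(payoff, normalize(col_counts))
        col_vals = col_payoffs(payoff, normalize(row_counts))
        row_counts[row_vals.index(max(row_vals))] += 1  # row maximizes
        col_counts[col_vals.index(min(col_vals))] += 1  # column minimizes
    return normalize(row_counts), normalize(col_counts)
```

Calling `fictitious_play(RPS)` returns a pair of mixed strategies, and passing them to `exploitability(RPS, ...)` measures how far that pair is from equilibrium. In an offline setting the payoff entries would themselves be estimates from the dataset, which is where overestimation becomes a concern.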
DOI: http://dx.doi.org/10.1109/TNNLS.2024.3454477