In sequential decision-making, imitation learning (IL) trains a policy efficiently by mimicking expert demonstrations. Various imitation methods were proposed and empirically evaluated, meanwhile, their theoretical understandings need further studies, among which the compounding error in long-horizon decisions is a major issue. In this paper, we first analyze the value gap between the expert policy and imitated policies by two imitation methods, behavioral cloning (BC) and generative adversarial imitation. The results support that generative adversarial imitation can reduce the compounding error compared to BC. Furthermore, we establish the lower bounds of IL under two settings, suggesting the significance of environment interactions in IL. By considering the environment transition model as a dual agent, IL can also be used to learn the environment model. Therefore, based on the bounds of imitating policies, we further analyze the performance of imitating environments. The results show that environment models can be more effectively imitated by generative adversarial imitation than BC. Particularly, we obtain a policy evaluation error that is linear with the effective planning horizon w.r.t. the model bias, suggesting a novel application of adversarial imitation for model-based reinforcement learning (MBRL). We hope these results could inspire future advances in IL and MBRL.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1109/TPAMI.2021.3096966 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!