We analyze the generalization properties of batch reinforcement learning (batch RL) with value function approximation from an information-theoretic perspective. We derive generalization bounds for batch RL using (conditional) mutual information. In addition, we demonstrate how to establish a connection between certain structural assumptions on the value function space and conditional mutual information. As a by-product, we derive a generalization bound via conditional mutual information, a problem that had been left open and that may be of independent interest.
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11593174 | PMC |
http://dx.doi.org/10.3390/e26110995 | DOI Listing |