Curious Explorer: a Provable Exploration Strategy in Policy Learning.

Marco Miani Maurizio Parton Marco Romito

IEEE Trans Pattern Anal Mach Intell

Published: September 2024

A coverage assumption is critical for policy gradient methods: the objective function is insensitive to updates in unlikely states, yet the agent may need to improve its behavior in those states to reach a nearly optimal payoff. However, this assumption can be infeasible in certain environments, for instance in online learning, or when restarts are possible only from a fixed initial state. In these cases, classical policy gradient algorithms such as REINFORCE can have poor convergence properties and poor sample efficiency. Curious Explorer is an iterative, pure state-space exploration strategy that improves the coverage of any restart distribution ρ. Using ρ and intrinsic rewards, Curious Explorer produces a sequence of policies, each more exploratory than the previous one. It then outputs a restart distribution whose coverage is based on the state visitation distributions of these exploratory policies. The main results of this paper are a theoretical upper bound on how often an optimal policy visits poorly visited states, and a bound on the error of the return obtained by REINFORCE without any coverage assumption. Finally, we conduct ablation studies with REINFORCE and TRPO on two hard-exploration tasks, supporting the claim that Curious Explorer can improve the performance of very different policy gradient algorithms.
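The abstract describes the procedure only at a high level. The sketch below is a minimal, hypothetical illustration of the general idea, not the authors' algorithm. Intrinsic rewards drive a sequence of increasingly exploratory policies, and their state visitation distributions are combined into a new restart distribution. Everything concrete here is an assumption chosen for illustration:

- a six-state deterministic chain environment;
- a count-based bonus `1/sqrt(1 + n(s))` computed from accumulated visitation mass;
- value iteration as the policy optimizer;
- a uniform mixture of visitation distributions as the output.

```python
import math

# Hypothetical toy environment (assumption): a deterministic chain of
# N_STATES states; action 0 moves left, action 1 moves right, walls reflect.
N_STATES = 6
ACTIONS = (0, 1)
GAMMA = 0.9


def step(s, a):
    """Deterministic transition of the toy chain."""
    return max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)


def greedy_policy(reward, iters=300):
    """Value iteration on a per-state reward; returns a deterministic policy."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        v = [reward[s] + GAMMA * max(v[step(s, a)] for a in ACTIONS)
             for s in range(N_STATES)]
    # max() returns the first maximal action, so ties are broken toward "left".
    return [max(ACTIONS, key=lambda a: v[step(s, a)]) for s in range(N_STATES)]


def visitation(policy, rho, horizon=500):
    """Normalized discounted state visitation distribution d^pi_rho."""
    d = [0.0] * N_STATES
    mu = list(rho)  # state distribution at time t, starting from rho
    for t in range(horizon):
        for s in range(N_STATES):
            d[s] += (1 - GAMMA) * GAMMA ** t * mu[s]
        nxt = [0.0] * N_STATES
        for s in range(N_STATES):
            nxt[step(s, policy[s])] += mu[s]
        mu = nxt
    total = sum(d)
    return [x / total for x in d]


def explore(rho, n_policies=10):
    """Hypothetical sketch: increasingly exploratory policies -> restart distribution."""
    counts = [0.0] * N_STATES  # pseudo-counts: accumulated visitation mass
    visits = []
    for _ in range(n_policies):
        # Assumed count-based intrinsic reward: rarely visited states pay more.
        bonus = [1.0 / math.sqrt(1.0 + c) for c in counts]
        d = visitation(greedy_policy(bonus), rho)
        visits.append(d)
        counts = [c + x for c, x in zip(counts, d)]
    # Assumed output: uniform mixture of the exploratory policies' visitations.
    return [sum(d[s] for d in visits) / len(visits) for s in range(N_STATES)]


rho = [1.0] + [0.0] * (N_STATES - 1)  # restarts only from the fixed state 0
restart = explore(rho)
print([round(p, 3) for p in restart])
```

Here ρ is a point mass on state 0, which mimics the fixed-initial-state setting mentioned above. The first policy simply stays at state 0. The bonus then drops there, so later policies move toward states that have not yet been visited. As a result, the returned distribution spreads its mass over states that ρ never covers.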

DOI: http://dx.doi.org/10.1109/TPAMI.2024.3460972

