Lifelong deep reinforcement learning (DRL) approaches are commonly employed to adapt continuously to new tasks without forgetting previously acquired knowledge. While current lifelong DRL methods have shown promising progress in retaining acquired knowledge, they require substantial adaptation effort (i.e., longer training) and yield suboptimal policies when transferring to a new task that deviates significantly from previously learned tasks, a phenomenon known as the few-shot generalization challenge. In this work, we propose a generic approach that equips existing lifelong DRL methods with the capability of few-shot generalization. First, we employ selective experience reuse, leveraging the experience of encountered states to improve adaptation training on new tasks. Second, a relaxed softmax function is applied to the target Q values to improve the accuracy of the estimated Q values, leading to more optimal policies. Finally, we measure and reduce the discrepancy in data distribution between on-policy and off-policy samples, which improves adaptation efficiency. Extensive experiments on three typical benchmarks compare our approach with six representative lifelong DRL methods and two state-of-the-art (SOTA) few-shot DRL methods in terms of training speed, episode return, and average return across all episodes. The results show that our method improves the return of the six lifelong DRL methods by at least 25%.
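
The relaxed softmax applied to the target Q values can be illustrated with a short sketch. The abstract does not give the exact formulation, so the softmax-weighted Bellman backup below and the temperature parameter tau are assumptions for illustration, not the authors' exact operator.

import torch
import torch.nn.functional as F

def relaxed_softmax_target(
    next_q_values: torch.Tensor,  # (batch, num_actions), from the target network
    rewards: torch.Tensor,        # (batch,)
    dones: torch.Tensor,          # (batch,), 1.0 where the episode terminated
    gamma: float = 0.99,          # discount factor
    tau: float = 5.0,             # temperature of the relaxed softmax (assumed hyperparameter)
) -> torch.Tensor:
    # Weight the next-state action values by a softmax over those same values.
    # This interpolates between the mean (tau -> 0) and the hard max (tau -> inf),
    # which softens the overestimation bias of the standard max-based target.
    weights = F.softmax(tau * next_q_values, dim=-1)          # (batch, num_actions)
    expected_next_q = (weights * next_q_values).sum(dim=-1)   # (batch,)
    return rewards + gamma * (1.0 - dones) * expected_next_q

In a DQN-style update, this target would replace the usual rewards + gamma * next_q_values.max(dim=-1).values term when computing the temporal-difference loss.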

Source: http://dx.doi.org/10.1109/TNNLS.2024.3385570
