Lifelong deep reinforcement learning (DRL) approaches are commonly employed to adapt continuously to new tasks without forgetting previously acquired knowledge. While current lifelong DRL methods have shown promising progress in retaining acquired knowledge, they require substantial adaptation effort (i.e., longer training) and yield suboptimal policies when transferring to a new task that deviates significantly from previously learned tasks, a phenomenon known as the few-shot generalization challenge. In this work, we propose a generic approach that equips existing lifelong DRL methods with the capability of few-shot generalization. First, we employ selective experience reuse, leveraging the experience of encountered states to improve adaptation training on new tasks. Second, a relaxed softmax function is applied to the target Q values to improve the accuracy of the estimated Q values, leading to more optimal policies. Finally, we measure and reduce the discrepancy in data distribution between on-policy and off-policy samples, which improves adaptation efficiency. Extensive experiments on three typical benchmarks compare our approach with six representative lifelong DRL methods and two state-of-the-art (SOTA) few-shot DRL methods in terms of training speed, episode return, and average return across all episodes. The results show that our method improves the return of the six lifelong DRL methods by at least 25%.
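
The relaxed softmax applied to the target Q values can be illustrated with a short sketch. The abstract does not give the exact formulation, so the softmax-weighted Bellman backup below and the temperature parameter tau are assumptions for illustration, not the authors' exact operator.

import torch
import torch.nn.functional as F

def relaxed_softmax_target(
    next_q_values: torch.Tensor,  # (batch, num_actions), from the target network
    rewards: torch.Tensor,        # (batch,)
    dones: torch.Tensor,          # (batch,), 1.0 where the episode terminated
    gamma: float = 0.99,          # discount factor
    tau: float = 5.0,             # temperature of the relaxed softmax (assumed hyperparameter)
) -> torch.Tensor:
    # Weight the next-state action values by a softmax over those same values.
    # This interpolates between the mean (tau -> 0) and the hard max (tau -> inf),
    # which softens the overestimation bias of the standard max-based target.
    weights = F.softmax(tau * next_q_values, dim=-1)          # (batch, num_actions)
    expected_next_q = (weights * next_q_values).sum(dim=-1)   # (batch,)
    return rewards + gamma * (1.0 - dones) * expected_next_q

In a DQN-style update, this target would replace the usual rewards + gamma * next_q_values.max(dim=-1).values term when computing the temporal-difference loss.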

Source: http://dx.doi.org/10.1109/TNNLS.2024.3385570
