Actor-Critic Alignment for Offline-to-Online Reinforcement Learning.

Proc Mach Learn Res

Department of Computer Science, University of Illinois Chicago, Chicago, IL 60607, USA.

Published: July 2023

Deep offline reinforcement learning has recently demonstrated considerable promise in leveraging offline datasets, providing high-quality models that significantly reduce the online interactions required for fine-tuning. However, this benefit is often diminished by the marked state-action distribution shift, which causes significant bootstrap error and wipes out the good initial policy. Existing solutions resort to constraining the policy shift or balancing the sample replay based on their online-ness. However, they require online estimation of distribution divergence or density ratio. To avoid such complications, we propose deviating from existing actor-critic approaches that directly transfer the state-action value functions. Instead, we post-process them by aligning with the offline learned policy, so that the Q-values for actions outside the offline policy are also tamed. As a result, the online fine-tuning can be simply performed as in the standard actor-critic algorithms. We show empirically that the proposed method improves the performance of the fine-tuned robotic agents on various simulated tasks.
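To make the idea of aligning a critic with an offline policy concrete, here is a minimal sketch for a single state with a small discrete action set. It is an illustration under stated assumptions, not the procedure from the paper: the function names, the temperature `alpha`, the supplied `state_value`, and the specific rule Q(s, a) = V(s) + alpha * log pi(a | s) are all hypothetical choices made for this example. The rule is chosen so that the Boltzmann policy induced by the aligned Q-values reproduces the offline policy exactly, and actions the offline policy rarely takes receive low Q-values.

```python
import math


def align_q_with_policy(policy_probs, state_value, alpha=1.0):
    """Hypothetical alignment rule: Q(s, a) = V(s) + alpha * log pi(a | s).

    policy_probs: offline policy probabilities for one state (all > 0).
    state_value:  an assumed scalar value V(s) for that state.
    alpha:        assumed temperature of the entropy-regularized policy.
    """
    return [state_value + alpha * math.log(p) for p in policy_probs]


def boltzmann_policy(q_values, alpha=1.0):
    """Policy proportional to exp(Q / alpha), computed stably."""
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / alpha) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


# Toy offline policy over three actions in a single state.
offline_policy = [0.7, 0.2, 0.1]

# Build Q-values that agree with the offline policy, then check that
# the policy they induce matches the offline policy again.
aligned_q = align_q_with_policy(offline_policy, state_value=5.0, alpha=0.5)
recovered_policy = boltzmann_policy(aligned_q, alpha=0.5)
```

In this toy setting the least likely action under the offline policy also gets the lowest aligned Q-value, which loosely mirrors the abstract's point that values for actions outside the offline policy are kept in check.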

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11232493PMC
