Episodic Memory-Double Actor-Critic Twin Delayed Deep Deterministic Policy Gradient.

Man Shu, Shuai Lü, Xiaoyu Gong, Daolong An, Songlin Li

Neural Netw

Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012, China; College of Computer Science and Technology, Jilin University, Changchun 130012, China.

Published: February 2025

Existing deep reinforcement learning (DRL) algorithms suffer from low sample efficiency. Episodic memory allows DRL algorithms to remember and reuse past high-return experiences, thereby improving sample efficiency. However, because the state-action space in continuous action tasks is high-dimensional, previous methods for these tasks typically use only the information stored in episodic memory, rather than employing episodic memory directly for action selection as is done in discrete action tasks. We hypothesize that episodic memory can still guide action selection in continuous control tasks. Our objective is to improve sample efficiency by using episodic memory for action selection in such tasks, either by reducing the number of training steps required to reach comparable performance or by enabling the agent to obtain higher rewards within the same number of training steps. To this end, we propose an "Episodic Memory-Double Actor-Critic (EMDAC)" framework, which can use episodic memory for action selection in continuous action tasks. The critics and the episodic memory evaluate the state-action pairs proposed by the two actors to determine the final action. We also design an episodic memory based on a Kalman filter optimizer, which is updated using the episodic rewards of collected state-action pairs. During the memory update, the Kalman filter optimizer assigns different weights to experiences collected at different time periods. The episodic memory is indexed by state-action pair clusters and records both the occurrence frequency of each cluster and the value estimates for the corresponding state-action pairs, so the value of a state-action pair cluster can be estimated by querying the memory.
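
The memory and action-selection mechanism described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the clustering method, the exact Kalman filter formulation, or how critic and memory estimates are combined. The uniform-grid clustering, the scalar Kalman-style update, the `mem_weight` blend, and all names (`ClusterMemory`, `select_action`, `critic_q`) are assumptions made only for illustration.

```python
import numpy as np


class ClusterMemory:
    """Toy episodic memory indexed by state-action clusters (hypothetical design)."""

    def __init__(self, cell_size=0.5, obs_noise=1.0, process_noise=0.1):
        self.cell_size = cell_size          # grid width, a stand-in for clustering
        self.obs_noise = obs_noise          # R: assumed noise of an observed return
        self.process_noise = process_noise  # Q: keeps the estimate adaptive over time
        self.table = {}                     # cluster key -> [count, value, variance]

    def key(self, state, action):
        # Assumption: a uniform grid over the concatenated (state, action) vector.
        sa = np.concatenate([np.asarray(state, float), np.asarray(action, float)])
        return tuple(np.floor(sa / self.cell_size).astype(int))

    def update(self, state, action, episodic_return):
        k = self.key(state, action)
        if k not in self.table:
            # First visit: initialise the estimate with the observed return.
            self.table[k] = [1, float(episodic_return), self.obs_noise]
            return
        count, value, var = self.table[k]
        prior_var = var + self.process_noise
        gain = prior_var / (prior_var + self.obs_noise)  # scalar Kalman gain
        value += gain * (episodic_return - value)
        var = (1.0 - gain) * prior_var
        self.table[k] = [count + 1, value, var]

    def query(self, state, action):
        """Return (value estimate, visit count); unseen clusters give (None, 0)."""
        entry = self.table.get(self.key(state, action))
        return (None, 0) if entry is None else (entry[1], entry[0])


def select_action(state, candidates, critic_q, memory, mem_weight=0.5):
    """Pick the candidate action with the highest blended critic/memory score."""
    scores = []
    for a in candidates:
        q = critic_q(state, a)
        m, _ = memory.query(state, a)
        # Fall back to the critic alone when the cluster has never been visited.
        scores.append(q if m is None else (1 - mem_weight) * q + mem_weight * m)
    return candidates[int(np.argmax(scores))]
```

In this sketch, `candidates` would hold the two actions proposed by the two actors. Because the process noise keeps the Kalman gain from shrinking to zero, newer returns are weighted more heavily than in a plain running average, which is one possible reading of "different weights to experiences collected at different time periods".
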
We then design an intrinsic reward based on the novelty of state-action pairs, defined by the occurrence frequency of state-action pair clusters in the episodic memory, to enhance the agent's exploration capability. Finally, we propose the "EMDAC-TD3" algorithm by applying these three modules to the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm within an Actor-Critic framework. In evaluations on MuJoCo environments in OpenAI Gym, EMDAC-TD3 achieves higher sample efficiency than the baseline algorithms. It also demonstrates superior final performance compared with state-of-the-art episodic control algorithms and advanced Actor-Critic algorithms, as measured by final rewards, Median, Interquartile Mean, Mean, and Optimality Gap. The final rewards directly demonstrate the advantages of the algorithms. Based on the final rewards, EMDAC-TD3 achieves an average performance improvement of 11.01% over TD3, surpassing the current state-of-the-art algorithms in the same category.
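
As a rough illustration of the frequency-based intrinsic reward, the sketch below adds a count-based novelty bonus to the environment reward. The abstract does not give the bonus formula, so the `beta / sqrt(n + 1)` form, the grid-based cluster key, and the names `NoveltyBonus`, `beta`, and `cell_size` are all assumptions rather than the paper's definitions.

```python
import math
from collections import Counter


class NoveltyBonus:
    """Count-based exploration bonus over state-action clusters (hypothetical form)."""

    def __init__(self, beta=0.1, cell_size=0.5):
        self.beta = beta            # assumed scale of the intrinsic reward
        self.cell_size = cell_size  # grid width, a stand-in for clustering
        self.counts = Counter()     # cluster key -> occurrence frequency

    def _cluster(self, state, action):
        # Assumption: a uniform grid over the concatenated (state, action) values.
        return tuple(math.floor(x / self.cell_size) for x in (*state, *action))

    def reward(self, state, action, extrinsic):
        key = self._cluster(state, action)
        # Rarely visited clusters receive a larger bonus.
        bonus = self.beta / math.sqrt(self.counts[key] + 1)
        self.counts[key] += 1
        return extrinsic + bonus
```

With this form, the bonus for a cluster decays as it is revisited, so the agent is nudged toward state-action regions that the memory has seen less often.
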

DOI: http://dx.doi.org/10.1016/j.neunet.2025.107286
