We present a framework to integrate tensor network (TN) methods with reinforcement learning (RL) for solving dynamical optimization tasks. We consider the RL actor-critic method, a model-free approach for solving RL problems, and introduce TNs as the approximators for its policy and value functions. Our "actor-critic with tensor networks" (ACTeN) method is especially well suited to problems with large and factorizable state and action spaces. As an illustration of the applicability of ACTeN, we solve the exponentially hard task of sampling rare trajectories in two paradigmatic stochastic models, the East model of glasses and the asymmetric simple exclusion process, the latter being particularly challenging for other methods due to the absence of detailed balance. With substantial potential for further integration with the vast array of existing RL methods, the approach introduced here is promising both for applications in physics and for multi-agent RL problems more generally.
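The abstract does not specify which TN architecture ACTeN uses. The following is a minimal sketch that assumes a matrix product state (MPS), the simplest one-dimensional TN, serves as the critic for a chain of N sites with d local states each (for example, occupied/empty sites in the East model or the asymmetric simple exclusion process). The value of a configuration s is V(s) = l^T A_1[s_1] ... A_N[s_N] r. It is evaluated by multiplying N small D x D matrices instead of storing d^N table entries. A semi-gradient TD(0) critic update changes only the matrices selected by s. The names `MPSCritic` and `td_update`, as well as the bond dimension, initialisation and learning rate, are hypothetical illustrations and not the paper's actual implementation.

```python
import random
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def vec_mat(v: Sequence[float], m: Matrix) -> List[float]:
    """Row vector times matrix: (v @ m)[j] = sum_k v[k] * m[k][j]."""
    return [sum(v[k] * m[k][j] for k in range(len(v))) for j in range(len(m[0]))]


def mat_vec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Matrix times column vector: (m @ v)[i] = sum_k m[i][k] * v[k]."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


class MPSCritic:
    """Hypothetical MPS critic: V(s) = l^T A_1[s_1] ... A_N[s_N] r."""

    def __init__(self, n_sites: int, phys_dim: int = 2, bond_dim: int = 3,
                 noise: float = 0.05, seed: int = 0) -> None:
        rng = random.Random(seed)
        d, D = phys_dim, bond_dim
        # tensors[i][s] is the D x D matrix for site i in local state s,
        # initialised near the identity so that V(s) starts close to 1.
        self.tensors = [
            [[[(1.0 if p == q else 0.0) + rng.uniform(-noise, noise)
               for q in range(D)] for p in range(D)] for _ in range(d)]
            for _ in range(n_sites)
        ]
        self.left = [1.0] + [0.0] * (D - 1)   # boundary vector l
        self.right = [1.0] + [0.0] * (D - 1)  # boundary vector r

    def value(self, s: Sequence[int]) -> float:
        """Contract left to right in O(N D^2) time."""
        v = self.left
        for site, si in zip(self.tensors, s):
            v = vec_mat(v, site[si])
        return sum(a * b for a, b in zip(v, self.right))

    def _environments(self, s: Sequence[int]):
        """lefts[i] = l^T A_1..A_{i-1}, rights[i] = A_{i+1}..A_N r (0-based)."""
        n = len(self.tensors)
        lefts = [self.left]
        for i in range(n - 1):
            lefts.append(vec_mat(lefts[-1], self.tensors[i][s[i]]))
        rights = [self.right]
        for i in range(n - 1, 0, -1):
            rights.append(mat_vec(self.tensors[i][s[i]], rights[-1]))
        rights.reverse()
        return lefts, rights

    def td_update(self, s: Sequence[int], reward: float,
                  s_next: Optional[Sequence[int]] = None, gamma: float = 0.99,
                  lr: float = 0.01, terminal: bool = False) -> float:
        """One semi-gradient TD(0) step; returns the TD error delta."""
        target = reward if terminal else reward + gamma * self.value(s_next)
        delta = target - self.value(s)
        lefts, rights = self._environments(s)  # computed before any update
        for i, si in enumerate(s):
            A = self.tensors[i][si]
            L, R = lefts[i], rights[i]
            # dV/dA[p][q] = L[p] * R[q], since V = L^T A R at this site
            for p in range(len(L)):
                for q in range(len(R)):
                    A[p][q] += lr * delta * L[p] * R[q]
        return delta
```

With `bond_dim=1` the MPS reduces to a product of per-site weights and cannot capture correlations between sites. Increasing the bond dimension D trades extra parameters for expressiveness. This is the usual tuning knob for TN approximators.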
Source: http://dx.doi.org/10.1103/PhysRevLett.132.197301 (DOI listing)
J Microbiol Biol Educ
January 2025
Department of Microbiology, University of Georgia, Athens, Georgia, USA.
We present a laboratory module that uses isolation of antibiotic-resistant bacteria from locally collected stream water samples to introduce undergraduate students to basic microbiological culture-based and molecular techniques. This module also educates them on the global public health threat of antibiotic-resistant organisms. Through eight laboratory sessions, students are involved in quality testing of water sources from their neighborhoods, followed by isolation of extended-spectrum beta-lactamase-producing .
Brain Behav
January 2025
Faculty of Health Sciences, Child Development Department, Hacettepe University, Ankara, Turkey.
Purpose: This research aims to identify the problems and needs of families of children with reading difficulties, develop an Integrated Process-Based Family Education Program (IPMD-F) to address these needs, and implement it.
Methods: The study used a community-based participatory action research approach with a four-stage process: general information collection, needs identification and action-plan creation, development and implementation of the IPMD-F, and evaluation. The study was conducted during the 2023-2024 academic year in Ankara, Turkey, with 16 volunteer parents of children diagnosed with learning disabilities, and data were collected using qualitative and quantitative tools.
The design of the illumination optics for high numerical aperture (NA) anamorphic extreme ultraviolet (EUV) projection optics is a critical challenge for EUV lithography at advanced technology nodes. However, EUV illumination optics designed using conventional methods have flaws in illumination efficiency and illumination uniformity. These flaws arise from limitations of the relay configuration and of the matching method, which can consider only one factor affecting illumination uniformity. A one-mirror configuration can improve illumination efficiency by reducing the number of mirrors.
Commun Biol
January 2025
Institute of Automation, Chinese Academy of Sciences, Beijing, China.
Whether working memory (WM) is encoded by persistent activity using attractors or by dynamic activity using transient trajectories has been debated for decades in both experimental and modeling studies, and no consensus has been reached. Although many recurrent neural networks (RNNs) have been proposed to simulate WM, most are designed to match particular experimental observations and show either transient or persistent activity. The few that consider networks with both activity patterns have not attempted to directly compare their memory capabilities.
Transl Psychiatry
January 2025
School of Chinese Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China.
Recreational use of nitrous oxide (N₂O) has risen dramatically over the past decades. This study aimed to examine its rewarding effect and the underlying mechanisms. Mice were exposed to a subanesthetic concentration (20%) of N₂O for 30 min on 4 consecutive days, with N₂O paired with one context in the morning and air paired with another in the afternoon. This exposure produced apparent rewarding behavior in the conditioned place preference (CPP) paradigm.