Humans can learn under a wide variety of feedback conditions. Reinforcement learning (RL), in which a series of rewarded decisions must be made, is a particularly important type of learning. Computational and behavioral studies of RL have focused mainly on Markovian decision processes, in which the next state depends only on the current state and action. Little is known about non-Markovian decision making, in which the next state depends on more than the current state and action. Learning is non-Markovian, for example, when there is no unique mapping between actions and feedback. We have produced a model based on spiking neurons that can handle these non-Markovian conditions by performing policy gradient descent [1]. Here, we examine the model's performance and compare it with human learning and a Bayes-optimal reference, which provides an upper bound on performance. We find that in all cases our model, based on a population of spiking neurons, describes human performance well.
Download full-text PDF | Source
---|---
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4405578 | PMC
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0123105 | PLOS
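As an illustration of the general technique (not the specific spiking-neuron model of [1]), the minimal Python sketch below shows policy-gradient (REINFORCE) learning in a non-Markovian setting: stochastic binary units stand in for spiking neurons, and a single reward arrives only after the whole sequence of decisions, so no individual action maps uniquely to feedback. The task, parameters, and network are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 10          # binary decisions ("spikes") per episode
target = 3      # reward only if exactly `target` of the T units spike
eta = 0.05      # learning rate
w = np.zeros(T) # one log-odds parameter per stochastic unit
baseline = 0.0  # running reward baseline (variance reduction)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for episode in range(20000):
    p = sigmoid(w)                              # spike probabilities
    spikes = (rng.random(T) < p).astype(float)  # stochastic actions
    r = 1.0 if spikes.sum() == target else 0.0  # one delayed feedback signal
    # Likelihood-ratio (REINFORCE) update: d log pi / dw_i = spikes_i - p_i
    w += eta * (r - baseline) * (spikes - p)
    baseline += 0.01 * (r - baseline)

print("expected spike count after learning:", sigmoid(w).sum())  # moves toward target
```

Because the reward depends on the joint spike pattern rather than any single decision, the per-unit eligibility `(spikes - p)` scaled by the reward deviation is what assigns credit across the sequence.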
Nat Commun
January 2025
Department of Physics and Astronomy, and Institute of Applied Physics, Seoul National University, Seoul, Korea.
Addressing and mitigating decoherence sources plays an essential role in the development of a scalable quantum computing system, which requires low gate errors to be consistently maintained throughout the circuit execution. While nuclear spin-free materials, such as isotopically purified silicon, exhibit intrinsically promising coherence properties for electron spin qubits, the omnipresent charge noise, when converted to magnetic noise under a strong magnetic field gradient, often hinders stable qubit operation within a time frame comparable to the data acquisition time. Here, we demonstrate both open- and closed-loop suppression techniques for the transduced noise in silicon spin qubits, resulting in a more than two-fold (ten-fold) improvement of the inhomogeneous coherence time (Rabi oscillation quality) that leads to a single-qubit gate fidelity of over 99.
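The closed-loop idea can be illustrated generically: each cycle, estimate the residual qubit frequency drift and feed a correction back into the drive. The toy Python sketch below assumes a random-walk detuning drift and a noisy probe of the residual; it is a schematic of integral feedback, not the authors' protocol or noise model.

```python
import numpy as np

rng = np.random.default_rng(1)

true_detuning = 0.0   # qubit frequency offset, unknown to the controller
correction = 0.0      # frequency correction applied by the controller
gain = 0.3            # feedback gain (0 < gain < 1 for stability)

open_loop, closed_loop = [], []
for cycle in range(2000):
    true_detuning += rng.normal(0.0, 0.1)       # slow random-walk drift
    open_loop.append(true_detuning)             # what drift looks like uncorrected
    residual = true_detuning - correction
    estimate = residual + rng.normal(0.0, 0.2)  # noisy probe of the residual
    correction += gain * estimate               # integral-like feedback update
    closed_loop.append(residual)

print("open-loop RMS drift :", np.std(open_loop))
print("closed-loop RMS     :", np.std(closed_loop))
```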
Curr Biol
January 2025
Institute of Neuroscience, Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China.
Human intelligence is characterized by the remarkable ability to solve complex problems by planning a sequence of actions that takes us from an initial state to a desired goal state. Quantifying and comparing problem-solving capabilities across species and finding their evolutionary roots are critical for understanding how the brain carries out this intricate process. We introduce the Language of Problem Solving (LoPS) model as a novel quantitative framework that investigates the structure of problem-solving behavior through a language model.
Phys Chem Chem Phys
December 2024
Zhejiang Laboratory, Hangzhou 311100, China.
J Chem Theory Comput
November 2024
Shanghai Frontiers Science Center of Artificial Intelligence and Deep Learning, NYU Shanghai, 567 West Yangsi Road, Shanghai 200124, China.
Nonadiabatic dynamics is key for understanding solar energy conversion and photochemical processes in condensed phases. This often involves the non-Markovian dynamics of the reduced density matrix in open quantum systems, where knowledge of the system's prior states is necessary to predict its future behavior. In this study, we explore time-series machine learning methods for predicting long-time nonadiabatic dynamics based on short-time input data, comparing these methods with the physics-based transfer tensor method (TTM).
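For context, the transfer tensor method builds transfer tensors T_n from the short-time dynamical maps E_n via T_n = E_n − Σ_{m=1}^{n−1} T_m E_{n−m}, then uses them to propagate the vectorized reduced density matrix beyond the learning window. Below is a minimal NumPy sketch of that standard recursion, assuming the short-time maps are already available as d²×d² superoperators; the authors' machine-learning comparison is not reproduced.

```python
import numpy as np

def transfer_tensors(maps):
    """Transfer tensors T_n from short-time dynamical maps E_n.
    maps[n] is the superoperator taking vec(rho_0) -> vec(rho_{n+1})."""
    tensors = []
    for n, E in enumerate(maps):
        # Subtract the memory carried by earlier tensors: sum_m T_m E_{n+1-m}
        memory = sum(tensors[m] @ maps[n - 1 - m] for m in range(n))
        tensors.append(E - memory)
    return tensors

def propagate(tensors, rho0_vec, n_steps):
    """Extend the dynamics past the learning window with the memory
    recursion rho_n = sum_{k=1}^{K} T_k rho_{n-k}."""
    traj = [rho0_vec]
    for n in range(1, n_steps + 1):
        K = min(n, len(tensors))
        traj.append(sum(tensors[k] @ traj[n - 1 - k] for k in range(K)))
    return traj
```

The memory length K = len(tensors) is set by how much short-time data is supplied, which is exactly the trade-off the paper probes against time-series machine learning methods.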
Phys Rev Lett
October 2024
Laboratoire de Physique Théorique de la Matière Condensée, CNRS/Sorbonne Université, 4 Place Jussieu, 75005 Paris, France.
Self-interacting random walks (SIRWs) show long-range memory effects that result from the interaction of the random walker at time t with the territory already visited at earlier times t' < t.
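As a concrete toy example of a SIRW, the Python sketch below simulates a one-dimensional walker whose next step is biased by the number of earlier visits to each neighboring site (self-repelling for beta > 0, self-attracting for beta < 0). The exponential visit weighting is an illustrative choice, not the specific model analyzed in the paper.

```python
import numpy as np
from collections import defaultdict

def sirw_trajectory(n_steps, beta=1.0, seed=0):
    """1D self-interacting walk: the step at time t is biased by how
    often each neighboring site was visited at earlier times t' < t."""
    rng = np.random.default_rng(seed)
    visits = defaultdict(int)   # visit counts over the visited territory
    x = 0
    visits[x] = 1
    path = [x]
    for _ in range(n_steps):
        w_left = np.exp(-beta * visits[x - 1])
        w_right = np.exp(-beta * visits[x + 1])
        x += 1 if rng.random() < w_right / (w_left + w_right) else -1
        visits[x] += 1
        path.append(x)
    return path

path = sirw_trajectory(10_000, beta=0.5)  # repelled from visited sites
```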