Spike-based decision learning of Nash equilibria in two-player games.

PLoS Comput Biol

Department of Physiology and Center for Cognition, Learning and Memory, University of Bern, Switzerland.

Published: January 2013

Humans and animals face decision tasks in an uncertain multi-agent environment where an agent's strategy may change over time due to the co-adaptation of others' strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.
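To make the notion of a mixed (stochastic) Nash equilibrium concrete, the following is a minimal sketch that solves a generic 2x2 game with the standard indifference conditions. It is not the spiking population model from the paper. The function name `mixed_nash_2x2` and the matching-pennies payoffs are illustrative assumptions, and the abstract does not give the payoffs of the inspector game.

```python
from fractions import Fraction


def mixed_nash_2x2(row_payoffs, col_payoffs):
    """Return (p, q) for the interior mixed equilibrium of a 2x2 game.

    p is the probability that the row player picks action 0,
    q is the probability that the column player picks action 0.
    Hypothetical helper for illustration, not taken from the paper.
    """
    a, b = row_payoffs, col_payoffs
    # The row player mixes so that the column player is indifferent.
    p_den = b[0][0] - b[0][1] - b[1][0] + b[1][1]
    # The column player mixes so that the row player is indifferent.
    q_den = a[0][0] - a[0][1] - a[1][0] + a[1][1]
    if p_den == 0 or q_den == 0:
        raise ValueError("no unique interior mixed equilibrium")
    p = Fraction(b[1][1] - b[1][0], p_den)
    q = Fraction(a[1][1] - a[0][1], q_den)
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("equilibrium is not strictly mixed")
    return p, q


# Assumed payoffs (matching pennies), chosen only as an example.
ROW = [[1, -1], [-1, 1]]
COL = [[-1, 1], [1, -1]]
p, q = mixed_nash_2x2(ROW, COL)
print(p, q)  # 1/2 1/2
```

At such an equilibrium neither player can gain by shifting probability toward either action, which is why a learner has to settle on a stochastic policy, not a deterministic one, to perform optimally.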

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3459907
DOI: http://dx.doi.org/10.1371/journal.pcbi.1002691

Publication Analysis

Top Keywords

reinforcement learning: 12
decision learning: 8
learning nash: 8
two-player games: 8
population reinforcement: 8
nash equilibrium: 8
learning: 5
spike-based decision: 4
nash equilibria: 4
equilibria two-player: 4

Similar Publications

Recent evidence highlights that monetary rewards can increase the precision at which healthy human volunteers can detect small changes in the intensity of thermal noxious stimuli, contradicting the idea that rewards exert a broad inhibiting influence on pain perception. This effect was stronger with contingent rewards compared with noncontingent rewards, suggesting a successful learning process. In the present study, we implemented a model comparison approach that aimed to improve our understanding of the mechanisms that underlie thermal noxious discrimination in humans.

Transitive inference, the ability to establish hierarchical relationships between stimuli, is typically tested by training with premise pairs (e.g., A + B-, B + C-, C + D-, D + E-), which establishes a stimulus hierarchy (A > B > C > D > E).

Recent research has highlighted a notable confidence bias in the haptic sense, yet its impact on learning relative to other senses remains unexplored. This online study investigated learning behaviour across visual, auditory, and haptic modalities using a probabilistic selection task on computers and mobile devices, employing dynamic and ecologically valid stimuli to enhance generalisability. We analysed reaction time as an indicator of confidence, alongside learning speed and task accuracy.

The power of belief? Evidence of reduced fear extinction learning in Catholic God believers.

Front Public Health

January 2025

Dipartimento di Scienze Cognitive, Psicologiche, Pedagogiche e Degli Studi Culturali, Università di Messina, Messina, Italy.

Religious beliefs can shape how people process fear. Yet the psychophysiological mechanisms underlying this phenomenon remain poorly understood. We investigated fear learning and extinction processes in a group of individuals who professed a belief in God, compared to non-believers.

Those with diabetes mellitus are at high-risk of developing psychiatric disorders, especially mood disorders, yet the link between hyperglycemia and altered motivation has not been thoroughly explored. Here, we characterized value-based decision-making behavior of a streptozocin-induced diabetic mouse model on Restaurant Row, a naturalistic neuroeconomic foraging paradigm capable of behaviorally capturing multiple decision systems known to depend on dissociable neural circuits. Mice made self-paced choices on a daily limited time-budget, accepting or rejecting reward offers based on cost (delays cued by tone pitch) and subjective value (flavors), in a closed-economy system tested across months.
