Animals can adapt their preferences for different types of reward according to physiological state, such as hunger or thirst. To explain this ability, we employ a simple multi-objective reinforcement learning model that learns multiple values according to different reward dimensions such as food or water. We show that by weighting these learned values according to current needs, the model flexibly adapts behaviour to present preferences.
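The weighting scheme described above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation; all names, dimensions, and parameter values are assumptions.

```python
# Illustrative multi-objective value learner: one value table per reward
# dimension (here 0 = food, 1 = water), combined at choice time with
# weights reflecting the animal's current needs.

N_ACTIONS = 3
ALPHA = 0.1  # learning rate (assumed)

# Q[dim][action]: separate learned values per reward dimension
Q = [[0.0] * N_ACTIONS for _ in range(2)]

def update(action, rewards):
    """rewards: one scalar per reward dimension (food, water)."""
    for d, r in enumerate(rewards):
        Q[d][action] += ALPHA * (r - Q[d][action])

def choose(needs):
    """needs: weight for each dimension, e.g. (hunger, thirst)."""
    combined = [sum(w * Q[d][a] for d, w in enumerate(needs))
                for a in range(N_ACTIONS)]
    return combined.index(max(combined))

update(0, (1.0, 0.0))      # action 0 delivers food
update(1, (0.0, 1.0))      # action 1 delivers water
print(choose((1.0, 0.0)))  # a hungry agent picks the food action: 0
print(choose((0.0, 1.0)))  # a thirsty agent picks the water action: 1
```

The same learned values thus support different preferences without relearning; only the need weights change.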
It has been suggested that the brain employs probabilistic generative models to optimally interpret sensory information. This hypothesis has been formalised in distinct frameworks, focusing on explaining separate phenomena. On one hand, classic predictive coding theory proposed how the probabilistic models can be learned by networks of neurons employing local synaptic plasticity.
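The local-update flavour of predictive coding can be illustrated with a minimal sketch, assuming a single latent cause x generating an observation y ≈ g·x; the generative weight and values below are made up for the example.

```python
# Predictive-coding-style inference for one latent variable: the latent
# estimate is relaxed by the prediction error alone, a purely local update.

g = 2.0    # generative weight (assumed)
y = 4.0    # observed input
x = 0.0    # latent estimate
lr = 0.05  # relaxation rate

for _ in range(200):
    eps = y - g * x     # prediction error
    x += lr * g * eps   # local, error-driven update of the latent estimate

print(round(x, 3))  # converges to 2.0, the cause that explains the input
```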
Perceptual decisions should depend on sensory evidence. However, such decisions are also influenced by past choices and outcomes. These choice history biases may reflect advantageous strategies to exploit temporal regularities of natural environments.
The optimal way to make decisions in many circumstances is to track the difference in evidence collected in favor of the options. The drift diffusion model (DDM) implements this approach and provides an excellent account of decisions and response times. However, existing DDM-based models of confidence exhibit certain deficits, and many theories of confidence have used alternative, nonoptimal models of decisions.
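The basic decision rule the DDM implements can be sketched as follows: accumulate the noisy evidence difference until it hits a bound. Parameter values are illustrative assumptions, not fitted quantities.

```python
import random

def ddm_trial(drift=0.2, bound=1.0, dt=0.01, noise=1.0, seed=0):
    """One drift diffusion trial: returns (choice, response time)."""
    rng = random.Random(seed)
    x, t = 0.0, 0.0
    while abs(x) < bound:
        # Euler step: deterministic drift plus Gaussian diffusion noise
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0, 1)
        t += dt
    return (1 if x > 0 else 0), t

choice, rt = ddm_trial()
print(choice, round(rt, 2))
```

Response-time distributions emerge from running many such trials with different seeds.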
Phase-amplitude coupling (PAC), the coupling of the amplitude of a faster brain rhythm to the phase of a slower brain rhythm, plays a significant role in brain activity and has been implicated in various neurological disorders. For example, in Parkinson's disease, PAC between the beta (13-30 Hz) and gamma (30-100 Hz) rhythms in the motor cortex is exaggerated, while in Alzheimer's disease, PAC between the theta (4-8 Hz) and gamma rhythms is diminished. Modulating PAC (i.
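One common way to quantify PAC is the mean vector length (Canolty-style modulation index). The sketch below applies it to a synthetic signal whose fast-rhythm amplitude is deliberately locked to a 6 Hz phase; the frequencies and coupling form are assumptions for illustration, not taken from the study above.

```python
import math, cmath

fs = 1000                                 # sampling rate (Hz)
t = [i / fs for i in range(2 * fs)]       # 2 s of samples
slow_phase = [2 * math.pi * 6 * ti for ti in t]      # 6 Hz phase
fast_amp = [1 + math.cos(p) for p in slow_phase]     # amplitude locked to phase

def mean_vector_length(phase, amp):
    """Magnitude of the amplitude-weighted mean phase vector."""
    v = sum(a * cmath.exp(1j * p) for p, a in zip(phase, amp)) / len(amp)
    return abs(v)

coupled = mean_vector_length(slow_phase, fast_amp)
uncoupled = mean_vector_length(slow_phase, [1.0] * len(t))
print(coupled > uncoupled)  # True: phase-locked amplitude inflates the index
```

With real data the phase and amplitude series would first be extracted by band-pass filtering and a Hilbert transform.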
When facing an unfamiliar environment, animals need to explore to gain new knowledge about which actions provide reward, but also put the newly acquired knowledge to use as quickly as possible. Optimal reinforcement learning strategies should therefore assess the uncertainties of these action-reward associations and utilise them to inform decision making. We propose a novel model whereby direct and indirect striatal pathways act together to estimate both the mean and variance of reward distributions, and mesolimbic dopaminergic neurons provide transient novelty signals, facilitating effective uncertainty-driven exploration.
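The core computation, tracking both a mean and a spread estimate per action and using the spread as an uncertainty bonus, can be sketched as below. This is a simplified caricature with assumed update rules and parameters, not the striatal-pathway model itself.

```python
ALPHA = 0.1
mean = [0.0, 0.0]     # per-action reward mean estimates
spread = [1.0, 1.0]   # per-action uncertainty, initialised high

def update(a, r):
    delta = r - mean[a]
    mean[a] += ALPHA * delta
    spread[a] += ALPHA * (abs(delta) - spread[a])  # tracks mean |error|

def choose(beta=1.0):
    """beta scales the uncertainty bonus that drives exploration."""
    scores = [m + beta * s for m, s in zip(mean, spread)]
    return scores.index(max(scores))

for _ in range(100):
    update(0, 1.0)          # action 0 reliably yields reward 1

# Action 0 now has a high mean and low spread; the never-tried action 1
# keeps its initial uncertainty, so a large bonus still favours it.
print(choose(beta=0.0))     # pure exploitation picks 0
print(choose(beta=5.0))     # uncertainty-driven exploration picks 1
```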
For both humans and machines, the essence of learning is to pinpoint which components of the information-processing pipeline are responsible for an error in the output, a challenge known as 'credit assignment'. It has long been assumed that credit assignment is best solved by backpropagation, which is also the foundation of modern machine learning. Here, we set out a fundamentally different principle of credit assignment called 'prospective configuration'.
Objectives: The exact mechanisms of deep brain stimulation (DBS) are still an active area of investigation, in spite of its clinical successes. This is due in part to the lack of understanding of the effects of stimulation on neuronal rhythms. Entrainment of brain oscillations has been hypothesised as a potential mechanism of neuromodulation.
Adapting actions to changing goals and environments is central to intelligent behavior. There is evidence that the basal ganglia play a crucial role in reinforcing or adapting actions depending on their outcome. However, the corresponding electrophysiological correlates in the basal ganglia, and the extent to which they causally contribute to action adaptation in humans, remain unclear.
Adv Neural Inf Process Syst, November 2022
Training with backpropagation (BP) in standard deep learning consists of two main steps: a forward pass that maps a data point to its prediction, and a backward pass that propagates the error of this prediction back through the network. This process is highly effective when the goal is to minimize a specific objective function. However, it does not allow training on networks with cyclic or backward connections.
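The two passes described above can be made concrete with a toy two-layer scalar network under squared error; all values here are assumptions chosen for illustration.

```python
x, target = 1.0, 2.0   # single training example (assumed)
w1, w2 = 0.5, 0.5      # weights of the two layers
lr = 0.1               # learning rate

def forward(w1, w2):
    h = w1 * x                       # hidden activity
    y = w2 * h                       # prediction
    return h, y, 0.5 * (y - target) ** 2

# Forward pass: data point -> prediction
h, y, loss = forward(w1, w2)

# Backward pass: the chain rule propagates the output error
# back through the network to each weight.
dy = y - target        # error at the output
dw2 = dy * h           # gradient for the output weight
dh = dy * w2           # error propagated to the hidden layer
dw1 = dh * x           # gradient for the input weight

w1 -= lr * dw1
w2 -= lr * dw2
_, _, new_loss = forward(w1, w2)
print(new_loss < loss)  # True: one gradient step reduces the error
```

Note that the backward pass reuses the forward-pass activities (h, y), which is exactly why cyclic or backward connections complicate this scheme.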
Introduction: We assess risks differently when they are explicitly described, compared to when we learn directly from experience, suggesting dissociable decision-making systems. Our needs, such as hunger, could globally affect our risk preferences, but do they affect described and learned risks equally? On one hand, decision-making from descriptions is often considered flexible and context sensitive, and might therefore be modulated by metabolic needs. On the other hand, preferences learned through reinforcement might be more strongly coupled to biological drives.
While brain stimulation therapies such as deep brain stimulation for Parkinson's disease (PD) can be effective, they have yet to reach their full potential across neurological disorders. Entraining neuronal rhythms using rhythmic brain stimulation has been suggested as a new therapeutic mechanism to restore neurotypical behaviour in conditions such as chronic pain, depression, and Alzheimer's disease. However, theoretical and experimental evidence indicates that brain stimulation can also entrain neuronal rhythms at sub- and super-harmonics, far from the stimulation frequency.
Forming accurate memory of sequential stimuli is a fundamental function of biological agents. However, the computational mechanism underlying sequential memory in the brain remains unclear. Inspired by neuroscience theories and recent successes in applying predictive coding (PC) to memory tasks, in this work we propose a novel PC-based model for memory, called temporal predictive coding (tPC).
To optimally adjust our behavior to changing environments, we need to adjust both the speed of our decisions and the speed of our movements. Yet little is known about the extent to which these processes are controlled by common or separate mechanisms. Furthermore, while previous evidence from computational models and empirical studies suggests that the basal ganglia play an important role during adjustments of decision-making, it remains unclear how this is implemented.
Purpose: Despite the improvement in treatment and prognosis of primary central nervous system lymphoma (PCNSL) over the last decades, the 5-year survival rate is approximately 30%; thus, new therapeutic approaches are needed to improve patient survival. The study's aim was to evaluate the role of surgical resection of PCNSL.
Methods: Primary outcomes were the overall survival (OS) and progression-free survival (PFS) of patients with PCNSL who underwent surgical resection versus biopsy alone.
Associative memories in the brain receive and store patterns of activity registered by the sensory neurons, and are able to retrieve them when necessary. Due to their importance in human intelligence, computational models of associative memories have been developed for several decades now. In this paper, we present a novel neural model for realizing associative memories, which is based on a hierarchical generative network that receives external stimuli via sensory neurons.
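The hierarchical generative model itself is not reproduced here; as a minimal point of contrast, the sketch below shows the classical Hopfield-style associative memory that such models build on: store patterns in a symmetric weight matrix, then retrieve a stored pattern from a corrupted cue. Patterns and sizes are assumed toy values.

```python
patterns = [[1, -1, 1, -1, 1, -1],
            [1, 1, 1, -1, -1, -1]]
n = len(patterns[0])

# Hebbian storage: W[i][j] = sum over patterns of x_i * x_j (no self-weights)
W = [[sum(p[i] * p[j] for p in patterns) if i != j else 0
      for j in range(n)] for i in range(n)]

def retrieve(cue, steps=5):
    """Iteratively settle each unit to the sign of its local field."""
    x = list(cue)
    for _ in range(steps):
        for i in range(n):
            field = sum(W[i][j] * x[j] for j in range(n))
            x[i] = 1 if field >= 0 else -1
    return x

noisy = [-1, -1, 1, -1, 1, -1]         # first pattern with one flipped bit
print(retrieve(noisy) == patterns[0])  # True: stored pattern recovered
```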
To accurately predict rewards associated with states or actions, the variability of observations has to be taken into account. In particular, when the observations are noisy, the individual rewards should have less influence on tracking of average reward, and the estimate of the mean reward should be updated to a smaller extent after each observation. However, it is not known how the magnitude of the observation noise might be tracked and used to control prediction updates in the brain reward system.
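A minimal sketch of this idea, with an assumed (not the paper's) update rule: the typical size of prediction errors serves as a running noise estimate, and the effective learning rate shrinks as that estimate grows.

```python
import random

def run(noise_sd, n=2000, seed=0):
    """Track the mean of rewards ~ 1 + N(0, noise_sd); return noise estimate."""
    rng = random.Random(seed)
    mean_est, noise_est = 0.0, 1.0
    for _ in range(n):
        r = 1.0 + rng.gauss(0, noise_sd)
        delta = r - mean_est
        # Running estimate of observation noise from |prediction error|
        noise_est += 0.05 * (abs(delta) - noise_est)
        # Effective learning rate shrinks when observations are noisy
        lr = 0.1 / (1.0 + noise_est)
        mean_est += lr * delta
    return noise_est

print(run(2.0) > run(0.2))  # True: noisier rewards -> larger noise estimate
```

Damping updates this way keeps the mean estimate stable when individual rewards are unreliable.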
Making accurate decisions often involves the integration of current and past evidence. Here, we examine the neural correlates of conflict and evidence integration during sequential decision-making. Female and male human patients implanted with deep-brain stimulation (DBS) electrodes and age-matched and gender-matched healthy controls performed an expanded judgment task, in which they were free to choose how many cues to sample.
Introduction: Clinical depression is usually treated in primary care with psychological therapies and antidepressant medication. However, when patients do not respond to two or more antidepressants within a depressive episode, they are considered to have treatment-resistant depression (TRD). Previous small randomised controlled trials suggested that pramipexole, a dopamine D2/3 receptor agonist, may be effective for treating patients with unipolar and bipolar depression as it is known to influence motivational drive and reward processing.
Reinforcement learning involves updating estimates of the value of states and actions on the basis of experience. Previous work has shown that in humans, reinforcement learning exhibits a confirmatory bias: when the value of a chosen option is being updated, estimates are revised more radically following positive than negative reward prediction errors, but the converse is observed when updating the unchosen option value estimate. Here, we simulate performance on a multi-arm bandit task to examine the consequences of a confirmatory bias for reward harvesting.
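The confirmatory update rule described above can be written compactly; the two learning-rate values here are illustrative assumptions.

```python
def confirmatory_update(q, chosen, delta, a_conf=0.2, a_disc=0.05):
    """Update a value estimate q given prediction error delta.

    Chosen options use the larger rate for positive errors (confirming),
    unchosen options use it for negative errors; the converse errors
    (disconfirming) are discounted with the smaller rate.
    """
    if chosen:
        rate = a_conf if delta > 0 else a_disc
    else:
        rate = a_disc if delta > 0 else a_conf
    return q + rate * delta

print(confirmatory_update(0.0, True, 1.0))    # 0.2  (confirming, chosen)
print(confirmatory_update(0.0, True, -1.0))   # -0.05 (disconfirming, chosen)
print(confirmatory_update(0.0, False, -1.0))  # -0.2 (confirming, unchosen)
```

Running this rule inside a bandit simulation reveals how the asymmetry shapes reward harvesting.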