[not in handout, see intranet]

Abler, B., Walter, H., Erk, S., Kammerer, H., & Spitzer, M. (2006). Prediction error as a linear function of reward probability is coded in human nucleus accumbens. NeuroImage, 31(2), 790-795.

Reward probability has been shown to be coded by dopamine neurons in monkeys. Phasic neuronal activation not only increased linearly with reward probability upon expectation of reward, but also varied monotonically across the range of probabilities upon omission or receipt of rewards, therefore modeling discrepancies between expected and received rewards. Such a discrete coding of prediction error has been suggested to be one of the basic principles of learning. We used functional magnetic resonance imaging (fMRI) to show that the human dopamine system codes reward probability and prediction error in a similar way. We used a simple delayed incentive task with a discrete range of reward probabilities from 0% to 100%. Activity in the nucleus accumbens (NAc) of human subjects strongly resembled the phasic responses found in monkey neurons. First, during the expectation period of the task, the fMRI signal in the human NAc increased linearly with the probability of the reward. Second, during the outcome phase, activity in the NAc coded the prediction error as a linear function of reward probabilities. Third, we found that the NAc signal was correlated with individual differences in sensation seeking and novelty seeking, indicating a link between individual fMRI activation of the dopamine system in a probabilistic paradigm and personality traits previously suggested to be linked with reward processing. We therefore identify two different covariates that model activity in the NAc: specific properties of a psychological task and individual character traits.
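
A minimal sketch of the coding scheme reported above, assuming a unit reward delivered with probability p (illustrative, not the authors' analysis code): the anticipatory signal scales linearly with p, and the outcome-phase prediction error is the received reward minus that expectation, so it too varies linearly with p at both receipt and omission.

    def expectation_signal(p):
        """Anticipation phase: signal increases linearly with reward probability p."""
        return p

    def prediction_error(outcome, p):
        """Outcome phase: received reward (1 or 0) minus the expected reward."""
        return outcome - p

    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"p={p:.2f}  expect={expectation_signal(p):+.2f}  "
              f"PE(reward)={prediction_error(1, p):+.2f}  "
              f"PE(omission)={prediction_error(0, p):+.2f}")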

Beaver, J. D., Lawrence, A. D., Van Ditzhuijzen, J., Davis, M. H., Woods, A., & Calder, A. J. (2006). Individual differences in reward drive predict neural responses to images of food. Journal of Neuroscience, 26(19), 5160-5166.

A network of interconnected brain regions, including orbitofrontal, ventral striatal, amygdala, and midbrain areas, has been widely implicated in a number of aspects of food reward. However, in humans, sensitivity to reward can vary significantly from one person to the next. Individuals high in this trait experience more frequent and intense food cravings and are more likely to be overweight or develop eating disorders associated with excessive food intake. Using functional magnetic resonance imaging, we report that individual variation in trait reward sensitivity (as measured by the Behavioral Activation Scale) is highly correlated with activation to images of appetizing foods (e.g., chocolate cake, pizza) in a fronto-striatal-amygdala-midbrain network. Our findings demonstrate that there is considerable personality-linked variability in the neural response to food cues in healthy participants and provide important insight into the neurobiological factors underlying vulnerability to certain eating problems (e.g., hyperphagic obesity).

Daw, N. D., O'Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876-879.

Decision making in an uncertain environment poses a conflict between the opposing demands of gathering and exploiting information. In a classic illustration of this 'exploration-exploitation' dilemma, a gambler choosing between multiple slot machines balances the desire to select what seems, on the basis of accumulated experience, the richest option, against the desire to choose a less familiar option that might turn out more advantageous (and thereby provide information for improving future decisions). Far from representing idle curiosity, such exploration is often critical for organisms to discover how best to harvest resources such as food and water. In appetitive choice, substantial experimental evidence, underpinned by computational reinforcement learning (RL) theory, indicates that a dopaminergic, striatal and medial prefrontal network mediates learning to exploit. In contrast, although exploration has been well studied from both theoretical and ethological perspectives, its neural substrates are much less clear. Here we show, in a gambling task, that human subjects' choices can be characterized by a computationally well-regarded strategy for addressing the explore/exploit dilemma. Furthermore, using this characterization to classify decisions as exploratory or exploitative, we employ functional magnetic resonance imaging to show that the frontopolar cortex and intraparietal sulcus are preferentially active during exploratory decisions. In contrast, regions of striatum and ventromedial prefrontal cortex exhibit activity characteristic of an involvement in value-based exploitative decision making. The results suggest a model of action selection under uncertainty that involves switching between exploratory and exploitative behavioural modes, and provide a computationally precise characterization of the contribution of key decision-related brain systems to each of these functions.
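
The choice rule in question is of the softmax family: options are sampled with probability proportional to the exponential of their estimated value. A minimal sketch (the inverse temperature beta and the value estimates are illustrative assumptions), including the operational labeling of a choice as exploratory when it does not pick the currently highest-valued option:

    import math, random

    def softmax_choice(values, beta=3.0):
        """Choose a slot machine with probability proportional to exp(beta * value).
        High beta favours exploitation; low beta makes exploration more likely."""
        weights = [math.exp(beta * v) for v in values]
        r = random.random() * sum(weights)
        for arm, w in enumerate(weights):
            r -= w
            if r <= 0:
                return arm
        return len(values) - 1

    values = [0.2, 0.5, 0.4, 0.1]        # current payoff estimates, four machines
    arm = softmax_choice(values)
    label = "exploratory" if values[arm] < max(values) else "exploitative"
    print(f"chose machine {arm} ({label})")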

Hampton, A. N., Bossaerts, P., & O'Doherty, J. P. (2006). The role of the ventromedial prefrontal cortex in abstract state-based inference during decision making in humans. Journal of Neuroscience, 26(32), 8360-8367.

Many real-life decision-making problems incorporate higher-order structure, involving interdependencies between different stimuli, actions, and subsequent rewards. It is not known whether brain regions implicated in decision making, such as the ventromedial prefrontal cortex (vmPFC), use a stored model of the task structure to guide choice (model-based decision making) or merely learn action or state values without assuming higher-order structure as in standard reinforcement learning. To discriminate between these possibilities, we scanned human subjects with functional magnetic resonance imaging while they performed a simple decision-making task with higher-order structure, probabilistic reversal learning. We found that neural activity in a key decision-making region, the vmPFC, was more consistent with a computational model that exploits higher-order structure than with simple reinforcement learning. These results suggest that brain regions, such as the vmPFC, use an abstract model of task structure to guide behavioral choice, computations that may underlie the human capacity for complex social interactions and abstract strategizing.
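
A minimal sketch of the two accounts being compared, assuming an illustrative reversal task in which the currently correct stimulus pays off with probability 0.7 and the assignment reverses between trials with probability 0.1 (these numbers and the learning rate are assumptions, not the paper's fitted parameters). The key difference: the Bayesian learner's belief flips the expected values of both options at once after an inferred reversal, which the incremental rule cannot do.

    P_REWARD_GOOD = 0.7   # payoff probability of the currently correct stimulus
    P_REVERSE = 0.1       # per-trial chance that the correct stimulus switches

    def rl_update(value, outcome, alpha=0.3):
        """Model-free account: nudge the chosen stimulus's value toward the outcome."""
        return value + alpha * (outcome - value)

    def bayes_update(p_a_good, chose_a, rewarded):
        """Model-based account: update the belief that stimulus A is currently
        correct, then carry it through the known reversal rate."""
        p_if_good = P_REWARD_GOOD if rewarded else 1 - P_REWARD_GOOD
        lik_a = p_if_good if chose_a else 1 - p_if_good
        lik_b = (1 - p_if_good) if chose_a else p_if_good
        post = lik_a * p_a_good / (lik_a * p_a_good + lik_b * (1 - p_a_good))
        return post * (1 - P_REVERSE) + (1 - post) * P_REVERSE

    belief = bayes_update(0.5, chose_a=True, rewarded=True)
    print(f"P(A currently correct) after a rewarded choice of A: {belief:.2f}")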

[not in handout, see intranet]
Kim, H., Shimojo, S., & O'Doherty, J. P. (2006). Is avoiding an aversive outcome rewarding? Neural substrates of avoidance learning in the human brain. PLoS Biology, 4(8), 1453-1461.

Avoidance learning poses a challenge for reinforcement-based theories of instrumental conditioning, because once an aversive outcome is successfully avoided an individual may no longer experience extrinsic reinforcement for their behavior. One possible account for this is to propose that avoiding an aversive outcome is in itself a reward, and thus avoidance behavior is positively reinforced on each trial when the aversive outcome is successfully avoided. In the present study we aimed to test this possibility by determining whether avoidance of an aversive outcome recruits the same neural circuitry as that elicited by a reward itself. We scanned 16 human participants with functional MRI while they performed an instrumental choice task, in which on each trial they chose from one of two actions in order to either win money or else avoid losing money. Neural activity in a region previously implicated in encoding stimulus reward value, the medial orbitofrontal cortex, was found to increase, not only following receipt of reward, but also following successful avoidance of an aversive outcome. This neural signal may itself act as an intrinsic reward, thereby serving to reinforce actions during instrumental avoidance.
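
In reinforcement-learning terms, the proposal above amounts to recoding a successfully avoided loss as a positive teaching signal, so that an ordinary delta rule can reinforce the avoidance action. A minimal sketch with illustrative outcome codes and learning rate:

    def intrinsic_outcome(trial_type, success):
        """Map task outcomes onto one reinforcement axis."""
        if trial_type == "gain":
            return 1.0 if success else 0.0    # win money / win nothing
        return 1.0 if success else -1.0       # avoided loss treated as a reward

    def update_action_value(q, outcome, alpha=0.2):
        return q + alpha * (outcome - q)      # standard delta rule

    q_avoid = 0.0
    for success in (True, True, False, True):
        q_avoid = update_action_value(q_avoid, intrinsic_outcome("avoid", success))
    print(f"learned value of the avoidance action: {q_avoid:.2f}")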

Montague, P. R., King-Casas, B., & Cohen, J. D. (2006). Imaging valuation models in human choice. Annual Review of Neuroscience, 29, 417-448.

To make a decision, a system must assign value to each of its available choices. In the human brain, one approach to studying valuation has used rewarding stimuli to map out brain responses by varying the dimension or importance of the rewards. However, theoretical models have taught us that value computations are complex, and so reward probes alone can give only partial information about neural responses related to valuation. In recent years, computationally principled models of value learning have been used in conjunction with noninvasive neuroimaging to tease out neural valuation responses related to reward-learning and decision-making. We restrict our review to the role of these models in a new generation of experiments that seeks to build on a now-large body of diverse reward-related brain responses. We show that the models and the measurements based on them point the way forward in two important directions: the valuation of time and the valuation of fictive experience.

O'Doherty, J. P., Buchanan, T. W., Seymour, B., & Dolan, R. J. (2006). Predictive neural coding of reward preference involves dissociable responses in human ventral midbrain and ventral striatum. Neuron, 49(1), 157-166.

Food preferences are acquired through experience and can exert strong influence on choice behavior. In order to choose which food to consume, it is necessary to maintain a predictive representation of the subjective value of the associated food stimulus. Here, we explore the neural mechanisms by which such predictive representations are learned through classical conditioning. Human subjects were scanned using fMRI while learning associations between arbitrary visual stimuli and subsequent delivery of one of five different food flavors. Using a temporal difference algorithm to model learning, we found predictive responses in the ventral midbrain and a part of ventral striatum (ventral putamen) that were related directly to subjects' actual behavioral preferences. These brain structures demonstrated divergent response profiles, with the ventral midbrain showing a linear response profile with preference, and the ventral striatum a bivalent response. These results provide insight into the neural mechanisms underlying human preference behavior.
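
A minimal temporal-difference sketch of the kind of model used above: a visual cue at one time step predicts flavor delivery at the next, and with training the predictive value migrates from the outcome to the cue (the learning rate and discount are illustrative assumptions).

    ALPHA, GAMMA = 0.2, 1.0

    def td_trial(v_cue, v_out, reward):
        """One conditioning trial: cue state, then outcome state with reward."""
        delta_cue = GAMMA * v_out - v_cue     # PE at the cue (no immediate reward)
        v_cue += ALPHA * delta_cue
        delta_out = reward - v_out            # PE at reward delivery (terminal state)
        v_out += ALPHA * delta_out
        return v_cue, v_out, delta_cue, delta_out

    v_cue = v_out = 0.0
    for _ in range(20):
        v_cue, v_out, d_cue, d_out = td_trial(v_cue, v_out, reward=1.0)
    print(f"V(cue)={v_cue:.2f}  cue PE={d_cue:+.2f}  outcome PE={d_out:+.2f}")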

Pessiglione, M., Seymour, B., Flandin, G., Dolan, R. J., & Frith, C. D. (2006). Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature, 442(7106), 1042-1045.

Theories of instrumental learning are centred on understanding how success and failure are used to improve future decisions. These theories highlight a central role for reward prediction errors in updating the values associated with available actions. In animals, substantial evidence indicates that the neurotransmitter dopamine might have a key function in this type of learning, through its ability to modulate cortico-striatal synaptic efficacy. However, no direct evidence links dopamine, striatal activity and behavioural choice in humans. Here we show that, during instrumental learning, the magnitude of reward prediction error expressed in the striatum is modulated by the administration of drugs enhancing (3,4-dihydroxy-L-phenylalanine; L-DOPA) or reducing (haloperidol) dopaminergic function. Accordingly, subjects treated with L-DOPA have a greater propensity to choose the most rewarding action relative to subjects treated with haloperidol. Furthermore, incorporating the magnitude of the prediction errors into a standard action-value learning algorithm accurately reproduced subjects' behavioural choices under the different drug conditions. We conclude that dopamine-dependent modulation of striatal activity can account for how the human brain uses reward prediction errors to improve future decisions.
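
A minimal sketch of folding a drug-dependent gain on the prediction error into a standard action-value learner, in the spirit of the account above (the gain values, learning rate, and 80/20% payoff probabilities are illustrative assumptions, not the fitted parameters):

    import math, random

    GAIN = {"L-DOPA": 1.3, "placebo": 1.0, "haloperidol": 0.7}  # assumed gains

    def choose(q, beta=3.0):
        p_first = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))  # softmax, 2 actions
        return 0 if random.random() < p_first else 1

    def learn(drug, n_trials=100, alpha=0.3, p_win=(0.8, 0.2)):
        q = [0.0, 0.0]
        for _ in range(n_trials):
            a = choose(q)
            r = 1.0 if random.random() < p_win[a] else 0.0
            q[a] += alpha * GAIN[drug] * (r - q[a])  # drug scales the PE's impact
        return q

    random.seed(0)
    for drug in ("L-DOPA", "placebo", "haloperidol"):
        print(drug, [round(v, 2) for v in learn(drug)])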

Preuschoff, K., Bossaerts, P., & Quartz, S. R. (2006). Neural differentiation of expected reward and risk in human subcortical structures. Neuron, 51(3), 381-390.

In decision-making under uncertainty, economic studies emphasize the importance of risk in addition to expected reward. Studies in neuroscience focus on expected reward and learning rather than risk. We combined functional imaging with a simple gambling task to vary expected reward and risk simultaneously and in an uncorrelated manner. Drawing on financial decision theory, we modeled expected reward as mathematical expectation of reward, and risk as reward variance. Activations in dopaminoceptive structures correlated with both mathematical parameters. These activations differentiated spatially and temporally. Temporally, the activation related to expected reward was immediate, while the activation related to risk was delayed. Analyses confirmed that our paradigm minimized confounds from learning, motivation, and salience. These results suggest that the primary task of the dopaminergic system is to convey signals of upcoming stochastic rewards, such as expected reward and risk, beyond its role in learning, motivation, and salience.
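
For the simplest case of a gamble paying 1 with probability p (an illustrative payoff structure), the two regressors are the mean and variance of the outcome: expected reward rises linearly in p, while risk peaks at p = 0.5, which is what allows the two to be varied in an uncorrelated way.

    def expected_reward(p):
        return p                    # E[R] for a 0/1 payoff

    def risk(p):
        return p * (1 - p)          # Var[R] = E[R^2] - E[R]^2, maximal at p = 0.5

    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"p={p:.2f}  EV={expected_reward(p):.2f}  risk={risk(p):.3f}")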

Rodriguez, P. F., Aron, A. R., & Poldrack, R. A. (2006). Ventral-striatal/nucleus-accumbens sensitivity to prediction errors during classification learning. Human Brain Mapping, 27(4), 306-313.

A prominent theory in neuroscience suggests reward learning is driven by the discrepancy between a subject's expectation of an outcome and the actual outcome itself. Furthermore, it is postulated that midbrain dopamine neurons relay this mismatch to target regions including the ventral striatum. Using functional MRI (fMRI), we tested striatal responses to prediction errors for probabilistic classification learning with purely cognitive feedback. We used a version of the Rescorla-Wagner model to generate prediction errors for each subject and then entered these in a parametric analysis of fMRI activity. Activation in ventral striatum/nucleus accumbens (NAcc) increased parametrically with prediction error for negative feedback. This result extends recent neuroimaging findings in reward learning by showing that learning with cognitive feedback also depends on the same circuitry and dopaminergic signaling mechanisms.
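
A minimal sketch of generating trial-by-trial Rescorla-Wagner prediction errors that could then enter a parametric fMRI analysis (feedback coded 1 = correct, 0 = incorrect; the learning rate is an illustrative assumption):

    def rescorla_wagner(feedback, alpha=0.2):
        """Return the prediction error on each trial of a feedback sequence."""
        v, errors = 0.0, []
        for f in feedback:
            delta = f - v           # discrepancy between outcome and expectation
            errors.append(delta)
            v += alpha * delta      # update the expectation
        return errors

    pes = rescorla_wagner([1, 1, 0, 1, 0, 0, 1, 1])
    print([round(pe, 2) for pe in pes])   # values usable as a parametric regressor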

Tobler, P. N., O'Doherty, J. P., Dolan, R. J., & Schultz, W. (2006). Human neural learning depends on reward prediction errors in the blocking paradigm. Journal of Neurophysiology, 95(1), 301-310.

Learning occurs when an outcome deviates from expectation (prediction error). According to formal learning theory, the defining paradigm demonstrating the role of prediction errors in learning is the blocking test. Here, a novel stimulus is blocked from learning when it is associated with a fully predicted outcome, presumably because the occurrence of the outcome fails to produce a prediction error. We investigated the role of prediction errors in human reward-directed learning using a blocking paradigm and measured brain activation with functional magnetic resonance imaging. Participants showed blocking of behavioral learning with juice rewards as predicted by learning theory. The medial orbitofrontal cortex and the ventral putamen showed significantly lower responses to blocked, compared with nonblocked, reward-predicting stimuli. In reward-predicting control situations, deactivations in orbitofrontal cortex and ventral putamen occurred at the time of unpredicted reward omissions. Responses in discrete parts of orbitofrontal cortex correlated with the degree of behavioral learning during, and after, the learning phase. These data suggest that learning in primary reward structures in the human brain correlates with prediction errors in a manner that complies with principles of formal learning theory.
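
Blocking falls straight out of the Rescorla-Wagner rule, because all stimuli present on a trial share a single summed-prediction error. A sketch using the classic design (pretrained A, blocked X, novel control compound B+Y) with assumed parameters, not the paper's exact task:

    ALPHA = 0.3

    def compound_trial(values, stimuli, reward):
        """All presented stimuli learn from one shared prediction error."""
        delta = reward - sum(values[s] for s in stimuli)
        for s in stimuli:
            values[s] += ALPHA * delta

    v = {"A": 0.0, "X": 0.0, "B": 0.0, "Y": 0.0}
    for _ in range(30):                      # phase 1: A alone predicts reward
        compound_trial(v, ["A"], 1.0)
    for _ in range(30):                      # phase 2: compounds A+X and B+Y
        compound_trial(v, ["A", "X"], 1.0)   # reward fully predicted -> X blocked
        compound_trial(v, ["B", "Y"], 1.0)   # shared error -> Y acquires value
    print({s: round(val, 2) for s, val in v.items()})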

Yacubian, J., Gläscher, J., Schroeder, K., Sommer, T., Braus, D. F., & Büchel, C. (2006). Dissociable systems for gain- and loss-related value predictions and errors of prediction in the human brain. Journal of Neuroscience, 26(37), 9530-9537.

Midbrain dopaminergic neurons projecting to the ventral striatum code for reward magnitude and probability during reward anticipation and then indicate the difference between actual and predicted outcome. It has been questioned whether such a common system for the prediction and evaluation of reward exists in humans. Using functional magnetic resonance imaging and a guessing task in two large cohorts, we are able to confirm ventral striatal responses coding both reward probability and magnitude during anticipation, permitting the local computation of expected value (EV). However, the ventral striatum only represented the gain-related part of EV (EV+). At reward delivery, the same area shows a reward probability and magnitude-dependent prediction error signal, best modeled as the difference between actual outcome and EV+. In contrast, loss-related expected value (EV-) and the associated prediction error was represented in the amygdala. Thus, the ventral striatum and the amygdala distinctively process the value of a prediction and subsequently compute a prediction error for gains and losses, respectively. Therefore, a homeostatic balance of both systems might be important for generating adequate expectations under uncertainty. Prevalence of either part might render expectations more positive or negative, which could contribute to the pathophysiology of mood disorders like major depression.
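
A minimal sketch of the gain/loss decomposition described above, assuming for illustration a guess that wins an amount mag with probability p and otherwise loses mag (the payoff structure is an assumption, not the paper's exact task):

    def ev_plus(p, mag):
        return p * mag               # gain-related expected value (ventral striatum)

    def ev_minus(p, mag):
        return (1 - p) * -mag        # loss-related expected value (amygdala)

    def gain_prediction_error(outcome, p, mag):
        return outcome - ev_plus(p, mag)   # outcome vs. gain-related expectation

    p, mag = 0.5, 1.0
    print(f"EV+={ev_plus(p, mag):+.2f}  EV-={ev_minus(p, mag):+.2f}  "
          f"PE(win)={gain_prediction_error(+mag, p, mag):+.2f}  "
          f"PE(loss)={gain_prediction_error(-mag, p, mag):+.2f}")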