Abstract
Decisions may arise via 'model-free' repetition of previously reinforced actions or by 'model-based' evaluation, which is widely thought to follow from prospective anticipation of action consequences using a learned map or model. While choices and neural correlates of decision variables sometimes reflect knowledge of their consequences, it remains unclear whether this actually arises from prospective evaluation. Using functional magnetic resonance imaging and a sequential reward-learning task in which paths contained decodable object categories, we found that humans' model-based choices were associated with neural signatures of future paths observed at decision time, suggesting a prospective mechanism for choice. Prospection also covaried with the degree of model-based influences on neural correlates of decision variables and was inversely related to prediction error signals thought to underlie model-free learning. These results dissociate separate mechanisms underlying model-based and model-free evaluation and support the hypothesis that model-based influences on choices and neural decision variables result from prospection.
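The contrast the abstract draws can be made concrete with a toy example. Model-free learning caches action values and updates them with a reward prediction error. Model-based evaluation computes action values prospectively, at decision time, from a learned transition model. The sketch below is a minimal illustration under stated assumptions and is not the paper's task or fitted model. The one-step task, the learning rate `ALPHA`, and all function names are hypothetical. The mixing weight `w` follows the hybrid formulation common in this literature, w·Q_MB + (1 − w)·Q_MF, and is assumed here only to illustrate what "degree of model-based influence" means.

```python
import numpy as np

# Hypothetical toy task (not the paper's design): from a single start state,
# action a leads deterministically to path state a.
# T[a, s'] is the learned transition model.
T = np.array([[1.0, 0.0],
              [0.0, 1.0]])
ALPHA = 0.5  # learning rate (assumed value)


def prediction_error(reward, value):
    """Reward prediction error, delta = r - V."""
    return reward - value


def model_free_update(q_mf, action, reward, alpha=ALPHA):
    """Model-free: adjust only the cached value of the chosen action."""
    delta = prediction_error(reward, q_mf[action])
    q_mf[action] += alpha * delta
    return delta


def model_based_values(transitions, path_values):
    """Model-based: evaluate each action prospectively at decision time by
    looking ahead through the transition model to the path it leads to."""
    return transitions @ path_values


def hybrid_values(q_mb, q_mf, w):
    """Weighted mixture: w = 1 is purely model-based, w = 0 purely model-free."""
    return w * q_mb + (1.0 - w) * q_mf


q_mf = np.zeros(2)     # cached action values
v_path = np.zeros(2)   # learned values of the path states

# Trial 1: action 0 is chosen, reaches path 0, and is rewarded.
delta = model_free_update(q_mf, action=0, reward=1.0)
v_path[0] += ALPHA * prediction_error(1.0, v_path[0])

# Hypothetical revaluation: path 1 is separately found to be rewarding,
# without action 1 ever being chosen.
v_path[1] += ALPHA * prediction_error(1.0, v_path[1])

q_mb = model_based_values(T, v_path)
for w in (0.0, 0.5, 1.0):
    print(f"w = {w}: values = {hybrid_values(q_mb, q_mf, w)}")
```

Only the model-based term credits action 1 here. Its value is derived from the path it leads to, not from that action's own reward history. As a result, the hybrid value of action 1 grows with `w`.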
Acknowledgements
We thank S.M. Fleming and L.Y. Atlas for helpful discussions. This work was supported by NINDS grant R01NS078784.
Author information
Contributions
All authors designed the experiment and analyses. B.B.D. and K.D.D. performed the experiment. B.B.D. analyzed the data. B.B.D., N.D.D. and D.S. wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Inferior frontal gyrus activation and model-free behavior
Relationship between inferior frontal gyrus (IFG) activation and model-free behavior (Online Methods, GLM4). A prospective model-based learner is indifferent to changes in start state because it faces the same prospective problem on each trial. In contrast, a model-free learner that maintains a separate set of expected values for each start state may face additional processing demands (e.g., retrieval) when the start state changes. To test this possibility, we sought regions where such a switch cost might be reflected in the BOLD response, as greater activation when the start state differed from that of the previous trial than when it remained the same. a. Contrast of task start states (faces, tools) that differed from the previous trial, relative to those that matched. Effect plotted at P = 0.001 uncorrected for display purposes (peak voxel: −48 16 22; P = 1.1 × 10⁻⁷, cluster family-wise error corrected for whole-brain comparisons; cluster size: 833 voxels; peak t(19) = 6.27; no other clusters survived correction). b. IFG activation correlates with model-free behavior. Individual values reflect the average activation of the cluster identified from the group-level contrast. IFG activation correlates negatively with model-based behavior (estimate = −0.65, χ²(1) = 11.91, P = 0.0006). Lines depict group-level linear effects and 95% confidence curves.
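The caption's rationale for a start-state switch cost can be illustrated with a toy simulation. A model-free learner keeps a separate value table for each start state. A model-based learner evaluates the same downstream paths from whichever start state it is in. The sketch below is a minimal illustration under hypothetical assumptions: two start states standing in for faces and tools, identical action-to-path mappings from both, and an assumed learning rate. It shows only why start-state-specific values would be needed. It does not model retrieval cost or the BOLD response.

```python
import numpy as np

# Hypothetical toy version of the caption's logic (not the actual task): two
# start states (standing in for faces and tools) whose actions lead to the
# same two path states. T[start, action, path] is the transition model.
T = np.zeros((2, 2, 2))
T[:, 0, 0] = 1.0   # action 0 leads to path 0 from either start state
T[:, 1, 1] = 1.0   # action 1 leads to path 1 from either start state
ALPHA = 0.5        # learning rate (assumed value)

q_mf = np.zeros((2, 2))   # model-free: separate cached values per start state
v_path = np.zeros(2)      # model-based: values of the shared path states


def learn(start, action, reward):
    """Update both learners after one trial."""
    path = int(np.argmax(T[start, action]))
    q_mf[start, action] += ALPHA * (reward - q_mf[start, action])
    v_path[path] += ALPHA * (reward - v_path[path])


def model_based(start):
    """Prospective evaluation: look ahead from the start state to path values."""
    return T[start] @ v_path


# Train only from start state 0, rewarding action 0.
for _ in range(5):
    learn(start=0, action=0, reward=1.0)

# Switch to start state 1: the model-free table for this state is untouched,
# whereas the model-based computation yields the same values as from state 0.
print("model-free, start 1:", q_mf[1])
print("model-based, start 1:", model_based(1))
```

After the switch, the model-free learner must consult a different (here still empty) table. The model-based learner solves an identical prospective problem from either start state.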
Supplementary Figure 2 Group level depiction of category-specific activation
Group-level depiction of the category-specific activation used to create functional ROIs from localizer data (ROIs for analysis were created in native space for each subject). Each category ROI was constructed from the intersection of contrasts with all other categories (e.g., scenes ROI: scenes > body parts ∩ scenes > faces ∩ scenes > tools), which prevents any overlap between ROIs; the conjunction of these group-level contrasts is presented here. Each contrast was thresholded at P < 0.001, uncorrected. Peaks of clusters surviving family-wise error correction for whole-brain multiple comparisons:
- Body parts: 50 −78 8, t(19) = 9.23, cluster P = 2 × 10⁻⁶; −48 −76 12, t(19) = 6.48, cluster P = 0.008.
- Scenes: −26 −46 −10, t(19) = 11.71, cluster P = 6.8 × 10⁻⁵; 24 −34 −16, t(19) = 9.42, P = 1.7 × 10⁻⁵; −12 −98 0, t(19) = 8.7, P = 2.8 × 10⁻⁹.
- Tools: −8 −78 6, t(19) = 10.22, P = 9.3 × 10⁻¹⁴.
- Faces: no clusters survived correction (peak: 34 −90 −12, t(19) = 4.28, P = 0.992).
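The ROI construction described above, which intersects each category's contrasts against all other categories, can be sketched with boolean masks. This is a minimal illustration under simplifying assumptions. Voxels are a flat array, the data are random, and the "contrast maps" are raw response differences with an arbitrary threshold of 0.5. The actual analysis used statistical maps thresholded at P < 0.001. The function name `conjunction_rois` is hypothetical. Because the map for "c > o" is the negative of the map for "o > c", no voxel can exceed a positive threshold in both, so the resulting ROIs cannot overlap.

```python
import numpy as np

CATEGORIES = ["body parts", "faces", "scenes", "tools"]


def conjunction_rois(contrast_maps, threshold):
    """Build one boolean ROI mask per category.

    contrast_maps[(c, o)] is a voxel-wise map for the contrast "c > o".
    A voxel enters category c's ROI only if it exceeds `threshold` in every
    contrast of c against each other category (an intersection).
    """
    rois = {}
    for c in CATEGORIES:
        masks = [contrast_maps[(c, o)] > threshold for o in CATEGORIES if o != c]
        rois[c] = np.logical_and.reduce(masks)
    return rois


# Hypothetical data: mean response of 1,000 voxels to each category.
rng = np.random.default_rng(0)
responses = {c: rng.normal(size=1000) for c in CATEGORIES}

# Stand-in contrast maps: simple response differences, not t-statistics.
contrast_maps = {(c, o): responses[c] - responses[o]
                 for c in CATEGORIES for o in CATEGORIES if c != o}

rois = conjunction_rois(contrast_maps, threshold=0.5)
for c, mask in rois.items():
    print(f"{c}: {mask.sum()} voxels")
```

Any positive threshold preserves the no-overlap property, provided each contrast map is the sign-flipped counterpart of its reverse.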
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1 and 2 and Supplementary Tables 1–4 (PDF 295 kb)
About this article
Cite this article
Doll, B., Duncan, K., Simon, D. et al. Model-based choices involve prospective neural activity. Nat Neurosci 18, 767–772 (2015). https://doi.org/10.1038/nn.3981