Mammalian decision making is thought to utilise at least two distinct computational strategies, model-free and model-based reinforcement learning (RL), instantiated in partially separate neural circuits. The former learns values of states and actions directly through reward prediction errors, while the latter utilises a forward model that predicts the future state given the chosen action and supports computations akin to search in a decision tree. A challenge in studying these systems is developing tasks in which their contributions to behaviour can be differentiated and which also yield the large decision datasets well suited to neurophysiology. The two-step human decision making task (Daw et al. 2011) is an influential recent approach to addressing this problem. The task utilises a choice between two actions, which lead probabilistically to one of two states, in which further actions lead probabilistically to reward. Model-based control, which uses the mapping between actions and states, gives different recommendations for behaviour than model-free control, particularly following uncommon state transitions. I will discuss work adapting the two-step task for use with rodents, and present data from optogenetic silencing of anterior cingulate cortex in the task. Mice readily learn a simplified, poke-based adaptation of the task, but is their apparently strongly model-based behaviour really due to forward planning? We think that with extensive training subjects in fact learn to exploit correlations between where rewards are obtained and the correct choice at the first step, developing sophisticated habitual strategies that superficially look like planning (Akam, Costa, Dayan 2015). We therefore developed a version of the task in which both reward and state-transition probabilities change over time, removing the correlations underpinning such strategies.
Mice successfully learn this modified task and show behaviour consistent with a mixture of model-based and model-free control. Optogenetic silencing of neurons in anterior cingulate cortex on a subset of trials produced a selective deficit in processing the state transition without affecting processing of the trial outcome (rewarded or not), consistent with a role for anterior cingulate cortex in model-based control.
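
To make concrete why the two strategies diverge after an uncommon transition, here is a minimal Python sketch of a single trial. It is an illustration only, not the analysis used in this work: the common-transition probability of 0.8, the learning rate of 0.5, the starting values, and all function names are assumptions introduced for the example.

```python
COMMON_P = 0.8  # assumed probability of the common transition (not stated in the abstract)
ALPHA = 0.5     # assumed learning rate


def model_free_update(q, action, reward, alpha=ALPHA):
    """Move the value of the chosen first-step action toward the reward received."""
    q = list(q)
    q[action] += alpha * (reward - q[action])
    return q


def state_value_update(v, state, reward, alpha=ALPHA):
    """Move the value of the second-step state that was reached toward the reward."""
    v = list(v)
    v[state] += alpha * (reward - v[state])
    return v


def model_based_values(v, common_p=COMMON_P):
    """Compute first-step action values from a forward model of the transitions.

    Action a is assumed to lead to state a with probability common_p
    and to the other state otherwise.
    """
    return [common_p * v[a] + (1 - common_p) * v[1 - a] for a in (0, 1)]


# One trial: action 0 is chosen, an uncommon transition leads to state 1,
# and the trial is rewarded.
action, state, reward = 0, 1, 1.0

q_mf = model_free_update([0.5, 0.5], action, reward)
v = state_value_update([0.5, 0.5], state, reward)
q_mb = model_based_values(v)

print("model-free prefers action", q_mf.index(max(q_mf)))
print("model-based prefers action", q_mb.index(max(q_mb)))
```

Under these assumptions the model-free learner credits the action it just took and so favours repeating action 0, whereas the model-based learner credits the state it reached and so favours action 1, the action that commonly leads to that state.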