Nutrient-sensitive reinforcement learning in monkeys.
Huang 黃飛揚 F-Y., Grabenhorst F.
In reinforcement learning (RL), animals choose by assigning values to options and learn by updating these values from reward outcomes. This framework has been instrumental in identifying fundamental learning variables and their neuronal implementations. However, canonical RL models do not explain how reward values are constructed from biologically critical intrinsic reward components, such as nutrients. From an ecological perspective, animals should adapt their foraging choices in dynamic environments to acquire nutrients that are essential for survival. Here, to advance the biological and ecological validity of RL models, we investigated how (male) monkeys adapt their choices to obtain preferred nutrient rewards under varying reward probabilities. We found that the rewards' nutrient composition strongly influenced learning and choices. The animals' preferences for specific nutrients (sugar, fat) affected how they adapted to changing reward probabilities: the history of recent rewards influenced monkeys' choices more strongly if these rewards contained the monkey's preferred nutrients ('nutrient-specific reward history'). The monkeys also chose preferred nutrients even when they were associated with lower reward probability. A nutrient-sensitive RL model captured these processes: it updated the values of individual sugar and fat components of expected rewards based on experience and integrated them into subjective values that explained the monkeys' choices. Nutrient-specific reward prediction errors guided this value-updating process. Our results identify nutrients as important reward components that guide learning and choice by influencing the subjective value of choice options. Extending RL models with nutrient-value functions may enhance their biological validity and uncover nutrient-specific learning and decision variables.
SIGNIFICANCE STATEMENT: Reinforcement learning (RL) is an influential framework that formalizes how animals learn from experienced rewards. Although 'reward' is a foundational concept in RL theory, canonical RL models cannot explain how learning depends on specific reward properties, such as nutrients. Intuitively, learning should be sensitive to the reward's nutrient components, to benefit health and survival. Here we show that the nutrient (fat, sugar) composition of rewards affects monkeys' choices and learning in an RL paradigm, and that key learning variables, including 'reward history' and 'reward prediction error', should be modified with nutrient-specific components to account for monkeys' behavior in our task. By incorporating biologically critical nutrient rewards into the RL framework, our findings help advance the ecological validity of RL models.
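The nutrient-sensitive value updating described in the abstract can be illustrated with a minimal sketch. This is not the authors' fitted model: it assumes a Rescorla-Wagner-style update applied separately to sugar and fat components, a weighted-sum integration into subjective value, and a softmax choice rule. The function names, the learning rate `alpha`, the nutrient weights `w_sugar` and `w_fat`, and the inverse temperature `beta` are all hypothetical and chosen only for illustration.

```python
import math


def update_component(value, outcome, alpha):
    """Update one nutrient-value estimate (assumed Rescorla-Wagner rule).

    Returns the new value and the nutrient-specific prediction error.
    """
    delta = outcome - value  # nutrient-specific reward prediction error
    return value + alpha * delta, delta


def subjective_value(v_sugar, v_fat, w_sugar, w_fat):
    """Integrate nutrient values into one subjective value (assumed weighted sum)."""
    return w_sugar * v_sugar + w_fat * v_fat


def choice_probability(value_a, value_b, beta):
    """Probability of choosing option A over B under an assumed softmax rule."""
    return 1.0 / (1.0 + math.exp(-beta * (value_a - value_b)))


# One illustrative trial: option A delivers a sugar-containing, fat-free reward.
v_sugar, v_fat = 0.0, 0.0
v_sugar, delta_sugar = update_component(v_sugar, outcome=1.0, alpha=0.5)
v_fat, delta_fat = update_component(v_fat, outcome=0.0, alpha=0.5)

# Hypothetical animal that weights sugar twice as strongly as fat.
value_a = subjective_value(v_sugar, v_fat, w_sugar=2.0, w_fat=1.0)
value_b = 0.0  # unexplored alternative option
p_choose_a = choice_probability(value_a, value_b, beta=3.0)
```

In this sketch, a larger weight on a preferred nutrient makes rewards containing that nutrient shift subjective value, and hence choice, more strongly, which is one simple way to express a nutrient-specific reward history.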