This repository contains materials for modeling choice data recorded while participants played a two-armed bandit task. Three reinforcement learning (RL) models are implemented, along with two heuristic models for comparison. All three RL models are variants of temporal difference learning.
These models were coded in the spring of 2022 by Kaustubh Kulkarni (reach me via email, GitHub, or Twitter with any questions).
Model | Description | Parameters | Details |
---|---|---|---|
Biased | Preference for one machine | bias | The bias parameter fits a preference for one machine over the other. |
Heuristic | Switch to the other machine after two losses | epsilon | The epsilon parameter fits the rate of random choices that do not adhere to the switching strategy. |
RW | Temporal difference learning (TDL) | alpha, beta | Alpha is the learning rate; beta is the inverse temperature parameter. |
RWDecay | TDL with center decay | alpha, decay, beta | Alpha and beta are the same as above. The decay parameter fits the speed at which the values move toward a neutral value. |
RWRL | TDL with separate learning rates | alpha_pos, alpha_neg, beta | Alpha_pos and alpha_neg refer to the learning rates for positive and negative prediction errors, respectively. Beta is the same as above. |
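
As a rough illustration of how these choice and update rules can be written, here is a minimal Python sketch of the biased, heuristic, and RW-family models. The function names, the assumption that the bias parameter is the probability of choosing machine 0, the neutral value of 0.5 used for the decay, and the 0.5 starting values are illustrative assumptions, not necessarily the exact implementation in the notebooks.

```python
import numpy as np

def biased_choice_prob(bias):
    """Biased model: a fixed preference, independent of outcomes.
    (Assumes bias is the probability of choosing machine 0.)"""
    return np.array([bias, 1.0 - bias])

def heuristic_choice_prob(prev_choice, consecutive_losses, epsilon):
    """Heuristic model: switch after two losses, with an epsilon chance of choosing at random."""
    intended = 1 - prev_choice if consecutive_losses >= 2 else prev_choice
    probs = np.full(2, epsilon / 2.0)
    probs[intended] += 1.0 - epsilon
    return probs

def softmax_choice_prob(q_values, beta):
    """Softmax choice probabilities given learned values and inverse temperature beta."""
    exp_q = np.exp(beta * np.asarray(q_values, dtype=float))
    return exp_q / exp_q.sum()

def rw_update(q_values, choice, reward, alpha):
    """RW model: move the chosen machine's value toward the observed reward."""
    pe = reward - q_values[choice]          # prediction error
    q_values[choice] += alpha * pe
    return q_values

def rw_decay_update(q_values, choice, reward, alpha, decay, neutral=0.5):
    """RWDecay model: RW update plus decay of the values toward a neutral point (0.5 assumed here)."""
    q_values = rw_update(q_values, choice, reward, alpha)
    return q_values + decay * (neutral - q_values)

def rwrl_update(q_values, choice, reward, alpha_pos, alpha_neg):
    """RWRL model: separate learning rates for positive and negative prediction errors."""
    pe = reward - q_values[choice]
    alpha = alpha_pos if pe >= 0 else alpha_neg
    q_values[choice] += alpha * pe
    return q_values

# Example: simulate one RW choice-and-update step on a two-armed bandit.
rng = np.random.default_rng(0)
q = np.array([0.5, 0.5])                    # assumed starting values
p = softmax_choice_prob(q, beta=3.0)
choice = rng.choice(2, p=p)
reward = float(rng.random() < 0.7)          # e.g., a machine that pays off 70% of the time
q = rw_update(q, choice, reward, alpha=0.3)
```
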
I am grateful to Dr. Xiaosi Gu, Dr. Daniela Schiller, and the Center for Computational Psychiatry at Mount Sinai. I am also grateful to Project Jupyter for making it possible to create and share these materials in a Jupyter Book.
Content in this repository (i.e., any .md or .ipynb files in the content/ folder) is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.