Watch prediction errors shift from the reward to the cue—the signature of dopamine neuron activity (Schultz, Dayan & Montague, 1997).
Set reward probability < 1 to see omission dips
Run trials to see the prediction error shift backward from the US (reward) to the CS (cue), as in recordings from dopamine neurons.
Dopamine neurons respond to surprise: they fire when an outcome is better than predicted. Schultz, Dayan & Montague (1997) reported that early in conditioning, dopamine neurons fire at reward delivery. After learning, they fire instead at the CS (cue) that predicts the reward, and no longer at the reward itself. Temporal-difference (TD) learning explains this shift.
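Time is divided into small steps (microstates). At each timestep \(t\), the agent computes a prediction error, written here in the standard TD form:
\[ \delta_t = r_t + \gamma V(t+1) - V(t) \]
where \(r_t\) is the reward received at step \(t\), \(V(t)\) is the predicted value at step \(t\), and \(\gamma\) is the discount factor.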
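Then update the value at \(t\) in proportion to that error:
\[ V(t) \leftarrow V(t) + \alpha \, \delta_t \]
where \(\delta_t\) is the prediction error at step \(t\) and \(\alpha\) is a learning rate.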
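To make the two steps concrete, here is a minimal sketch of a single TD update in Python. It assumes the standard TD(0) form of the error (reward plus discounted next value minus current value); the function names and the learning rate `alpha` are illustrative choices, not taken from the demo itself.

```python
def td_error(reward, v_next, v_now, gamma):
    """Prediction error: what arrived (plus discounted future) minus what was predicted."""
    return reward + gamma * v_next - v_now


def td_update(v_now, delta, alpha):
    """Move the current value a fraction alpha of the way toward the target."""
    return v_now + alpha * delta


gamma, alpha = 0.98, 0.1      # assumed discount factor and learning rate
v_now, v_next = 0.0, 0.0      # nothing has been learned yet

# An unexpected reward of 1.0 arrives at this step.
delta = td_error(1.0, v_next, v_now, gamma)
v_new = td_update(v_now, delta, alpha)
print(delta, v_new)
```

Because both values start at zero, the whole reward counts as surprise, and the value at this step rises by a fraction `alpha` of it.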
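Over trials, the prediction error shifts backward: each update passes part of the value at step \(t+1\) back to step \(t\), so the positive error moves step by step from the reward toward the cue.
This matches what Schultz observed in dopamine neurons: early in learning, dopamine fires at reward delivery; after learning, it fires at the predictive cue.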
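The backward shift can be reproduced with a few lines of Python. This is a sketch under stated assumptions rather than the code behind the demo: the trial length, the cue and reward steps (`t_cs`, `t_us`), and the learning rate `alpha` are all hypothetical, and values are learnable only for steps after cue onset, because the cue itself arrives unpredictably.

```python
def run_trials(n_trials, n_steps=20, t_cs=5, t_us=15, gamma=0.98, alpha=0.3):
    """Return the per-step prediction errors for every trial."""
    V = [0.0] * (n_steps + 1)          # one extra entry: value after the trial ends is 0
    history = []
    for _ in range(n_trials):
        deltas = []
        for t in range(n_steps):
            r = 1.0 if t == t_us else 0.0
            delta = r + gamma * V[t + 1] - V[t]   # TD prediction error at step t
            deltas.append(delta)
            # Assumption: only steps after cue onset carry a learnable prediction.
            if t_cs < t <= t_us:
                V[t] += alpha * delta
        history.append(deltas)
    return history


def peak_step(deltas):
    """Index of the largest prediction error within one trial."""
    return max(range(len(deltas)), key=lambda t: deltas[t])


history = run_trials(500)
print("first trial peak at step", peak_step(history[0]))    # the reward step (US)
print("last trial peak at step", peak_step(history[-1]))    # the cue step (CS)
```

On the first trial the only nonzero error sits at the reward step. By the last trial the error at the reward has decayed toward zero and the largest error sits at cue onset, where the value jumps from zero to the discounted prediction of reward.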
1. Run 50 trials and watch the green peak shift from US to CS in the heatmap.
2. Set reward probability to 0.5 — red bands appear when expected rewards are omitted.
3. Try γ = 0.8 vs γ = 0.98: a lower discount factor shrinks the value passed back at each step, so the cue response is weaker and the peak takes longer to shift.
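Experiments 2 and 3 can also be tried offline. The sketch below makes the same assumptions as the TD model described above, plus a hypothetical `p_reward` parameter and a fixed random seed so that runs are repeatable. None of these names come from the demo.

```python
import random


def simulate(n_trials, p_reward=1.0, gamma=0.98, alpha=0.3,
             n_steps=20, t_cs=5, t_us=15, seed=0):
    """Return a list of (rewarded, deltas) pairs, one per trial."""
    rng = random.Random(seed)
    V = [0.0] * (n_steps + 1)          # value after the trial ends stays 0
    trials = []
    for _ in range(n_trials):
        rewarded = rng.random() < p_reward
        deltas = []
        for t in range(n_steps):
            r = 1.0 if (t == t_us and rewarded) else 0.0
            delta = r + gamma * V[t + 1] - V[t]
            deltas.append(delta)
            if t_cs < t <= t_us:       # only post-cue steps are learnable
                V[t] += alpha * delta
        trials.append((rewarded, deltas))
    return trials


T_CS, T_US = 5, 15

# Experiment 2: reward on half the trials. Skip the first 50 trials so that
# some expectation has formed, then look at the reward step on omitted trials.
partial = simulate(200, p_reward=0.5)
omitted = [d[T_US] for rewarded, d in partial[50:] if not rewarded]
if omitted:
    print("mean error at US when reward is omitted:", sum(omitted) / len(omitted))

# Experiment 3: compare the learned cue response under two discount factors.
cue_response = {g: simulate(500, gamma=g)[-1][1][T_CS] for g in (0.8, 0.98)}
print(cue_response)
```

With partial reward, the error at the reward step is negative whenever an expected reward fails to arrive, which corresponds to the red bands. In this sketch the learned cue response approaches γ^10 (ten steps separate cue and reward), so it is roughly 0.11 for γ = 0.8 and 0.82 for γ = 0.98.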