The Problem: Learning from Noisy Experience
Imagine you've moved to a new city and you're trying to figure out if the restaurant on the corner is any good. You can't know the "true quality" directly—all you can do is eat there and see how the meal turns out. Sometimes the chef has a great day; sometimes they don't. Each meal gives you a noisy signal of the underlying quality.
How should you update your beliefs? The Rescorla-Wagner model (Rescorla & Wagner, 1972) provides an elegant answer that turns out to be deeply connected to how dopamine neurons in the brain actually work.
The Update Rule
The model is beautifully simple. After each experience, you update your estimate using a single equation:
The Rescorla-Wagner Update

\[ V_{t+1} = V_t + \alpha \, (R_t - V_t) \]

where:
- \(V_t\) — your current estimated value (before this trial)
- \(R_t\) — the feedback you received this trial (true value + noise)
- \(\alpha\) — the learning rate, how much you adjust per trial (0 to 1)
- \(\delta_t = R_t - V_t\) — the prediction error, the surprise
The key insight is the prediction error \(\delta_t\). If the meal was better than expected (\(\delta > 0\)), you revise your estimate upward. If it was worse (\(\delta < 0\)), you revise downward. If it matches your expectation perfectly (\(\delta = 0\)), you don't change your estimate at all. Learning stops when there is nothing left to be surprised about.
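The whole model fits in a few lines of Python. Here is a minimal sketch of the update loop; the true value of 7.0 and the noise level are illustrative choices, not values from the simulator:

```python
import random

def rescorla_wagner(true_value, alpha, sigma, n_trials, seed=0):
    """Run n_trials of the Rescorla-Wagner update and return all estimates."""
    rng = random.Random(seed)
    V = 0.0                                   # initial estimate
    estimates = [V]
    for _ in range(n_trials):
        R = true_value + rng.gauss(0, sigma)  # noisy feedback this trial
        delta = R - V                         # prediction error (the surprise)
        V = V + alpha * delta                 # the Rescorla-Wagner update
        estimates.append(V)
    return estimates

estimates = rescorla_wagner(true_value=7.0, alpha=0.2, sigma=1.0, n_trials=50)
```

Notice that the learner never sees `true_value` directly; it only ever sees the noisy `R`, yet the estimate still homes in on the truth.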
The Learning Rate \(\alpha\)
The learning rate controls the speed-stability tradeoff:
- High \(\alpha\) (e.g., 0.8): You learn fast, but your estimate is volatile—each new experience swings your belief dramatically. Your estimate chases the noise.
- Low \(\alpha\) (e.g., 0.05): You learn slowly, but your estimate is stable and smooth. It takes many experiences to converge, but the noise is averaged out.
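The tradeoff shows up clearly if you feed the same noisy experience to two learners that differ only in \(\alpha\). This sketch (with illustrative parameters) measures how much each estimate jumps from trial to trial once learning has settled:

```python
import random

def run(alpha, feedback, V=0.0):
    """Apply the Rescorla-Wagner update to a fixed feedback sequence."""
    history = []
    for R in feedback:
        V += alpha * (R - V)
        history.append(V)
    return history

rng = random.Random(1)
true_value = 7.0
feedback = [true_value + rng.gauss(0, 2.0) for _ in range(200)]

fast = run(alpha=0.8, feedback=feedback)   # volatile: chases the noise
slow = run(alpha=0.05, feedback=feedback)  # smooth: averages the noise out

def jitter(history):
    """Mean trial-to-trial movement of the estimate over the last 100 trials."""
    return sum(abs(a - b) for a, b in zip(history[100:], history[101:])) / 99
```

The high-\(\alpha\) learner's jitter is many times larger than the low-\(\alpha\) learner's: both hover around the true value, but the fast one never stops twitching.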
Try it in the simulator above! Set \(\alpha = 0.9\) and watch the estimate bounce around. Then try \(\alpha = 0.05\) and see how it glides smoothly toward the truth.
Feedback Noise \(\sigma\)
Each time you visit, the feedback you receive is the true value plus random noise:

\[ R_t = V^{\text{true}} + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \sigma^2) \]
With low noise, every meal closely reflects the true quality, and learning is easy. With high noise, meals vary wildly—sometimes a 10/10 experience, sometimes a 3/10—making it harder to pin down the true value.
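The effect of \(\sigma\) can be quantified: holding \(\alpha\) fixed, the estimate's steady-state wobble around the true value grows in direct proportion to the noise. A sketch with illustrative parameters:

```python
import random
import statistics

def steady_state_spread(alpha, sigma, n_trials=5000, seed=2):
    """Std. dev. of the estimate around the true value, after a burn-in period."""
    rng = random.Random(seed)
    true_value, V = 7.0, 0.0
    tail = []
    for t in range(n_trials):
        R = true_value + rng.gauss(0, sigma)
        V += alpha * (R - V)
        if t >= 1000:                 # discard the initial approach to the truth
            tail.append(V - true_value)
    return statistics.stdev(tail)

low = steady_state_spread(alpha=0.2, sigma=1.0)
high = steady_state_spread(alpha=0.2, sigma=4.0)
```

Quadrupling the noise quadruples the spread: with \(\sigma = 4\) the estimate wanders about four times as far from the truth as it does with \(\sigma = 1\), even though the learning rule is identical.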
Connection to Dopamine
In 1997, Schultz, Dayan, and Montague made a landmark discovery: dopamine neurons in the midbrain fire in a pattern that looks exactly like prediction error. When a reward is better than expected, dopamine neurons burst. When it's worse, they pause. When a reward is fully predicted, they don't respond at all.
This means the Rescorla-Wagner model isn't just a mathematical convenience—it describes the actual learning algorithm implemented by the brain's reward system.
Try These Experiments
🧪 Suggested Explorations:
- Set \(\alpha = 0.1\) and \(\sigma = 1\). Run 50 trials. How close does the estimate get? Now reset and try \(\alpha = 0.5\). What changes?
- Keep \(\alpha = 0.2\), but increase noise to \(\sigma = 4\). How does high noise affect the learning curve?
- Set \(\alpha = 1.0\). What happens? Why is this a problem?
- Watch the prediction error chart as learning progresses. Why do the bars get smaller over time?
- Change the true value mid-experiment. Does the learner adapt? How does \(\alpha\) affect re-learning?
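The last exploration is easy to reproduce in code: shift the true value halfway through and watch the same update rule re-converge with no special handling. The values (7 dropping to 3, \(\alpha = 0.2\)) are illustrative:

```python
import random

rng = random.Random(3)
alpha, V = 0.2, 0.0
history = []
for t in range(200):
    true_value = 7.0 if t < 100 else 3.0   # the world changes at trial 100
    R = true_value + rng.gauss(0, 1.0)
    V += alpha * (R - V)                   # same update rule, no special casing
    history.append(V)
```

Right before the change the estimate sits near 7; by the end it has re-converged near 3. The change produces a burst of large prediction errors, and a higher \(\alpha\) turns those errors into faster re-learning, which is exactly the speed side of the speed-stability tradeoff.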