Learning Value Through Experience:
The Rescorla-Wagner Model

How does the brain learn the value of things? Watch an agent discover a restaurant's true quality through noisy feedback, one meal at a time.

[Interactive simulator: "Chez Apprentissage" 🍽️. Sliders set the restaurant's true quality out of 10 (unknown to the learner), the learning rate \(\alpha\) (how much weight each new experience gets), and the feedback noise \(\sigma\) (e.g., the chef's inconsistency). Each visit returns noisy feedback; the panel plots the learning curve (true value, estimate, feedback) and the prediction errors \(\delta_t = R_t - V_t\) (positive when a meal is better than expected, negative when worse), along with the current estimate, absolute error, and average prediction error.]

The Problem: Learning from Noisy Experience

Imagine you've moved to a new city and you're trying to figure out if the restaurant on the corner is any good. You can't know the "true quality" directly—all you can do is eat there and see how the meal turns out. Sometimes the chef has a great day; sometimes they don't. Each meal gives you a noisy signal of the underlying quality.

How should you update your beliefs? The Rescorla-Wagner model (1972) provides an elegant answer that turns out to be deeply connected to how dopamine neurons in the brain actually work.

The Update Rule

The model is beautifully simple. After each experience, you update your estimate using a single equation:

The Rescorla-Wagner Update

$$V_{t+1} = V_t + \alpha \cdot \underbrace{(R_t - V_t)}_{\text{prediction error } \delta_t}$$
  • \(V_t\) — your current estimated value (before this trial)
  • \(R_t\) — the feedback you received this trial (true value + noise)
  • \(\alpha\) — the learning rate, how much you adjust per trial (0 to 1)
  • \(\delta_t = R_t - V_t\) — the prediction error, the surprise

The key insight is the prediction error \(\delta_t\). If the meal was better than expected (\(\delta > 0\)), you revise your estimate upward. If it was worse (\(\delta < 0\)), you revise downward. If it matches your expectation perfectly (\(\delta = 0\)), you don't change your estimate at all. Learning stops when there is nothing left to be surprised about.
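The update rule fits in a few lines of Python (a minimal sketch; the function name and example numbers are ours, not part of the simulator):

```python
def rw_update(V, R, alpha):
    """One Rescorla-Wagner step: nudge the estimate toward the feedback."""
    delta = R - V          # prediction error: the surprise on this trial
    return V + alpha * delta

V = 5.0                    # initial estimate of the restaurant's quality
for R in [8.0, 7.0, 9.0]:  # three better-than-expected meals
    V = rw_update(V, R, alpha=0.2)
print(round(V, 2))         # the estimate has climbed toward the feedback
```

Each positive surprise pulls the estimate upward by a fraction \(\alpha\) of the error, so the estimate after these three meals sits above 5 but well below the feedback values themselves.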

The Learning Rate \(\alpha\)

The learning rate controls the speed-stability tradeoff: a high \(\alpha\) adapts quickly but chases every noisy meal, while a low \(\alpha\) averages over many trials, converging smoothly but slowly.

Try it in the simulator above! Set \(\alpha = 0.9\) and watch the estimate bounce around. Then try \(\alpha = 0.05\) and see how it glides smoothly toward the truth.
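The same tradeoff shows up numerically. The sketch below (our own code, assuming Gaussian feedback noise with the hypothetical values \(V^* = 7\), \(\sigma = 2\)) compares how much the estimate jitters after convergence at two learning rates:

```python
import random

def run(alpha, n_trials=500, true_value=7.0, sigma=2.0, seed=0):
    """Simulate Rescorla-Wagner learning with noisy feedback."""
    rng = random.Random(seed)
    V = 5.0
    history = []
    for _ in range(n_trials):
        R = true_value + rng.gauss(0, sigma)  # noisy meal
        V += alpha * (R - V)                  # RW update
        history.append(V)
    return history

spreads = {}
for alpha in (0.05, 0.9):
    tail = run(alpha)[-100:]                  # estimates after convergence
    spreads[alpha] = max(tail) - min(tail)    # how much the estimate jitters
    print(f"alpha={alpha}: spread of last 100 estimates = {spreads[alpha]:.2f}")
```

With \(\alpha = 0.9\) the estimate keeps bouncing with the noise even after hundreds of trials; with \(\alpha = 0.05\) it settles into a narrow band around the true value.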

Feedback Noise \(\sigma\)

Each time you visit, the feedback you receive is:

$$R_t = V^* + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0, \sigma^2)$$

With low noise, every meal closely reflects the true quality, and learning is easy. With high noise, meals vary wildly—sometimes a 10/10 experience, sometimes a 3/10—making it harder to pin down the true value.

Connection to Dopamine

In 1997, Schultz, Dayan, and Montague made a landmark discovery: dopamine neurons in the midbrain fire in a pattern that looks exactly like prediction error. When a reward is better than expected, dopamine neurons burst. When it's worse, they pause. When a reward is fully predicted, they don't respond at all.

This means the Rescorla-Wagner model isn't just a mathematical convenience—it describes the actual learning algorithm implemented by the brain's reward system.

Try These Experiments

🧪 Suggested Explorations:

  1. Set \(\alpha = 0.1\) and \(\sigma = 1\). Run 50 trials. How close does the estimate get? Now reset and try \(\alpha = 0.5\). What changes?
  2. Keep \(\alpha = 0.2\), but increase noise to \(\sigma = 4\). How does high noise affect the learning curve?
  3. Set \(\alpha = 1.0\). What happens? Why is this a problem?
  4. Watch the prediction error chart as learning progresses. Why do the bars get smaller over time?
  5. Change the true value mid-experiment. Does the learner adapt? How does \(\alpha\) affect re-learning?
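Experiment 5 can also be run in code. The sketch below (our own setup, with hypothetical values: the true quality drops from 7 to 3 halfway through) counts how many trials the learner needs to get back within 0.5 of the new value:

```python
import random

def trials_to_recover(alpha, seed=1):
    """Return how many post-change trials until the estimate is near the new value."""
    rng = random.Random(seed)
    V, true_value = 5.0, 7.0
    for t in range(400):
        if t == 200:
            true_value = 3.0              # the restaurant changes hands mid-experiment
        R = true_value + rng.gauss(0, 1.0)
        V += alpha * (R - V)
        if t >= 200 and abs(V - 3.0) < 0.5:
            return t - 200                # trials needed to re-learn
    return None

print(trials_to_recover(0.5), trials_to_recover(0.05))
```

A high \(\alpha\) re-learns in a handful of trials; a low \(\alpha\) takes dozens. This is the flip side of stability: the same sluggishness that filters out noise also delays adaptation when the world actually changes.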