Teaching a Network to See:
Backpropagation

A neural network learns to classify fruit by sweetness and color — watch the error ripple backward through the network, nudging each weight toward a better answer.

Parameters

Learning rate (η): how big each gradient step is.
Hidden neurons: network capacity (requires reset).

Current Sample

Live readout of the current training point: sweetness (x₁), redness (x₂), true label, network output, loss (MSE), and the training step count.

Click on the decision boundary canvas to add training points

Network Diagram

Legend: positive weight / negative weight.

Last Update (Δw)

Shows the per-weight changes from the most recent training step.

Step Log

A running log of training steps.

Decision Boundary

🍓 Strawberry 🍋 Lemon

Click to add a training point: left-click adds a strawberry 🍓 (high sweetness, red), right-click adds a lemon 🍋 (low sweetness, yellow).

Loss Over Time

Plots the mean squared error at each training step.

How Backpropagation Works

1

Forward Pass

Input features (sweetness, redness) flow left to right through the network. Each neuron computes a weighted sum of its inputs, then squashes it through a sigmoid: \(\sigma(z) = \frac{1}{1+e^{-z}}\). The output neuron's activation is the network's prediction \(\hat{y} \in (0,1)\).
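The forward pass can be sketched in a few lines of plain Python (a minimal illustration, not the demo's actual code; the function and parameter names `forward`, `W1`, `b1`, `W2`, `b2` are hypothetical):

```python
import math

def sigmoid(z):
    # squashing nonlinearity: sigma(z) = 1 / (1 + e^(-z)), output in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """One forward pass through a 2-input, n-hidden, 1-output sigmoid net.
    W1[j][i] weighs input i into hidden neuron j; W2[j] weighs hidden
    neuron j into the single output neuron."""
    z1 = [sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j]
          for j in range(len(W1))]
    h = [sigmoid(z) for z in z1]                      # hidden activations
    z2 = sum(W2[j] * h[j] for j in range(len(h))) + b2
    y_hat = sigmoid(z2)                               # prediction in (0, 1)
    return y_hat, h, z1, z2
```

Because the output neuron is also a sigmoid, the prediction is always strictly between 0 and 1, which is why it can be read as "how strawberry-like" the input is.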

2

Compute the Loss

We compare the prediction to the true label using Mean Squared Error: \(\mathcal{L} = (\hat{y} - y)^2\). This scalar tells us how wrong the network is. The goal of training is to drive \(\mathcal{L}\) toward zero. The color of the output neuron reflects the error — bright red = high loss.

3

Backward Pass

The chain rule of calculus lets us compute \(\frac{\partial \mathcal{L}}{\partial w}\) for every weight. The error signal flows backward through the network — watch the red pulse. Each weight is then nudged in the direction that reduces the loss: \(w \leftarrow w - \eta \frac{\partial \mathcal{L}}{\partial w}\).
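The update rule at the end of step 3 is ordinary gradient descent; as a minimal sketch over a flat list of weights (the name `apply_updates` is illustrative, not from the demo):

```python
def apply_updates(weights, grads, lr):
    """Gradient descent: w <- w - eta * dL/dw, applied elementwise."""
    return [w - lr * g for w, g in zip(weights, grads)]

# a positive gradient pushes the weight down, a negative one pushes it up
# apply_updates([1.0], [0.5], lr=0.5) -> [0.75]
```

The learning rate η scales every step: too small and training crawls, too large and the weights overshoot and the loss oscillates.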

The Chain Rule in Action

For a weight \(w^{(2)}_{jk}\) in the output layer connecting hidden neuron \(j\) to output neuron \(k\):

\(\frac{\partial \mathcal{L}}{\partial w^{(2)}_{jk}} = \underbrace{2(\hat{y}_k - y_k)}_{\text{output error}} \cdot \underbrace{\sigma'(z_k)}_{\text{sigmoid gradient}} \cdot \underbrace{h_j}_{\text{hidden activation}}\)

For a hidden-layer weight \(w^{(1)}_{ij}\), we must chain through the output layer — this is the "back" in backpropagation. Writing \(\delta_k = 2(\hat{y}_k - y_k)\,\sigma'(z_k)\) for the output error signal:

\(\frac{\partial \mathcal{L}}{\partial w^{(1)}_{ij}} = \underbrace{\Big(\sum_k \delta_k \, w^{(2)}_{jk}\Big)}_{\text{error flowing back}} \cdot \underbrace{\sigma'(z_j)}_{\text{hidden sigmoid gradient}} \cdot \underbrace{x_i}_{\text{input}}\)
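These two formulas transcribe almost directly into code. Below is a sketch for the single-output case, using the factor \(2(\hat{y}-y)\) that follows from the squared-error loss \(\mathcal{L} = (\hat{y}-y)^2\); the function and variable names are illustrative, not the demo's internals:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_single(x, y, W1, b1, W2, b2):
    """Gradients of L = (y_hat - y)^2 for a 2-input, n-hidden, 1-output
    sigmoid network, via the chain rule, for one training sample."""
    # forward pass (saving intermediate activations for the backward pass)
    z1 = [sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j]
          for j in range(len(W1))]
    h = [sigmoid(z) for z in z1]
    z2 = sum(W2[j] * h[j] for j in range(len(h))) + b2
    y_hat = sigmoid(z2)

    # output error signal: delta = dL/dz2, using sigma'(z) = sigma(z)(1 - sigma(z))
    delta_out = 2.0 * (y_hat - y) * y_hat * (1.0 - y_hat)
    dW2 = [delta_out * h[j] for j in range(len(h))]     # output-layer formula
    db2 = delta_out

    # hidden error: chain delta_out back through W2, times the local gradient
    delta_h = [delta_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    dW1 = [[delta_h[j] * x[i] for i in range(len(x))] for j in range(len(W1))]
    db1 = delta_h
    return dW1, db1, dW2, db2, y_hat
```

A standard sanity check for a backprop implementation is to compare each analytic gradient against a finite-difference estimate of the loss; they should agree to several decimal places.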

🧠 Biological Parallel

While real neurons can't literally send error signals backward, backpropagation captures something profound: a global teaching signal (much like dopamine) that modifies synaptic strengths. The Rescorla-Wagner model and temporal-difference (TD) learning can be seen as single-layer instances of the same principle: prediction error drives learning.

🍓 The Task

The network sees two features: sweetness (x₁) and redness (x₂), both in [0,1]. Strawberries cluster where both features are high (🍓); lemons cluster where both are low (🍋). The network must learn to separate them. Watch the decision boundary carve out the correct regions as training progresses.
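A toy dataset with this shape can be sketched as follows (the cluster centers, noise level, and the name `make_fruit` are illustrative guesses, not the demo's exact values):

```python
import random

def make_fruit(n=50, seed=0):
    """Toy dataset: strawberries cluster near (0.8, 0.8), lemons near
    (0.2, 0.2); features are Gaussian-jittered and clipped to [0, 1].
    Labels: strawberry = 1.0, lemon = 0.0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        if rng.random() < 0.5:
            cx, cy, label = 0.8, 0.8, 1.0   # strawberry: sweet and red
        else:
            cx, cy, label = 0.2, 0.2, 0.0   # lemon: not sweet, not red
        x1 = min(1.0, max(0.0, cx + rng.gauss(0.0, 0.1)))
        x2 = min(1.0, max(0.0, cy + rng.gauss(0.0, 0.1)))
        data.append(((x1, x2), label))
    return data
```

Because the two clusters sit at opposite corners of the unit square, even this tiny network can find a boundary between them after a modest number of training steps.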