Teaching a Network to See:
Backpropagation

A neural network learns to classify fruit by sweetness and color — watch the error ripple backward through the network, nudging each weight toward a better answer.

Parameters

Learning rate (η): how big each gradient step is.
Hidden neurons: network capacity (requires reset).

Current Sample

Live readout of the current training point: sweetness (x₁), redness (x₂), true label, network output, loss (MSE), and the training step count.

Click on the decision boundary canvas to add training points

Network Diagram

Legend: positive weight / negative weight.

Last Update (Δw)

Shows the per-weight changes from the most recent training step.

Step Log

A running log of training steps.

Decision Boundary

🍓 Strawberry 🍋 Lemon

Click to add a training point: left-click adds a strawberry 🍓 (high sweetness, red), right-click adds a lemon 🍋 (low sweetness, yellow).

Loss Over Time

Plots the mean squared error at each training step.

How Backpropagation Works

1

Forward Pass

Input features (sweetness, redness) flow left to right through the network. Each neuron computes a weighted sum of its inputs, then squashes it through a sigmoid: \(\sigma(z) = \frac{1}{1+e^{-z}}\). The output neuron's activation is the network's prediction \(\hat{y} \in (0,1)\).
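The forward pass can be sketched in a few lines of plain Python (a minimal illustration, not the demo's actual code; the function and parameter names `forward`, `W1`, `b1`, `W2`, `b2` are hypothetical):

```python
import math

def sigmoid(z):
    # squashing nonlinearity: sigma(z) = 1 / (1 + e^(-z)), output in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """One forward pass through a 2-input, n-hidden, 1-output sigmoid net.
    W1[j][i] weighs input i into hidden neuron j; W2[j] weighs hidden
    neuron j into the single output neuron."""
    z1 = [sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j]
          for j in range(len(W1))]
    h = [sigmoid(z) for z in z1]                      # hidden activations
    z2 = sum(W2[j] * h[j] for j in range(len(h))) + b2
    y_hat = sigmoid(z2)                               # prediction in (0, 1)
    return y_hat, h, z1, z2
```

Because the output neuron is also a sigmoid, the prediction is always strictly between 0 and 1, which is why it can be read as "how strawberry-like" the input is.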

2

Compute the Loss

We compare the prediction to the true label using Mean Squared Error: \(\mathcal{L} = (\hat{y} - y)^2\). This scalar tells us how wrong the network is. The goal of training is to drive \(\mathcal{L}\) toward zero. The color of the output neuron reflects the error — bright red = high loss.

3

Backward Pass

The chain rule of calculus lets us compute \(\frac{\partial \mathcal{L}}{\partial w}\) for every weight. The error signal flows backward through the network — watch the red pulse. Each weight is then nudged in the direction that reduces the loss: \(w \leftarrow w - \eta \frac{\partial \mathcal{L}}{\partial w}\).
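The update rule at the end of step 3 is ordinary gradient descent; as a minimal sketch over a flat list of weights (the name `apply_updates` is illustrative, not from the demo):

```python
def apply_updates(weights, grads, lr):
    """Gradient descent: w <- w - eta * dL/dw, applied elementwise."""
    return [w - lr * g for w, g in zip(weights, grads)]

# a positive gradient pushes the weight down, a negative one pushes it up
# apply_updates([1.0], [0.5], lr=0.5) -> [0.75]
```

The learning rate η scales every step: too small and training crawls, too large and the weights overshoot and the loss oscillates.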

The Chain Rule in Action

For a weight \(w^{(2)}_{jk}\) in the output layer connecting hidden neuron \(j\) to output neuron \(k\):

\(\frac{\partial \mathcal{L}}{\partial w^{(2)}_{jk}} = \underbrace{2(\hat{y}_k - y_k)}_{\text{output error}} \cdot \underbrace{\sigma'(z_k)}_{\text{sigmoid gradient}} \cdot \underbrace{h_j}_{\text{hidden activation}}\)

For a hidden-layer weight \(w^{(1)}_{ij}\), we must chain through the output layer — this is the "back" in backpropagation. Writing \(\delta_k = 2(\hat{y}_k - y_k)\,\sigma'(z_k)\) for the output error signal:

\(\frac{\partial \mathcal{L}}{\partial w^{(1)}_{ij}} = \underbrace{\Big(\sum_k \delta_k \, w^{(2)}_{jk}\Big)}_{\text{error flowing back}} \cdot \underbrace{\sigma'(z_j)}_{\text{hidden sigmoid gradient}} \cdot \underbrace{x_i}_{\text{input}}\)
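These two formulas transcribe almost directly into code. Below is a sketch for the single-output case, using the factor \(2(\hat{y}-y)\) that follows from the squared-error loss \(\mathcal{L} = (\hat{y}-y)^2\); the function and variable names are illustrative, not the demo's internals:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_single(x, y, W1, b1, W2, b2):
    """Gradients of L = (y_hat - y)^2 for a 2-input, n-hidden, 1-output
    sigmoid network, via the chain rule, for one training sample."""
    # forward pass (saving intermediate activations for the backward pass)
    z1 = [sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j]
          for j in range(len(W1))]
    h = [sigmoid(z) for z in z1]
    z2 = sum(W2[j] * h[j] for j in range(len(h))) + b2
    y_hat = sigmoid(z2)

    # output error signal: delta = dL/dz2, using sigma'(z) = sigma(z)(1 - sigma(z))
    delta_out = 2.0 * (y_hat - y) * y_hat * (1.0 - y_hat)
    dW2 = [delta_out * h[j] for j in range(len(h))]     # output-layer formula
    db2 = delta_out

    # hidden error: chain delta_out back through W2, times the local gradient
    delta_h = [delta_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    dW1 = [[delta_h[j] * x[i] for i in range(len(x))] for j in range(len(W1))]
    db1 = delta_h
    return dW1, db1, dW2, db2, y_hat
```

A standard sanity check for a backprop implementation is to compare each analytic gradient against a finite-difference estimate of the loss; they should agree to several decimal places.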

🧠 Biological Parallel

While real neurons can't literally send error signals backward, backpropagation captures something profound: a global teaching signal (much like dopamine) that modifies synaptic strengths. The Rescorla-Wagner model and temporal-difference (TD) learning can be seen as single-layer instances of the same principle: prediction error drives learning.

🍓 The Task

The network sees two features: sweetness (x₁) and redness (x₂), both in [0,1]. Strawberries cluster where both features are high (🍓); lemons cluster where both are low (🍋). The network must learn to separate them. Watch the decision boundary carve out the correct regions as training progresses.
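A toy dataset with this shape can be sketched as follows (the cluster centers, noise level, and the name `make_fruit` are illustrative guesses, not the demo's exact values):

```python
import random

def make_fruit(n=50, seed=0):
    """Toy dataset: strawberries cluster near (0.8, 0.8), lemons near
    (0.2, 0.2); features are Gaussian-jittered and clipped to [0, 1].
    Labels: strawberry = 1.0, lemon = 0.0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        if rng.random() < 0.5:
            cx, cy, label = 0.8, 0.8, 1.0   # strawberry: sweet and red
        else:
            cx, cy, label = 0.2, 0.2, 0.0   # lemon: not sweet, not red
        x1 = min(1.0, max(0.0, cx + rng.gauss(0.0, 0.1)))
        x2 = min(1.0, max(0.0, cy + rng.gauss(0.0, 0.1)))
        data.append(((x1, x2), label))
    return data
```

Because the two clusters sit at opposite corners of the unit square, even this tiny network can find a boundary between them after a modest number of training steps.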