A neural network learns to classify fruit by sweetness and color — watch the error ripple backward through the network, nudging each weight toward a better answer.
How big each gradient step is
Network capacity (requires reset)
Current Sample
Click on the decision boundary canvas to add training points
Run a step to see weight updates...
Steps will appear here...
Click to add a strawberry point (high sweetness, high redness) or a lemon point (low sweetness, low redness). Left-click = 🍓, right-click = 🍋
Input features (sweetness, redness) flow left to right through the network. Each neuron computes a weighted sum of its inputs, then squashes it through a sigmoid: \(\sigma(z) = \frac{1}{1+e^{-z}}\). The output neuron's activation is the network's prediction \(\hat{y} \in (0,1)\).
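The forward pass can be sketched in a few lines of NumPy. The two-hidden-neuron shape and all weight and bias values below are illustrative assumptions, not the visualization's actual parameters:

```python
import numpy as np

def sigmoid(z):
    # squashes any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical parameters for a 2-input, 2-hidden, 1-output network
W1 = np.array([[0.5, -0.3],   # hidden neuron 1: weights on (sweetness, redness)
               [0.8,  0.2]])  # hidden neuron 2
b1 = np.array([0.1, -0.1])
w2 = np.array([0.7, -0.4])    # output neuron: weights on hidden activations
b2 = 0.05

x = np.array([0.9, 0.8])      # a sweet, red sample
h = sigmoid(W1 @ x + b1)      # hidden activations h_j
y_hat = sigmoid(w2 @ h + b2)  # prediction y-hat in (0, 1)
```

Each layer is just a matrix-vector product followed by the element-wise sigmoid, which is why the features "flow left to right."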
We compare the prediction to the true label using Mean Squared Error: \(\mathcal{L} = (\hat{y} - y)^2\). This scalar tells us how wrong the network is. The goal of training is to drive \(\mathcal{L}\) toward zero. The color of the output neuron reflects the error — bright red = high loss.
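As a tiny worked example, with a hypothetical prediction of 0.62 for a strawberry:

```python
y = 1.0                  # true label: 1 = strawberry, 0 = lemon
y_hat = 0.62             # hypothetical prediction from the forward pass
loss = (y_hat - y) ** 2  # Mean Squared Error for a single sample
# loss = 0.38^2 = 0.1444: the network is still fairly wrong
```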
The chain rule of calculus lets us compute \(\frac{\partial \mathcal{L}}{\partial w}\) for every weight. The error signal flows backward through the network — watch the red pulse. Each weight is then nudged in the direction that reduces the loss: \(w \leftarrow w - \eta \frac{\partial \mathcal{L}}{\partial w}\).
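The update rule is easiest to see in one dimension. A minimal sketch minimizing a toy loss \((w-2)^2\) (the starting weight, learning rate, and step count are arbitrary choices):

```python
w = 0.0       # initial weight
eta = 0.1     # learning rate (eta)

for _ in range(100):
    grad = 2.0 * (w - 2.0)  # dL/dw for the toy loss L = (w - 2)^2
    w -= eta * grad         # w <- w - eta * dL/dw

# w has moved from 0 toward 2, the minimum of the toy loss
```

Backpropagation applies exactly this update to every weight at once; the chain rule just supplies each weight's gradient.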
For a weight \(w^{(2)}_{jk}\) in the output layer connecting hidden neuron \(j\) to output neuron \(k\):
\(\frac{\partial \mathcal{L}}{\partial w^{(2)}_{jk}} = \underbrace{(\hat{y}_k - y_k)}_{\text{output error}} \cdot \underbrace{\sigma'(z_k)}_{\text{sigmoid gradient}} \cdot \underbrace{h_j}_{\text{hidden activation}}\) (the constant factor 2 from differentiating the squared error is absorbed into the learning rate \(\eta\))
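This three-factor product can be computed directly. All numbers here are hypothetical, and the sigmoid gradient uses the identity \(\sigma'(z) = \sigma(z)(1-\sigma(z))\):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical values at the output layer (single output neuron, k = 0)
h = np.array([0.6, 0.3])    # hidden activations h_j
w2 = np.array([0.7, -0.4])  # output weights w2_jk
b2 = 0.05
y = 1.0                     # true label

z_out = w2 @ h + b2
y_hat = sigmoid(z_out)

# chain-rule factors (the factor 2 from (y_hat - y)^2 is absorbed into eta)
output_error = y_hat - y               # output error
sig_grad = y_hat * (1.0 - y_hat)       # sigmoid gradient sigma'(z_k)
grad_w2 = output_error * sig_grad * h  # one gradient entry per hidden weight
```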
For a hidden-layer weight \(w^{(1)}_{ij}\), we must chain through the output layer — this is the "back" in backpropagation. Writing \(\delta_k = (\hat{y}_k - y_k)\,\sigma'(z_k)\) for the output error term:
\(\frac{\partial \mathcal{L}}{\partial w^{(1)}_{ij}} = \underbrace{\delta_k \cdot w^{(2)}_{jk}}_{\text{error flowing back}} \cdot \underbrace{\sigma'(z_j)}_{\text{hidden sigmoid gradient}} \cdot \underbrace{x_i}_{\text{input}}\)
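Putting both equations together, one full forward-and-backward step might look like the sketch below. The weights, sample, and learning rate are hypothetical, and the factor 2 from the squared error is again absorbed into \(\eta\):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical parameters: 2 inputs -> 2 hidden -> 1 output
W1 = np.array([[0.5, -0.3], [0.8, 0.2]])  # hidden weights w1_ij (row per neuron)
b1 = np.array([0.1, -0.1])
w2 = np.array([0.7, -0.4])                # output weights w2_jk
b2 = 0.05
eta = 0.5                                 # learning rate

x = np.array([0.9, 0.8])                  # a sweet, red sample
y = 1.0                                   # label: strawberry

# forward pass
z1 = W1 @ x + b1
h = sigmoid(z1)
y_hat = sigmoid(w2 @ h + b2)

# backward pass
delta_k = (y_hat - y) * y_hat * (1 - y_hat)  # output error signal
grad_w2 = delta_k * h                        # output-layer gradients
delta_j = delta_k * w2 * h * (1 - h)         # error flowing back through w2
grad_W1 = np.outer(delta_j, x)               # hidden-layer gradients

# gradient descent update: w <- w - eta * dL/dw
w2 -= eta * grad_w2
W1 -= eta * grad_W1
```

Running the forward pass again after the update gives a strictly smaller loss on this sample, which is exactly the "nudge toward a better answer" the visualization animates.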
While real neurons can't literally send error signals backward, backpropagation captures something profound: a global, dopamine-like teaching signal that modifies synaptic strengths. The Rescorla-Wagner model and TD learning are single-layer versions of the same principle — prediction error drives learning.
The network sees two features: sweetness (x₁) and redness (x₂), both in [0,1]. Strawberries cluster where both features are high (🍓); lemons cluster where both are low (🍋). The network must learn to separate them. Watch the decision boundary carve out the correct regions as training progresses.
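A toy dataset matching this description can be generated in a few lines; the cluster centers (0.8, 0.8) and (0.2, 0.2), the noise level, and the class sizes are assumptions, not the visualization's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_fruit(n_per_class=20, noise=0.1):
    # strawberries cluster near (sweetness, redness) = (0.8, 0.8),
    # lemons near (0.2, 0.2); centers and noise are assumed values
    straw = rng.normal([0.8, 0.8], noise, size=(n_per_class, 2))
    lemon = rng.normal([0.2, 0.2], noise, size=(n_per_class, 2))
    X = np.clip(np.vstack([straw, lemon]), 0.0, 1.0)  # keep features in [0, 1]
    y = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])
    return X, y

X, y = make_fruit()
```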