Operant conditioning is learning to make or withhold responses in order to obtain reinforcers or avoid punishers. The organism's behavior operates on the environment to produce consequences.
S-R-O Framework: S (discriminative stimulus — signals whether R will be reinforced) → R (response — the action performed) → O (outcome — the reinforcer or punisher that follows).
CRITICAL DISTINCTION from classical conditioning: In classical conditioning, the outcome follows the S regardless of what the organism does. In operant conditioning, the outcome only follows if R is made — response contingency is the defining feature.
Thorndike's Law of Effect
- A response followed by a satisfying outcome increases in frequency (strengthens the S-R association).
- A response followed by an unsatisfying outcome decreases in frequency (weakens the S-R association).
- Directly parallels the Rescorla-Wagner update rule: satisfying = positive prediction error; unsatisfying = negative prediction error.
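The prediction-error parallel can be made concrete with a minimal Rescorla-Wagner update, ΔV = α(λ − V). This is an illustrative sketch: the learning rate `alpha` and asymptote `lam` values are arbitrary choices, not figures from the notes.

```python
# Minimal Rescorla-Wagner update: dV = alpha * (lam - V).
# A "satisfying" outcome corresponds to lam > V (positive prediction error),
# an "unsatisfying" outcome to lam < V (negative prediction error).

def rw_update(V, lam, alpha=0.3):
    """Return the new associative strength after one trial."""
    return V + alpha * (lam - V)

V = 0.0
history = []
for trial in range(10):          # 10 reinforced trials (lam = 1)
    V = rw_update(V, lam=1.0)
    history.append(V)

# V climbs toward the asymptote of 1 in ever-smaller steps
print(round(V, 3))  # → 0.972
```

Each step is proportional to the remaining prediction error, which is why acquisition curves are negatively accelerated.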
Thorndike's Puzzle Box
R = lever/rope sequence to open door
O = escape from box + access to food
| Paradigm | Who Controls Rate | Key Feature |
|---|---|---|
| Thorndike (discrete trial) | Experimenter | Animal placed in box at start of each trial; one trial at a time |
| Skinner (free-operant) | Animal | Animal can press lever at any time; cumulative recorder tracks rate (slope = response rate) |
Shaping: Reinforcing successive approximations to the target behavior, guiding the organism incrementally when it is unlikely to spontaneously produce the desired response. The full target behavior is never required at the outset.
Chaining: Training complex behavioral sequences one link at a time. Backward chaining trains the last step first, so each earlier step becomes a conditioned reinforcer for the step that follows. Used in vocational training, music instruction, and speech therapy.
The 2 × 2 Reinforcement/Punishment Table
| Effect on Behavior | Add something (+) | Remove something (−) |
|---|---|---|
| Behavior increases | Positive Reinforcement — add a desirable stimulus → response increases | Negative Reinforcement — remove an aversive stimulus → response increases |
| Behavior decreases | Positive Punishment — add an aversive stimulus → response decreases | Negative Punishment — remove a desirable stimulus → response decreases |
Primary vs. Secondary Reinforcers
| Type | Definition | Properties | Examples |
|---|---|---|---|
| Primary | Biologically significant; reinforcing without prior learning | State-dependent (sated rat won't work for food) | Food, water, sex |
| Secondary (conditioned) | Acquires value via association with primary reinforcer | Deliverable instantly; not state-dependent | Money, grades, clicker, tokens |
| Schedule | Rule | Response Pattern | Real-world Example |
|---|---|---|---|
| FR (Fixed-Ratio) | Reinforcement after every Nth response | Steady high rate + post-reinforcement pause | Piecework pay (paid per unit produced) |
| VR (Variable-Ratio) | Reinforcement after N responses on average (N varies) | HIGHEST, most persistent rates; minimal pause | Slot machines, social media "likes" |
| FI (Fixed-Interval) | Reinforcement for first response after a fixed time period | Scalloped curve — slow start, accelerating as the interval's end approaches | Checking the oven more and more often as a fixed baking time runs out |
| VI (Variable-Interval) | Reinforcement for first response after variable time period | Steady moderate rates; no post-reinforcement pause | Checking a friend's social post (unpredictable timing) |
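The delivery rules in the table above can be expressed as small predicates that decide whether a given response earns reinforcement. This is a sketch with names of my own choosing (only FR and FI are shown; the variable schedules would draw N or the interval from a distribution):

```python
def fixed_ratio(n):
    """FR schedule: reinforce every nth response."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True   # reinforcer delivered
        return False
    return respond

def fixed_interval(interval):
    """FI schedule: reinforce the first response made after
    `interval` time units have elapsed since the last reinforcer."""
    last = 0.0
    def respond(t):
        nonlocal last
        if t - last >= interval:
            last = t
            return True
        return False
    return respond

fr5 = fixed_ratio(5)
results = [fr5() for _ in range(10)]
# Only the 5th and 10th responses are reinforced
print(results)
```

Note that under FI, responses before the interval elapses are simply wasted effort, which is why trained animals learn to hold back early in the interval (the scallop).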
Matching Law (Herrnstein, 1961): On concurrent VI schedules (two response options simultaneously available), the relative rate of responding to each option approximately equals the relative rate of reinforcement from each option.
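Herrnstein's relation is often written B1/(B1 + B2) ≈ r1/(r1 + r2), where B is response rate and r is obtained reinforcement rate (symbols here are my notation; the reinforcement rates below are illustrative numbers):

```python
def predicted_response_share(r1, r2):
    """Matching law: the share of responding on option 1 equals its
    share of obtained reinforcement on concurrent VI schedules."""
    return r1 / (r1 + r2)

# If option 1 yields 40 reinforcers/hour and option 2 yields 20,
# the animal should allocate about two-thirds of responses to option 1.
share = predicted_response_share(40, 20)
print(round(share, 3))  # → 0.667
```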
- Bliss point: The allocation of time/effort that maximizes the individual's subjective value; indifference curves form contours around this point rather than peaking at it.
- Delay discounting: Future rewards are valued less than immediate rewards. Rate of discounting varies across individuals; 12-year-olds discount more steeply than adults.
- Pre-commitment: Deliberately making it harder to access the immediate reward to improve adherence to long-term goals. Examples: gym membership, study groups, blocking distracting websites.
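Delay discounting is commonly modeled with a hyperbolic form, V = A / (1 + kD), where larger k means steeper discounting. The k values below are illustrative only, chosen to show how a steep discounter (e.g., an adolescent per the note above) devalues a delayed reward more:

```python
def hyperbolic_value(amount, delay, k):
    """Subjective value of `amount` delivered after `delay` time units.
    Larger k = steeper discounting of the future."""
    return amount / (1 + k * delay)

# $100 in 30 days, for a shallow (k=0.01) vs. steep (k=0.1) discounter
shallow = hyperbolic_value(100, 30, k=0.01)   # ≈ 76.9
steep = hyperbolic_value(100, 30, k=0.1)      # = 25.0
print(round(shallow, 1), round(steep, 1))
```

Pre-commitment works because a choice made at delay D = 0 (reward at full value) can block access when the immediate temptation would otherwise dominate.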
Key structures: Striatum (input nucleus; dorsal striatum = caudate + putamen). GPi/SNr (output; tonically inhibitory — hold thalamus suppressed at rest). SNc/VTA (dopamine sources — SNc projects to dorsal striatum; VTA to ventral striatum/NAc).
- Direct (Go) pathway: net effect is to FACILITATE the selected action.
- Indirect (No-Go) pathway: net effect is to SUPPRESS competing actions.
- Hyperdirect pathway: net effect is rapid, broad suppression that halts action commitment before full deliberation.
| Dopamine's Action | Receptor | Pathway | Net Behavioral Effect |
|---|---|---|---|
| Excites D1 neurons | D1 (direct) | Go | Promotes execution of selected action |
| Inhibits D2 neurons | D2 (indirect) | No-Go | Suppresses competing actions |
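The table's two dopamine actions can be combined in a toy gating sketch (entirely illustrative numbers, not a biophysical model and not from the notes): dopamine adds drive to D1/Go units, subtracts drive from D2/No-Go units, and the action passes the gate only when net Go exceeds net No-Go.

```python
def action_released(go_input, nogo_input, dopamine):
    """Toy model of basal ganglia gating: dopamine excites the D1/Go
    pathway and inhibits the D2/No-Go pathway; the action is released
    when net Go drive exceeds net No-Go drive."""
    go = go_input + dopamine       # D1 excitation
    nogo = nogo_input - dopamine   # D2 inhibition
    return go > nogo

# Same cortical input, different dopamine levels
print(action_released(0.4, 0.6, dopamine=0.0))  # → False (No-Go wins)
print(action_released(0.4, 0.6, dopamine=0.2))  # → True  (dopamine opens the gate)
```

The point of the sketch: dopamine does not pick the action itself; it tilts the balance so that a candidate action already favored by cortical input can get through.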
| Structure | Learning Type | What It Encodes | Lesion Effect |
|---|---|---|---|
| Dorsal Striatum | S-R learning (habitual, cue-guided) | Associates discriminative stimuli with appropriate responses; forms automatic habits | Impairs responding when discriminative cue must guide response; simple R-O survives |
| Orbitofrontal Cortex (OFC) | R-O learning (goal-directed, outcome identity) | Links responses to expected identity of the outcome (what reward, not just whether reward) | Impairs reversal learning (perseverate on previously rewarded option); steepens delay discounting |
Schoenbaum et al.: OFC neurons fire during the delay between response and outcome, coding specifically for expected outcome identity (sucrose vs. quinine). "Error trial" cells fire strongly when an aversive quinine outcome is predicted — they represent anticipated hedonic identity, not just valence.
Tremblay & Schultz monkeys: OFC neurons respond selectively to pictures predicting grape juice but NOT orange juice — even though both are positive rewards. This confirms OFC encodes which reward, not merely reward vs. no reward.
OFC = R-O (goal-directed, outcome identity). Damage → respond to old rules even after reversal; steeper delay discounting.
Both are needed for full operant behavior.
Intracranial self-stimulation (Olds & Milner, 1954): Rats press a lever thousands of times per hour for electrical stimulation of the VTA/lateral hypothalamus, preferring it even to food. Established these dopaminergic circuits as powerful drivers of behavior.
| Component | Label | Neurotransmitter System | Experimental Evidence |
|---|---|---|---|
| WANTING | Incentive salience — motivation to seek and work for reward | Dopamine | Pimozide (dopamine blocker) → rats stop lever-pressing (wanting abolished) even though food still delivered |
| LIKING | Hedonic value — pleasure from consuming reward | Endogenous opioids (enkephalins, endorphins) | Dopamine-depleted rats show intact hedonic facial reactions (tongue protrusion to sweet, gapes to bitter) |
- Sucrose vs. chow: dopamine-antagonized rats settle for free chow but won't press a lever for sucrose. If both placed freely in front of the rat, it still prefers sucrose (liking intact) — it just won't work for it.
- Hedonic 'yum' facial reactions (tongue protrusion = sweet; gapes = bitter) are phylogenetically conserved and preserved in dopamine-depleted animals.
- Morphine → makes sweet food taste sweeter (opioid ↑ liking). Naloxone → reduces sweet preference (opioid blockade ↓ liking).
PIT definition: A Pavlovian CS modulates the vigor or rate of an ongoing instrumental response without the CS being contingent on that response. PIT bridges classical and instrumental conditioning.
Three-Phase Procedure
- Phase 1 — Pavlovian training: CS+ → Outcome (no response required from the animal).
- Phase 2 — Instrumental training: Lever → Same or different Outcome (no CS present).
- Transfer test: Lever available + CS+ played, but no outcomes delivered. Does CS+ increase lever-pressing rate? (Yes = PIT demonstrated.)
| PIT Type | Effect | Neural Substrate |
|---|---|---|
| Specific PIT | CS modulates ONLY the response linked to the same outcome — outcome-specific potentiation | Basolateral amygdala (BLA) → Nucleus accumbens shell |
| General PIT | CS increases ALL responses non-selectively — general motivational boost | Central amygdala (CeA) → Nucleus accumbens core via VTA |
Pathological addiction is defined by compulsive drug-seeking maintained despite known harmful consequences. It is maintained by both positive reinforcement (the high) and negative reinforcement (relief from withdrawal).
Long-term use dissociation: Chronic users often report diminished liking (no longer get the same high — opioid system down-regulated) but intensified wanting (craving is stronger — dopamine wanting system sensitized). This is the incentive salience model of addiction.
| Drug | Mechanism | Effect on Dopamine |
|---|---|---|
| Cocaine | DAT blocker — prevents dopamine reuptake | Dopamine lingers longer in synapse |
| Amphetamine | Triggers dopamine release from vesicles | More dopamine released into synapse |
Conditioning-Based Treatments
- Distancing (stimulus control): Avoid discriminative stimuli that trigger drug-seeking. Reduce Pavlovian CS exposure to prevent PIT-driven relapse.
- Contingency management: Reinforce incompatible behaviors — money vouchers for heroin-free urine samples.
- Imposed delay: Insert a delay between craving and drug access, weakening the R-O association over time.
- Naltrexone: Opioid receptor antagonist — reduces hedonic "liking" component, making consumption less pleasurable.
- Behavioral addictions (gambling, gaming): Same dopaminergic circuits as drug addiction. Gambling = VR schedule — most persistent and hardest to extinguish.
No-Go (indirect, D2): Dopamine inhibits D2 striatum → GPe → STN → GPi → suppresses competing actions.
Big Picture Synthesis
How the week's concepts connect across levels of analysis and to the course arc.
Levels of Analysis
- Identify the reinforcement schedule from a scenario; predict response pattern and persistence
- Distinguish negative reinforcement from punishment — most common error on this topic
- Apply wanting/liking dissociation: which neurotransmitter, which experimental evidence
- Trace the Go/No-Go basal ganglia pathways with receptor types and output nuclei
- Contrast dorsal striatum (S-R) vs. OFC (R-O) and predict lesion effects
- Describe the three-phase PIT procedure and distinguish specific vs. general PIT neural substrates
- Explain addiction as a wanting/liking dissociation with cocaine/amphetamine mechanisms
- Dopamine PE signal (Weeks 5–7) becomes the motivational wanting signal here
- Basal ganglia direct/indirect → Module 10 (skill memory, actor-critic framework)
- PIT bridges classical (Weeks 6–7) and instrumental conditioning
- Delay discounting connects to economic decision-making frameworks
- Matching law = behavioral-level implementation of value-based choice
- OFC outcome representation → model-based RL: knowing expected outcomes enables flexible planning