Operant conditioning is learning to make or withhold responses in order to obtain reinforcers or avoid punishers. The organism's behavior operates on the environment to produce consequences.
S-R-O Framework: S (discriminative stimulus — signals whether R will be reinforced) → R (response — the action performed) → O (outcome — the reinforcer or punisher that follows).
CRITICAL DISTINCTION from classical conditioning: In classical conditioning, the outcome follows the S regardless of what the organism does. In operant conditioning, the outcome only follows if R is made — response contingency is the defining feature.
Thorndike's Law of Effect
- A response followed by a satisfying outcome increases in frequency (strengthens the S-R association).
- A response followed by an unsatisfying outcome decreases in frequency (weakens the S-R association).
- Directly parallels the Rescorla-Wagner update rule: satisfying = positive prediction error; unsatisfying = negative prediction error.
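The prediction-error parallel can be made concrete with a minimal Rescorla-Wagner update, ΔV = α(λ − V). This is an illustrative sketch: the learning rate `alpha` and asymptote `lam` values are arbitrary choices, not figures from the notes.

```python
# Minimal Rescorla-Wagner update: dV = alpha * (lam - V).
# A "satisfying" outcome corresponds to lam > V (positive prediction error),
# an "unsatisfying" outcome to lam < V (negative prediction error).

def rw_update(V, lam, alpha=0.3):
    """Return the new associative strength after one trial."""
    return V + alpha * (lam - V)

V = 0.0
history = []
for trial in range(10):          # 10 reinforced trials (lam = 1)
    V = rw_update(V, lam=1.0)
    history.append(V)

# V climbs toward the asymptote of 1 in ever-smaller steps
print(round(V, 3))  # → 0.972
```

Each step is proportional to the remaining prediction error, which is why acquisition curves are negatively accelerated.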
Thorndike's Puzzle Box
R = lever/rope sequence to open door
O = escape from box + access to food
| Paradigm | Who Controls Rate | Key Feature |
|---|---|---|
| Thorndike (discrete trial) | Experimenter | Animal placed in box at start of each trial; one trial at a time |
| Skinner (free-operant) | Animal | Animal can press lever at any time; cumulative recorder tracks rate (slope = response rate) |
Shaping: Reinforcing successive approximations to the target behavior, guiding the organism incrementally when it is unlikely to spontaneously produce the desired response. The full target behavior is never required at the outset.
Chaining: Training complex behavioral sequences one link at a time. Backward chaining trains the last step first, so each earlier step becomes a conditioned reinforcer for the step that follows. Used in vocational training, music instruction, and speech therapy.
The 2 × 2 Reinforcement/Punishment Table
| Effect on Behavior | Add something (+) | Remove something (−) |
|---|---|---|
| Behavior increases | Positive Reinforcement — add a desirable stimulus → response increases | Negative Reinforcement — remove an aversive stimulus → response increases |
| Behavior decreases | Positive Punishment — add an aversive stimulus → response decreases | Negative Punishment — remove a desirable stimulus → response decreases |
Primary vs. Secondary Reinforcers
| Type | Definition | Properties | Examples |
|---|---|---|---|
| Primary | Biologically significant; reinforcing without prior learning | State-dependent (sated rat won't work for food) | Food, water, sex |
| Secondary (conditioned) | Acquires value via association with primary reinforcer | Deliverable instantly; not state-dependent | Money, grades, clicker, tokens |
| Schedule | Rule | Response Pattern | Real-world Example |
|---|---|---|---|
| FR (Fixed-Ratio) | Reinforcement after every Nth response | Steady high rate + post-reinforcement pause | Piecework pay (paid per unit produced) |
| VR (Variable-Ratio) | Reinforcement after N responses on average (N varies) | HIGHEST, most persistent rates; minimal pause | Slot machines, social media "likes" |
| FI (Fixed-Interval) | Reinforcement for first response after a fixed time period | Scalloped curve — slow start, accelerating as the interval's end approaches | Checking the oven more and more often as a fixed baking time runs out |
| VI (Variable-Interval) | Reinforcement for first response after variable time period | Steady moderate rates; no post-reinforcement pause | Checking a friend's social post (unpredictable timing) |
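The delivery rules in the table above can be expressed as small predicates that decide whether a given response earns reinforcement. This is a sketch with names of my own choosing (only FR and FI are shown; the variable schedules would draw N or the interval from a distribution):

```python
def fixed_ratio(n):
    """FR schedule: reinforce every nth response."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True   # reinforcer delivered
        return False
    return respond

def fixed_interval(interval):
    """FI schedule: reinforce the first response made after
    `interval` time units have elapsed since the last reinforcer."""
    last = 0.0
    def respond(t):
        nonlocal last
        if t - last >= interval:
            last = t
            return True
        return False
    return respond

fr5 = fixed_ratio(5)
results = [fr5() for _ in range(10)]
# Only the 5th and 10th responses are reinforced
print(results)
```

Note that under FI, responses before the interval elapses are simply wasted effort, which is why trained animals learn to hold back early in the interval (the scallop).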
Matching Law (Herrnstein, 1961): On concurrent VI schedules (two response options simultaneously available), the relative rate of responding to each option approximately equals the relative rate of reinforcement from each option.
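Herrnstein's relation is often written B1/(B1 + B2) ≈ r1/(r1 + r2), where B is response rate and r is obtained reinforcement rate (symbols here are my notation; the reinforcement rates below are illustrative numbers):

```python
def predicted_response_share(r1, r2):
    """Matching law: the share of responding on option 1 equals its
    share of obtained reinforcement on concurrent VI schedules."""
    return r1 / (r1 + r2)

# If option 1 yields 40 reinforcers/hour and option 2 yields 20,
# the animal should allocate about two-thirds of responses to option 1.
share = predicted_response_share(40, 20)
print(round(share, 3))  # → 0.667
```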
- Bliss point: The allocation of time/effort that maximizes the individual's subjective value; indifference curves form contours around this point rather than peaking at it.
- Delay discounting: Future rewards are valued less than immediate rewards. Rate of discounting varies across individuals; 12-year-olds discount more steeply than adults.
- Pre-commitment: Deliberately making it harder to access the immediate reward to improve adherence to long-term goals. Examples: gym membership, study groups, blocking distracting websites.
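Delay discounting is commonly modeled with a hyperbolic form, V = A / (1 + kD), where larger k means steeper discounting. The k values below are illustrative only, chosen to show how a steep discounter (e.g., an adolescent per the note above) devalues a delayed reward more:

```python
def hyperbolic_value(amount, delay, k):
    """Subjective value of `amount` delivered after `delay` time units.
    Larger k = steeper discounting of the future."""
    return amount / (1 + k * delay)

# $100 in 30 days, for a shallow (k=0.01) vs. steep (k=0.1) discounter
shallow = hyperbolic_value(100, 30, k=0.01)   # ≈ 76.9
steep = hyperbolic_value(100, 30, k=0.1)      # = 25.0
print(round(shallow, 1), round(steep, 1))
```

Pre-commitment works because a choice made at delay D = 0 (reward at full value) can block access when the immediate temptation would otherwise dominate.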
Key structures: Striatum (input nucleus; dorsal striatum = caudate + putamen). GPi/SNr (output; tonically inhibitory — hold thalamus suppressed at rest). SNc/VTA (dopamine sources — SNc projects to dorsal striatum; VTA to ventral striatum/NAc).
- Direct (Go) pathway: net effect is to FACILITATE the selected action.
- Indirect (No-Go) pathway: net effect is to SUPPRESS competing actions.
- Hyperdirect pathway: net effect is rapid, broad suppression that halts action commitment before full deliberation.
| Dopamine's Action | Receptor | Pathway | Net Behavioral Effect |
|---|---|---|---|
| Excites D1 neurons | D1 (direct) | Go | Promotes execution of selected action |
| Inhibits D2 neurons | D2 (indirect) | No-Go | Suppresses competing actions |
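The table's two dopamine actions can be combined in a toy gating sketch (entirely illustrative numbers, not a biophysical model and not from the notes): dopamine adds drive to D1/Go units, subtracts drive from D2/No-Go units, and the action passes the gate only when net Go exceeds net No-Go.

```python
def action_released(go_input, nogo_input, dopamine):
    """Toy model of basal ganglia gating: dopamine excites the D1/Go
    pathway and inhibits the D2/No-Go pathway; the action is released
    when net Go drive exceeds net No-Go drive."""
    go = go_input + dopamine       # D1 excitation
    nogo = nogo_input - dopamine   # D2 inhibition
    return go > nogo

# Same cortical input, different dopamine levels
print(action_released(0.4, 0.6, dopamine=0.0))  # → False (No-Go wins)
print(action_released(0.4, 0.6, dopamine=0.2))  # → True  (dopamine opens the gate)
```

The point of the sketch: dopamine does not pick the action itself; it tilts the balance so that a candidate action already favored by cortical input can get through.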
| Structure | Learning Type | What It Encodes | Lesion Effect |
|---|---|---|---|
| Dorsal Striatum | S-R learning (habitual, cue-guided) | Associates discriminative stimuli with appropriate responses; forms automatic habits | Impairs responding when discriminative cue must guide response; simple R-O survives |
| Orbitofrontal Cortex (OFC) | R-O learning (goal-directed, outcome identity) | Links responses to expected identity of the outcome (what reward, not just whether reward) | Impairs reversal learning (perseverate on previously rewarded option); steepens delay discounting |
Schoenbaum et al.: OFC neurons fire during the delay between response and outcome, coding specifically for expected outcome identity (sucrose vs. quinine). "Error trial" cells fire strongly when an aversive quinine outcome is predicted — they represent anticipated hedonic identity, not just valence.
Tremblay & Schultz monkeys: OFC neurons respond selectively to pictures predicting grape juice but NOT orange juice — even though both are positive rewards. This confirms OFC encodes which reward, not merely reward vs. no reward.
OFC = R-O (goal-directed, outcome identity). Damage → respond to old rules even after reversal; steeper delay discounting.
Both are needed for full operant behavior.
Intracranial self-stimulation (Olds & Milner, 1954): Rats press a lever thousands of times per hour for electrical stimulation of the VTA/lateral hypothalamus, preferring it even to food. Established these dopaminergic circuits as powerful drivers of behavior.
| Component | Label | Neurotransmitter System | Experimental Evidence |
|---|---|---|---|
| WANTING | Incentive salience — motivation to seek and work for reward | Dopamine | Pimozide (dopamine blocker) → rats stop lever-pressing (wanting abolished) even though food still delivered |
| LIKING | Hedonic value — pleasure from consuming reward | Endogenous opioids (enkephalins, endorphins) | Dopamine-depleted rats show intact hedonic facial reactions (tongue protrusion to sweet, gapes to bitter) |
- Sucrose vs. chow: dopamine-antagonized rats settle for free chow but won't press a lever for sucrose. If both placed freely in front of the rat, it still prefers sucrose (liking intact) — it just won't work for it.
- Hedonic 'yum' facial reactions (tongue protrusion = sweet; gapes = bitter) are phylogenetically conserved and preserved in dopamine-depleted animals.
- Morphine → makes sweet food taste sweeter (opioid ↑ liking). Naloxone → reduces sweet preference (opioid blockade ↓ liking).
PIT definition: A Pavlovian CS modulates the vigor or rate of an ongoing instrumental response without the CS being contingent on that response. PIT bridges classical and instrumental conditioning.
Three-Phase Procedure
- Phase 1 — Pavlovian training: CS+ → Outcome (no response required from the animal).
- Phase 2 — Instrumental training: Lever → Same or different Outcome (no CS present).
- Transfer test: Lever available + CS+ played, but no outcomes delivered. Does CS+ increase lever-pressing rate? (Yes = PIT demonstrated.)
| PIT Type | Effect | Neural Substrate |
|---|---|---|
| Specific PIT | CS modulates ONLY the response linked to the same outcome — outcome-specific potentiation | Basolateral amygdala (BLA) → Nucleus accumbens shell |
| General PIT | CS increases ALL responses non-selectively — general motivational boost | Central amygdala (CeA) → Nucleus accumbens core via VTA |
Pathological addiction is defined by compulsive drug-seeking maintained despite known harmful consequences. It is maintained by both positive reinforcement (the high) and negative reinforcement (relief from withdrawal).
Long-term use dissociation: Chronic users often report diminished liking (no longer get the same high — opioid system down-regulated) but intensified wanting (craving is stronger — dopamine wanting system sensitized). This is the incentive salience model of addiction.
| Drug | Mechanism | Effect on Dopamine |
|---|---|---|
| Cocaine | DAT blocker — prevents dopamine reuptake | Dopamine lingers longer in synapse |
| Amphetamine | Triggers dopamine release from vesicles | More dopamine released into synapse |
Conditioning-Based Treatments
- Distancing (stimulus control): Avoid discriminative stimuli that trigger drug-seeking. Reduce Pavlovian CS exposure to prevent PIT-driven relapse.
- Contingency management: Reinforce incompatible behaviors — money vouchers for heroin-free urine samples.
- Imposed delay: Insert a delay between craving and drug access, weakening the R-O association over time.
- Naltrexone: Opioid receptor antagonist — reduces hedonic "liking" component, making consumption less pleasurable.
- Behavioral addictions (gambling, gaming): Same dopaminergic circuits as drug addiction. Gambling = VR schedule — most persistent and hardest to extinguish.
No-Go (indirect, D2): Dopamine inhibits D2 striatum → GPe → STN → GPi → suppresses competing actions.
Big Picture Synthesis
How the week's concepts connect across levels of analysis and to the course arc.
Levels of Analysis
- Identify the reinforcement schedule from a scenario; predict response pattern and persistence
- Distinguish negative reinforcement from punishment — most common error on this topic
- Apply wanting/liking dissociation: which neurotransmitter, which experimental evidence
- Trace the Go/No-Go basal ganglia pathways with receptor types and output nuclei
- Contrast dorsal striatum (S-R) vs. OFC (R-O) and predict lesion effects
- Describe the three-phase PIT procedure and distinguish specific vs. general PIT neural substrates
- Explain addiction as a wanting/liking dissociation with cocaine/amphetamine mechanisms
- Dopamine PE signal (Weeks 5–7) becomes the motivational wanting signal here
- Basal ganglia direct/indirect → Module 10 (skill memory, actor-critic framework)
- PIT bridges classical (Weeks 6–7) and instrumental conditioning
- Delay discounting connects to economic decision-making frameworks
- Matching law = behavioral-level implementation of value-based choice
- OFC outcome representation → model-based RL: knowing expected outcomes enables flexible planning