PSYCH 505 · Week 8

Operant Conditioning

From Thorndike's puzzle box to the basal ganglia — the neuroscience of action learning and motivation.

📦
Section 1
What is Operant Conditioning?

Operant conditioning is learning to make or withhold responses in order to obtain reinforcers or avoid punishers. The organism's behavior operates on the environment to produce consequences.

S-R-O Framework: S (discriminative stimulus — signals whether R will be reinforced) → R (response — the action performed) → O (outcome — the reinforcer or punisher that follows).

CRITICAL DISTINCTION from classical conditioning: In classical conditioning, the outcome follows the S regardless of what the organism does. In operant conditioning, the outcome only follows if R is made — response contingency is the defining feature.

⚠ Key Distinction
Classical: O follows S regardless of R.  |  Operant: O follows ONLY if R occurs. Confusing these on an exam is a critical error.

Thorndike's Law of Effect

  • A response followed by a satisfying outcome increases in frequency (strengthens the S-R association).
  • A response followed by an unsatisfying outcome decreases in frequency (weakens the S-R association).
  • Directly parallels the Rescorla-Wagner update rule: satisfying = positive prediction error; unsatisfying = negative prediction error.
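The Law of Effect's parallel to the prediction-error idea can be sketched in a few lines of Python. This is a toy illustration (the learning rate and outcome values are arbitrary): a satisfying outcome produces a positive error that strengthens response value, and an unsatisfying outcome produces a negative error that weakens it.

```python
# Toy prediction-error update: V tracks the learned strength of a response.
# Satisfying outcome (R=1) -> positive error -> V strengthens;
# unsatisfying outcome (R=0) -> negative error -> V weakens.
def update(V, R, alpha=0.3):
    error = R - V          # prediction error (outcome minus expectation)
    return V + alpha * error

V = 0.0
for _ in range(5):         # five satisfying trials
    V = update(V, R=1.0)
assert V > 0.8             # response strength grows toward 1

for _ in range(5):         # five unsatisfying trials
    V = update(V, R=0.0)
assert V < 0.2             # strength decays back toward 0
```

The same update rule, applied to stimulus-outcome rather than response-outcome associations, is the Rescorla-Wagner rule from earlier weeks.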

Thorndike's Puzzle Box

S-R-O Mapping
S = puzzle box (context/discriminative stimulus)
R = lever/rope sequence to open door
O = escape from box + access to food
Paradigm | Who Controls Rate | Key Feature
Thorndike (discrete trial) | Experimenter | Animal placed in box at start of each trial; one trial at a time
Skinner (free-operant) | Animal | Animal can press lever at any time; cumulative recorder tracks rate (slope = response rate)
🎯
Section 2
Shaping, Chaining & Reinforcers

Shaping: Reinforcing successive approximations to the target behavior, guiding the organism incrementally when it is unlikely to spontaneously produce the desired response. The full target behavior is never required at the outset.

Chaining: Training complex behavioral sequences one link at a time. Backward chaining trains the last step first, so each earlier step becomes a conditioned reinforcer for the step that follows. Used in vocational training, music instruction, and speech therapy.

The 2 × 2 Reinforcement/Punishment Table

Consequence | Add something (+) | Remove something (−)
Behavior increases | Positive Reinforcement — add a desirable stimulus → response increases | Negative Reinforcement — remove an aversive stimulus → response increases
Behavior decreases | Positive Punishment — add an aversive stimulus → response decreases | Negative Punishment — remove a desirable stimulus → response decreases
⚠ Most Common Error on This Topic
Negative reinforcement is NOT punishment. Reinforcement ALWAYS increases behavior. The word "negative" refers only to removing something — not to a bad outcome. Escaping from pain (removing aversive) = negative reinforcement = behavior increases.

Primary vs. Secondary Reinforcers

Type | Definition | Properties | Examples
Primary | Biologically significant; reinforcing without prior learning | State-dependent (sated rat won't work for food) | Food, water, sex
Secondary (conditioned) | Acquires value via association with primary reinforcer | Deliverable instantly; not state-dependent | Money, grades, clicker, tokens
Token Economies
Secondary reinforcement applied at scale: tokens earned for target behaviors, exchanged later for primary reinforcers. Widely used in psychiatric hospitals and ASD classrooms.
⏱️
Section 3
Reinforcement Schedules
Schedule | Rule | Response Pattern | Real-world Example
FR (Fixed-Ratio) | Reinforcement after every N responses | Steady high rate + post-reinforcement pause | Piecework pay (paid per unit produced)
VR (Variable-Ratio) | Reinforcement after N responses on average (N varies) | HIGHEST, most persistent rates; minimal pause | Slot machines, social media "likes"
FI (Fixed-Interval) | Reinforcement for first response after fixed time period | Scalloped curve — slow start, accelerating as interval end approaches | Checking the oven at a set timer
VI (Variable-Interval) | Reinforcement for first response after variable time period | Steady moderate rates; no post-reinforcement pause | Checking a friend's social post (unpredictable timing)
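The four schedule rules reduce to simple predicates. A minimal sketch (the function names here are ours, not standard terminology; a VR schedule is approximated by reinforcing each response with probability 1/N):

```python
import random

def fr(responses_since_reinf, N):
    """Fixed-Ratio: reinforce the Nth response since the last reinforcer."""
    return responses_since_reinf >= N

def vr(N_mean, rng=random):
    """Variable-Ratio: each response reinforced with prob 1/N_mean,
    so reinforcement arrives after N_mean responses on average."""
    return rng.random() < 1.0 / N_mean

def fi(time_since_reinf, T):
    """Fixed-Interval: first response after T seconds is reinforced."""
    return time_since_reinf >= T

def vi(time_since_reinf, current_interval):
    """Variable-Interval: first response after a randomly drawn interval."""
    return time_since_reinf >= current_interval

# FR-5: the 5th response earns reinforcement; earlier ones do not.
assert not fr(4, N=5)
assert fr(5, N=5)
```

Note that on ratio schedules the animal controls reinforcement rate (respond faster, earn more), while on interval schedules faster responding past the first response earns nothing extra — which is why interval schedules sustain only moderate rates.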
Key Principle — Why Variable Schedules Are So Compelling
Variable schedules (VR and VI) eliminate the post-reinforcement pause because reinforcement could arrive at any moment. This produces the most persistent behavior and the hardest-to-extinguish responding. Slot machines use VR — this is why gambling is so addictive. Extinction after VR is far slower than after FR.
Scalloped FI Curve — Know This
On FI schedules, animals learn to time the interval. They show little responding right after reinforcement (the animal "knows" it's too early), then accelerate as the interval end approaches. The cumulative record forms a characteristic scalloped shape.
📊
Section 4
Matching Law & Behavioral Economics

Matching Law (Herrnstein, 1961): On concurrent VI schedules (two response options simultaneously available), the relative rate of responding to each option approximately equals the relative rate of reinforcement from each option.

Herrnstein's Matching Equation
Rate(A) / Rate(B) = Reinforcement(A) / Reinforcement(B)
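The equation can be checked numerically. A minimal sketch with illustrative reinforcement rates (the specific numbers are made up):

```python
def matched_allocation(reinf_a, reinf_b):
    """Matching law: the fraction of responses allocated to A equals
    A's share of total reinforcement:
    Rate(A)/(Rate(A)+Rate(B)) = Rf(A)/(Rf(A)+Rf(B))."""
    return reinf_a / (reinf_a + reinf_b)

# Option A pays 40 reinforcers/hr, option B pays 20/hr:
# matching predicts ~2/3 of responses go to A.
share_a = matched_allocation(40, 20)
assert abs(share_a - 2 / 3) < 1e-9
```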
  • Bliss point: The allocation of time/effort that maximizes the individual's subjective value — where indifference curves peak.
  • Delay discounting: Future rewards are valued less than immediate rewards. Rate of discounting varies across individuals; 12-year-olds discount more steeply than adults.
  • Pre-commitment: Deliberately making it harder to access the immediate reward to improve adherence to long-term goals. Examples: gym membership, study groups, blocking distracting websites.
Connection to Behavioral Economics
Delay discounting is the behavioral mechanism behind impulsivity and self-control failures. Interventions that impose a delay between craving and consumption weaken the R-O association and reduce relapse.
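Delay discounting is commonly modeled with a hyperbolic function, V = A / (1 + kD), where A is the reward amount, D the delay, and k the individual's discount rate. A sketch with made-up k values to show how a steep discounter and a shallow discounter diverge on the same choice:

```python
def discounted_value(amount, delay, k):
    """Hyperbolic discounting: subjective value falls with delay D
    at a rate set by k (steeper discounters have larger k)."""
    return amount / (1 + k * delay)

# Choice: $100 in 30 days vs. $40 now.
shallow = discounted_value(100, 30, k=0.01)  # shallow discounter: waits for the $100
steep   = discounted_value(100, 30, k=0.2)   # steep discounter: takes the $40 now
assert shallow > 40 > steep
```

Pre-commitment works by removing the "now" option before the moment of choice, so the steeply discounted immediate reward never gets to compete.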
🔄
Section 5
Basal Ganglia: Direct & Indirect Pathways

Key structures: Striatum (input nucleus; dorsal striatum = caudate + putamen). GPi/SNr (output; tonically inhibitory — hold thalamus suppressed at rest). SNc/VTA (dopamine sources — SNc projects to dorsal striatum; VTA to ventral striatum/NAc).

Direct Pathway ("Go" — D1 receptors)
Cortex → D1-Striatum (EXCITED by dopamine) → GPi inhibited (GABA) → Thalamus disinhibited ↑ → Cortex activated
Net effect: FACILITATES selected action
Indirect Pathway ("No-Go" — D2 receptors)
Cortex → D2-Striatum (INHIBITED by dopamine) → GPe inhibited → STN disinhibited → GPi excited (glutamate) → Thalamus suppressed ↓
Net effect: SUPPRESSES competing actions
Hyper-direct Pathway ("Pause")
Cortex → STN → GPi directly
Net effect: Rapid broad suppression — halts action commitment before full deliberation
Dopamine's Action | Receptor | Pathway | Net Behavioral Effect
Excites D1 neurons | D1 (direct) | Go | Promotes execution of selected action
Inhibits D2 neurons | D2 (indirect) | No-Go | Suppresses competing actions
Exam Tip — Dopamine's Dual Role
Dopamine's power comes from acting on both pathways simultaneously: it promotes Go AND suppresses No-Go. Together this tips the balance decisively toward the rewarded action. Parkinson's disease (dopamine depletion) impairs movement initiation (Go) AND increases suppression of competing actions (No-Go stuck "on").
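The pathway logic (inhibiting an inhibitor yields net excitation) can be traced with a toy firing-rate model. All numbers below are arbitrary; this is only a sign-check of the circuit described above, not a physiological model:

```python
def thalamic_drive(d1_activity, d2_activity, gpi_baseline=0.5):
    """Toy rate model of the Go/No-Go balance.
    Direct (Go): D1 striatum inhibits GPi -> thalamus disinhibited.
    Indirect (No-Go): D2 striatum -| GPe -| STN, and STN excites GPi."""
    gpe = max(0.0, 1.0 - d2_activity)                  # striatum inhibits GPe
    stn = max(0.0, 1.0 - gpe)                          # GPe inhibits STN
    gpi = max(0.0, gpi_baseline - d1_activity + stn)   # D1 inhibits, STN excites GPi
    return max(0.0, 1.0 - gpi)                         # GPi tonically inhibits thalamus

rest = thalamic_drive(0.0, 0.0)
go   = thalamic_drive(0.8, 0.0)   # dopamine excites D1: Go pathway active
nogo = thalamic_drive(0.0, 0.8)   # D2 pathway active (as with low dopamine)
assert go > rest     # Go pathway releases the thalamus
assert nogo < rest   # No-Go pathway clamps it down
```

Running both lines of the assert makes the exam point concrete: dopamine pushes `d1_activity` up and `d2_activity` down at the same time, moving the circuit toward the `go` case on both fronts.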
🧠
Section 6
Dorsal Striatum vs. Orbitofrontal Cortex
Structure | Learning Type | What It Encodes | Lesion Effect
Dorsal Striatum | S-R learning (habitual, cue-guided) | Associates discriminative stimuli with appropriate responses; forms automatic habits | Impairs responding when discriminative cue must guide response; simple R-O survives
Orbitofrontal Cortex (OFC) | R-O learning (goal-directed, outcome identity) | Links responses to expected identity of the outcome (what reward, not just whether reward) | Impairs reversal learning (perseverate on previously rewarded option); steepens delay discounting

Schoenbaum et al.: OFC neurons fire during the delay between response and outcome, coding specifically for expected outcome identity (sucrose vs. quinine). "Error trial" cells fire strongly when an aversive quinine outcome is predicted — they represent anticipated hedonic identity, not just valence.

Tremblay & Schultz monkeys: OFC neurons respond selectively to pictures predicting grape juice but NOT orange juice — even though both are positive rewards. This confirms OFC encodes which reward, not merely reward vs. no reward.

KEY CONTRAST — Striatum vs. OFC
Dorsal striatum = S-R (habitual, cue-guided). Damage → can't use discriminative stimulus to guide response.
OFC = R-O (goal-directed, outcome identity). Damage → respond to old rules even after reversal; steeper delay discounting.
Both are needed for full operant behavior.
💡
Section 7
Wanting vs. Liking: Berridge's Dissociation

Intracranial self-stimulation (Olds, 1955): Rats press a lever thousands of times per hour for stimulation of VTA/lateral hypothalamus. They prefer it to food. Established dopamine neurons as powerful drivers of behavior.

Component | Label | Neurotransmitter System | Experimental Evidence
WANTING | Incentive salience — motivation to seek and work for reward | Dopamine | Pimozide (dopamine blocker) → rats stop lever-pressing (wanting abolished) even though food still delivered
LIKING | Hedonic value — pleasure from consuming reward | Endogenous opioids (enkephalins, endorphins) | Dopamine-depleted rats show intact hedonic facial reactions (tongue protrusion to sweet, gapes to bitter)
⚠ Critical Concept — Extinction Mimicry
Pimozide-treated rats stopped lever-pressing (looked like extinction) even though food was still delivered. Critical test: food placed directly in mouth → normal hedonic "yum" reactions (tongue protrusion). Conclusion: dopamine drives the motivation to work (wanting), NOT the pleasure of consumption (liking). Blocking dopamine abolished wanting while leaving liking intact.
  • Sucrose vs. chow: dopamine-antagonized rats settle for free chow but won't press a lever for sucrose. If both placed freely in front of the rat, it still prefers sucrose (liking intact) — it just won't work for it.
  • Hedonic 'yum' facial reactions (tongue protrusion = sweet; gapes = bitter) are phylogenetically conserved and preserved in dopamine-depleted animals.
  • Morphine → makes sweet food taste sweeter (opioid ↑ liking). Naloxone → reduces sweet preference (opioid blockade ↓ liking).
⚠ Misconception — "Dopamine is the pleasure molecule"
This is WRONG. Dopamine = WANTING (motivation/incentive salience). Opioids = LIKING (hedonic pleasure). Blocking dopamine leaves hedonic reactions completely intact. These are separable systems — you can want without liking and like without wanting.
🔗
Section 8
Pavlovian-to-Instrumental Transfer (PIT)

PIT definition: A Pavlovian CS modulates the vigor or rate of an ongoing instrumental response without the CS being contingent on that response. PIT bridges classical and instrumental conditioning.

Three-Phase Procedure

  • Phase 1 — Pavlovian training: CS+ → Outcome (no response required from the animal).
  • Phase 2 — Instrumental training: Lever → Same or different Outcome (no CS present).
  • Transfer test: Lever available + CS+ played, but no outcomes delivered. Does CS+ increase lever-pressing rate? (Yes = PIT demonstrated.)
PIT Type | Effect | Neural Substrate
Specific PIT | CS modulates ONLY the response linked to the same outcome — outcome-specific potentiation | Basolateral amygdala (BLA) → Nucleus accumbens shell
General PIT | CS increases ALL responses non-selectively — general motivational boost | Central amygdala (CeA) → Nucleus accumbens core via VTA
Clinical Example — Cue-Induced Craving & Relapse
An ex-smoker walks into a bar where they used to smoke. The bar is a Pavlovian CS associated with the nicotine high. Via PIT, it potentiates the instrumental response of reaching for a pocket, even though the ex-smoker no longer carries cigarettes. This is PIT driving cue-induced craving and relapse.
⚠ PIT vs. Discriminative Stimulus Control
These are NOT the same. A discriminative stimulus is part of the instrumental contingency — it signals that R will be reinforced. A PIT CS is trained in a separate Pavlovian phase with no response required. They work through different neural circuits.
🏥
Section 9
Addiction & Clinical Perspectives

Pathological addiction is defined by compulsive drug-seeking maintained despite known harmful consequences. It is maintained by both positive reinforcement (the high) and negative reinforcement (relief from withdrawal).

Long-term use dissociation: Chronic users often report diminished liking (no longer get the same high — opioid system down-regulated) but intensified wanting (craving is stronger — dopamine wanting system sensitized). This is the incentive salience model of addiction.

Drug | Mechanism | Effect on Dopamine
Cocaine | DAT blocker — prevents dopamine reuptake | Dopamine lingers longer in synapse
Amphetamine | Triggers dopamine release from vesicles | More dopamine released into synapse

Conditioning-Based Treatments

  • Distancing (stimulus control): Avoid discriminative stimuli that trigger drug-seeking. Reduce Pavlovian CS exposure to prevent PIT-driven relapse.
  • Contingency management: Reinforce incompatible behaviors — money vouchers for heroin-free urine samples.
  • Imposed delay: Insert a delay between craving and drug access, weakening the R-O association over time.
  • Naltrexone: Opioid receptor antagonist — reduces hedonic "liking" component, making consumption less pleasurable.
  • Behavioral addictions (gambling, gaming): Same dopaminergic circuits as drug addiction. Gambling = VR schedule — most persistent and hardest to extinguish.

Key Terms — 30 Flashcards


Behavioral · Operant Conditioning
Learning to make or withhold responses to obtain reinforcers or avoid punishers. The outcome only follows if the organism performs the response (response contingency).

Behavioral · Law of Effect (Thorndike)
A response followed by a satisfying outcome increases in frequency (strengthens S-R); a response followed by an unsatisfying outcome decreases. Foundation of operant learning; parallels the R-W update rule.

Behavioral · Shaping
Reinforcing successive approximations to a target behavior. Used when the organism is unlikely to spontaneously emit the desired response; guides it incrementally.

Behavioral · Chaining
Training complex behavioral sequences one link at a time. Backward chaining trains the last step first so earlier steps become conditioned reinforcers. Used in vocational training, music, and speech therapy.

Behavioral · Positive Reinforcement
Add a desirable stimulus following a response → response increases. "Positive" = something is added; "reinforcement" = behavior goes up. Example: food pellet for a lever press.

Behavioral · Negative Reinforcement
Remove an aversive stimulus following a response → response increases. NOT punishment. "Negative" = something is removed; behavior still goes UP. Example: pressing a lever to stop shock.

Behavioral · Positive Punishment
Add an aversive stimulus following a response → response decreases. "Positive" = something is added. Example: electric shock for entering the wrong arm of a maze.

Behavioral · Negative Punishment
Remove a desirable stimulus following a response → response decreases. Example: taking away TV time after misbehavior (response cost).

Behavioral · Primary Reinforcer
Biologically significant reinforcer; reinforcing without prior learning. State-dependent: a sated rat won't work for food. Examples: food, water, sex.

Behavioral · Secondary Reinforcer
Acquires reinforcing value via association with a primary reinforcer. Deliverable instantly; not state-dependent. Examples: money, grades, a clicker, tokens in a token economy.

Behavioral · FR Schedule
Fixed-Ratio: reinforcement after every N responses. Produces steady high rates plus a characteristic post-reinforcement pause. Example: piecework pay. Cumulative record: staircase pattern.

Behavioral · VR Schedule
Variable-Ratio: reinforcement after N responses on average. Produces the highest, most persistent rates with minimal post-reinforcement pause. Hardest to extinguish. Example: slot machines.

Behavioral · FI Schedule (Scalloped Curve)
Fixed-Interval: reinforcement for the first response after a fixed time. Produces a scalloped cumulative curve — slow start, accelerating toward the interval end as the animal times the gap. Example: checking an oven on a set timer.

Behavioral · VI Schedule
Variable-Interval: reinforcement for the first response after a variable time. Produces steady moderate rates with no post-reinforcement pause. Example: checking a friend's unpredictably updated social post.

Behavioral · Matching Law
Herrnstein (1961): on concurrent VI schedules, relative response rate approximately equals relative reinforcement rate. Rate(A)/Rate(B) = Reinforcement(A)/Reinforcement(B).

Behavioral · Delay Discounting
Future rewards are subjectively valued less than immediate rewards. The rate of discounting varies; adolescents discount more steeply than adults. Underlies impulsivity and self-control failures.

Behavioral · Pre-commitment
Deliberately making the immediate reward harder to access to support long-term goals. Examples: gym membership, study group, removing junk food from the home.

Behavioral · Bliss Point
The behavioral allocation that maximizes the individual's subjective value. In behavioral economics: the unconstrained preferred consumption point — where indifference curves peak.

Neural · Dorsal Striatum (S-R learning, habits)
Caudate + putamen. Associates discriminative stimuli with responses. Lesion: impairs cue-guided responding and habitual S-R associations. Critical in Parkinson's (Go pathway) and Huntington's disease.

Neural · Orbitofrontal Cortex (R-O learning)
Links responses to expected outcome identity. Receives multimodal + visceral signals. Lesion: impairs reversal learning; steepens delay discounting. OFC neurons fire during the anticipatory delay, coding which reward to expect.

Neural · Go/No-Go Pathways (Direct D1; Indirect D2)
Go (direct, D1): dopamine excites D1 striatum → inhibits GPi → disinhibits thalamus → facilitates action. No-Go (indirect, D2): dopamine inhibits D2 striatum → GPe → STN → GPi → suppresses competing actions.

Neural · Incentive Salience (Wanting / Dopamine)
Berridge's term for the motivation to seek and work for reward, mediated by dopamine. Distinct from hedonic liking. Sensitized in addiction → escalating craving despite declining pleasure.

Neural · Endogenous Opioids (Liking)
Enkephalins and endorphins mediate hedonic pleasure from consumption. Morphine increases sweet preference; naloxone reduces it. Preserved in dopamine-depleted animals, confirming the dissociation from wanting.

Neural · Extinction Mimicry (Pimozide)
Pimozide (a dopamine blocker) stops lever-pressing even though food is still delivered per press. It looked like extinction — but hedonic facial reactions to food remained intact. Confirms dopamine drives wanting, not liking.

Neural · PIT (Pavlovian-to-Instrumental Transfer)
A Pavlovian CS (trained separately, no response required) modulates the vigor of an ongoing instrumental response. Specific PIT (BLA → NAc shell) vs. general PIT (CeA → NAc core via VTA).

Neural · Intracranial Self-Stimulation
Olds (1955): rats press a lever thousands of times per hour for VTA/lateral hypothalamus stimulation, preferring it to food. Established dopamine neurons as powerful reinforcers of behavior.

Clinical · Pathological Addiction
Compulsive drug-seeking maintained despite known harmful consequences. Driven by positive reinforcement (the high) + negative reinforcement (withdrawal avoidance). Reflects sensitized wanting + down-regulated liking over time.

Clinical · Wanting/Liking Dissociation (Berridge)
You can want without liking (dopamine-antagonized rats stop working but still prefer sucrose) and like without wanting. Separable neural systems: dopamine (wanting) vs. opioids (liking). Clinical implication: addicts crave intensely but no longer enjoy the drug.

Clinical · Specific PIT (BLA → NAc shell)
The CS modulates only the response linked to the same outcome (outcome-specific). Mediated by the basolateral amygdala (BLA) projecting to the nucleus accumbens shell. Relevant to specific cue-triggered relapse.

Clinical · General PIT (CeA → NAc core via VTA)
The CS increases all instrumental responses non-selectively (a general motivational boost). Mediated by the central amygdala (CeA) projecting to the nucleus accumbens core via the VTA. Relevant to general stress/arousal-induced relapse.

Practice Multiple Choice — 20 Questions


Big Picture Synthesis

How the week's concepts connect across levels of analysis and to the course arc.

The Unifying Principle
Operant conditioning reveals how organisms learn which actions to repeat. Dopamine drives the motivation to seek rewards (wanting); opioids deliver the pleasure of having them (liking). The basal ganglia implement this at the circuit level — Go/D1 selects actions, No-Go/D2 suppresses competitors, and the OFC represents what outcome is expected.

Levels of Analysis

Behavior
  • S-R-O framework; reinforcement schedules
  • Matching law; delay discounting
  • Pavlovian-to-Instrumental Transfer
Circuit
  • Direct (Go/D1) & Indirect (No-Go/D2) pathways
  • Dorsal striatum (S-R); OFC (R-O)
  • BLA/CeA (Specific/General PIT)
Synapse
  • Dopamine: D1 excites, D2 inhibits
  • Opioids: hedonic pleasure
  • DAT blockade (cocaine); DA release (amphetamine)
Structure
  • Corticostriatal loops (parallel channels)
  • GPi/SNr: tonic inhibition of thalamus
  • SNc/VTA: dopamine source
Likely Exam Themes
  • Identify the reinforcement schedule from a scenario; predict response pattern and persistence
  • Distinguish negative reinforcement from punishment — most common error on this topic
  • Apply wanting/liking dissociation: which neurotransmitter, which experimental evidence
  • Trace the Go/No-Go basal ganglia pathways with receptor types and output nuclei
  • Contrast dorsal striatum (S-R) vs. OFC (R-O) and predict lesion effects
  • Describe the three-phase PIT procedure and distinguish specific vs. general PIT neural substrates
  • Explain addiction as a wanting/liking dissociation with cocaine/amphetamine mechanisms
Cross-Course Connections
  • Dopamine PE signal (Weeks 5–7) becomes the motivational wanting signal here
  • Basal ganglia direct/indirect → Module 10 (skill memory, actor-critic framework)
  • PIT bridges classical (Weeks 6–7) and instrumental conditioning
  • Delay discounting connects to economic decision-making frameworks
  • Matching law = behavioral-level implementation of value-based choice
  • OFC outcome representation → model-based RL: knowing expected outcomes enables flexible planning