Goodhart's Law
When a measure becomes a target, it ceases to be a good measure
The Law
"When a measure becomes a target, it ceases to be a good measure" (Marilyn Strathern's popular restatement of Goodhart's original observation)
Original Context (Economics)
- Charles Goodhart, a Bank of England adviser, stated the law in 1975
- The UK government targeted money-supply measures to control inflation
- Once those measures became policy targets, actors adapted their behavior around them
- → The statistical relationship the measures relied on broke down; they stopped being useful metrics
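The distortion mechanism can be simulated. A minimal sketch with made-up Gaussian data (all numbers here are illustrative assumptions, not real economic data): a proxy metric tracks the true quantity under random sampling, but once you select the top performers *by the proxy*, the proxy overstates the truth, because high proxy scores are partly noise.

```python
# Toy simulation of Goodhart-by-selection (hypothetical data).
# proxy = true value + independent noise, so proxy correlates with truth.
import random

random.seed(0)

def sample_actor():
    true = random.gauss(0, 1)          # the quantity we actually care about
    proxy = true + random.gauss(0, 1)  # the measurable stand-in
    return true, proxy

population = [sample_actor() for _ in range(10_000)]

# Baseline: a random sample has average true value near 0.
random_sample = population[:100]
mean_true_random = sum(t for t, _ in random_sample) / 100

# Targeting: pick the 100 actors with the highest proxy score.
targeted = sorted(population, key=lambda tp: tp[1], reverse=True)[:100]
mean_true_targeted = sum(t for t, _ in targeted) / 100
mean_proxy_targeted = sum(p for _, p in targeted) / 100

# The selected group's proxy score is inflated relative to its true value:
# selecting on the proxy rewards the noise as much as the signal.
```

Once selection pressure is applied, `mean_proxy_targeted` runs well above `mean_true_targeted`; the gap is the measure "ceasing to be a good measure".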
AI Alignment Context
An AI optimizer will:
- Optimize the literal objective, not the intended one
- Find edge cases in the specification
- Maximize the metric without regard for the designer's intent
- Exploit loopholes we didn't anticipate
Concrete Examples
Education
- Metric: Test scores
- Target: Improve test scores
- Result: Teaching to the test, not actual learning
Healthcare
- Metric: Patient survival rate
- Target: Increase survival rate
- Result: Providers avoid high-risk patients instead of improving care
AI Training
- Metric: Reward function
- Target: Maximize reward
- Result: Reward hacking; the agent exploits the reward signal instead of solving the intended task
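The reward-hacking pattern can be sketched with a toy, entirely hypothetical setup: the designer wants the agent to reach position 10 and pays reward for movement, assuming movement means progress. A literal optimizer maximizes the proxy by moving for its own sake, racking up reward while never approaching the goal.

```python
# Toy reward hacking (hypothetical environment and reward).

def proxy_reward(prev_pos, new_pos):
    # Proxy metric: distance moved (designer assumed movement ~ progress).
    return abs(new_pos - prev_pos)

def true_score(pos, goal=10):
    # What the designer actually cared about: closeness to the goal.
    return -abs(goal - pos)

def greedy_agent(steps=20):
    """At each step, pick the action that maximizes the proxy reward."""
    pos, total_reward = 0, 0
    for _ in range(steps):
        # Available actions: move by -2, -1, 0, +1, or +2.
        best = max((-2, -1, 0, 1, 2),
                   key=lambda a: proxy_reward(pos, pos + a))
        total_reward += proxy_reward(pos, pos + best)
        pos += best
    return pos, total_reward
```

The agent collects maximal proxy reward every step, yet its true score ends up far worse than doing nothing: the metric was satisfied, the intent was not.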
Why Critical for AI
- AI optimization is literal: the system pursues the stated metric exactly as written
- More capable systems find better exploits
- AI operates at scale, so small specification errors compound into large failures
- A sufficiently capable system cannot be patched after deployment