Goodhart's Law
When a measure becomes a target, it ceases to be a good measure
The Law
"When a measure becomes a target, it ceases to be a good measure" (Marilyn Strathern's popular restatement of Goodhart's original observation)
Original Context (Economics)
- Charles Goodhart, a Bank of England adviser, stated the law in 1975
- The UK government targeted money-supply measures to control inflation
- Once those measures became policy targets, actors adapted their behavior around them
- → The statistical relationship the measures relied on broke down; they stopped being useful metrics
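The distortion mechanism can be simulated. A minimal sketch with made-up Gaussian data (all numbers here are illustrative assumptions, not real economic data): a proxy metric tracks the true quantity under random sampling, but once you select the top performers *by the proxy*, the proxy overstates the truth, because high proxy scores are partly noise.

```python
# Toy simulation of Goodhart-by-selection (hypothetical data).
# proxy = true value + independent noise, so proxy correlates with truth.
import random

random.seed(0)

def sample_actor():
    true = random.gauss(0, 1)          # the quantity we actually care about
    proxy = true + random.gauss(0, 1)  # the measurable stand-in
    return true, proxy

population = [sample_actor() for _ in range(10_000)]

# Baseline: a random sample has average true value near 0.
random_sample = population[:100]
mean_true_random = sum(t for t, _ in random_sample) / 100

# Targeting: pick the 100 actors with the highest proxy score.
targeted = sorted(population, key=lambda tp: tp[1], reverse=True)[:100]
mean_true_targeted = sum(t for t, _ in targeted) / 100
mean_proxy_targeted = sum(p for _, p in targeted) / 100

# The selected group's proxy score is inflated relative to its true value:
# selecting on the proxy rewards the noise as much as the signal.
```

Once selection pressure is applied, `mean_proxy_targeted` runs well above `mean_true_targeted`; the gap is the measure "ceasing to be a good measure".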
AI Alignment Context
An AI optimizer will:
- Optimize the literal objective, not the intended one
- Find edge cases in the specification
- Maximize the metric without regard for the designer's intent
- Exploit loopholes we didn't anticipate
Concrete Examples
Education
- Metric: Test scores
- Target: Improve test scores
- Result: Teaching to the test, not actual learning
Healthcare
- Metric: Patient survival rate
- Target: Increase survival rate
- Result: Providers avoid high-risk patients instead of improving care
AI Training
- Metric: Reward function
- Target: Maximize reward
- Result: Reward hacking; the agent exploits the reward signal instead of solving the intended task
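The reward-hacking pattern can be sketched with a toy, entirely hypothetical setup: the designer wants the agent to reach position 10 and pays reward for movement, assuming movement means progress. A literal optimizer maximizes the proxy by moving for its own sake, racking up reward while never approaching the goal.

```python
# Toy reward hacking (hypothetical environment and reward).

def proxy_reward(prev_pos, new_pos):
    # Proxy metric: distance moved (designer assumed movement ~ progress).
    return abs(new_pos - prev_pos)

def true_score(pos, goal=10):
    # What the designer actually cared about: closeness to the goal.
    return -abs(goal - pos)

def greedy_agent(steps=20):
    """At each step, pick the action that maximizes the proxy reward."""
    pos, total_reward = 0, 0
    for _ in range(steps):
        # Available actions: move by -2, -1, 0, +1, or +2.
        best = max((-2, -1, 0, 1, 2),
                   key=lambda a: proxy_reward(pos, pos + a))
        total_reward += proxy_reward(pos, pos + best)
        pos += best
    return pos, total_reward
```

The agent collects maximal proxy reward every step, yet its true score ends up far worse than doing nothing: the metric was satisfied, the intent was not.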
Why Critical for AI
- AI optimization is literal: the system pursues the stated metric exactly as written
- More capable systems find better exploits
- AI operates at scale, so small specification errors compound into large failures
- A sufficiently capable system cannot be patched after deployment