🌿 Initiate (5-20h)
Core Problems: Outer Alignment
Learn about specification problems, reward hacking, and Goodhart's Law
Articles: 3
Estimated time: 5-8h
Articles in this module
1. The Specification Problem (Start here)
   Why it's hard to specify what we want (20 min)
2. Reward Hacking
   When AI exploits loopholes in reward functions (25 min)
3. Goodhart's Law
   When a measure becomes a target, it ceases to be a good measure (18 min)
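Before starting, here is a minimal toy sketch of the reward-hacking and Goodhart's-Law ideas these articles cover: a proxy reward that counts events can be gamed in a way the true objective is not. All function and variable names here are illustrative, not taken from the articles.

```python
def true_goal(actions):
    # True objective: number of distinct boxes actually delivered.
    return len({a for a in actions if a.startswith("deliver:")})

def proxy_reward(actions):
    # Proxy metric: count every "deliver" event, including repeats.
    return sum(1 for a in actions if a.startswith("deliver:"))

honest = ["deliver:box1", "deliver:box2"]
hacked = ["deliver:box1"] * 10  # re-deliver the same box over and over

# The proxy prefers the hacked policy even though the true goal suffers:
assert proxy_reward(hacked) > proxy_reward(honest)
assert true_goal(hacked) < true_goal(honest)
```

Once "deliveries counted" becomes the target, it stops tracking "boxes delivered", which is Goodhart's Law in one line.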
Next step
Once you've completed this module, move to the next level to deepen your understanding.