Foundational Papers
The most important research papers on AI alignment
Intermediate
Outer Alignment
- Concrete Problems in AI Safety (Amodei et al., 2016)
- Specification Gaming (Krakovna et al., 2020)
- Categorizing Variants of Goodhart's Law (Manheim & Garrabrant, 2018)
Inner Alignment
- Risks from Learned Optimization (Hubinger et al., 2019) ⭐
- Sleeper Agents (Anthropic, 2024) ⭐
Corrigibility
- Corrigibility (Soares et al., 2015) ⭐
Scalable Oversight
- AI Safety via Debate (Irving et al., 2018)
- Iterated Amplification (Christiano, 2018)
- ELK: Eliciting Latent Knowledge (ARC, 2021) ⭐
Value Learning
- Cooperative Inverse Reinforcement Learning (Hadfield-Menell et al., 2016)
- CEV: Coherent Extrapolated Volition (Yudkowsky, 2004)
Fundamentals
- Embedded Agency (Demski & Garrabrant, 2018) ⭐
- Logical Induction (Garrabrant et al., 2016)
- Superintelligence (Bostrom, 2014), book
⭐ = Must-read