Foundational Papers

The most important research papers on AI alignment

Intermediate

Outer Alignment

Concrete Problems in AI Safety (Amodei et al., 2016)
- https://arxiv.org/abs/1606.06565
Specification Gaming (Krakovna et al., 2020)
- Spreadsheet
Categorizing Variants of Goodhart's Law (Manheim & Garrabrant, 2018)
- https://arxiv.org/abs/1803.04585

Inner Alignment

Risks from Learned Optimization (Hubinger et al., 2019) ⭐
- https://arxiv.org/abs/1906.01820
Sleeper Agents (Hubinger et al., 2024) ⭐
- https://arxiv.org/abs/2401.05566

Corrigibility

Corrigibility (Soares et al., 2015) ⭐
- https://intelligence.org/files/Corrigibility.pdf

Scalable Oversight

AI Safety via Debate (Irving et al., 2018)
- https://arxiv.org/abs/1805.00899
Iterated Amplification (Christiano, 2018)
- https://ai-alignment.com/
ELK (ARC, 2021) ⭐
- https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8

Value Learning

Cooperative Inverse RL (Hadfield-Menell et al., 2016)
- https://arxiv.org/abs/1606.03137
CEV (Yudkowsky, 2004)
- https://intelligence.org/files/CEV.pdf

Fundamentals

Embedded Agency (Demski & Garrabrant, 2018) ⭐
- https://www.alignmentforum.org/s/Rm6oQRJJmhGCcLvxh
Logical Induction (Garrabrant et al., 2016)
- https://intelligence.org/files/LogicalInduction.pdf
Superintelligence (Bostrom, 2014)
- Book

⭐ = Must-read

Related Articles

What Is AI Alignment?

Start with the basics

Key Researchers

The experts to follow