Resources
For machine learning researchers
Risks from advanced AI
- ∗ Why I Think More NLP Researchers Should Engage with AI Safety Concerns by Sam Bowman (2022)
- ∗ Researcher Perceptions of Current and Future AI by Vael Gates (2022)
Orienting
- ∗ More is Different for AI by Jacob Steinhardt (2022)
- AI Timelines/Risk Projections as of Sep. 2022
- Frequent Arguments About Alignment by John Schulman (2021)
Technical Research
Reviews
- ∗ The Alignment Problem from a Deep Learning Perspective (Ngo et al., 2022)
- Agendas: Unsolved Problems in ML Safety (Hendrycks et al., 2022), Concrete Problems in AI Safety (Amodei et al., 2016)
- AI Safety Resources and Overview by Victoria Krakovna (DeepMind)
Primary
- ∗ Goal Misgeneralization: Why correct specifications aren't enough for correct goals (Shah et al., 2022)
- Specification gaming: the flip side of AI ingenuity (Krakovna et al., 2020)
- Mechanistic interpretability: In-context Learning and Induction Heads (Olsson et al., 2022), Locating and Editing Factual Associations in GPT (Meng et al., 2022)
- Optimal Policies Tend to Seek Power (Turner et al., 2021)
Other
- ∗ AGI Safety Fundamentals Curriculum (best in-depth resource)
- Alignment Newsletter and ML Safety Newsletter
Interested in doing AI alignment research?
- Learn about the organizations and researchers in the space, funding and job opportunities, and guides to getting involved at What can I do?
For a general audience
- ∗ The Case For Taking AI Seriously As A Threat to Humanity by Kelsey Piper (2020)
- The Alignment Problem by Brian Christian (2020)
- Existential Risk from Power-Seeking AI by Joe Carlsmith (2021)
- Why AI Alignment Could be Hard with Modern Deep Learning by Ajeya Cotra (2021)
- 80,000 Hours Podcast: Preventing an AI-related Catastrophe (2022)
- The Most Important Century by Holden Karnofsky (podcast, summaries, various articles)
- AI Safety YouTube channel by Robert Miles