What are some introductions to AI safety?
Note that some of these introductions are more than five years old. Given how quickly the field of AI progresses, some of the older introductions could use an update (e.g., Nick Bostrom’s 2014 book Superintelligence has little focus on modern deep learning systems).
A separate document lists some counterarguments: What are some arguments why AI safety might be less important?
Top recommendations
- Four Background Claims (Nate Soares, ~10 minutes)
- Managing AI Risks in an Era of Rapid Progress (Yoshua Bengio, Geoffrey Hinton, Andrew Yao, et al., ~15 minutes)
- Preventing an AI-related catastrophe (Benjamin Hilton, ~45 minutes)
- An Overview of Catastrophic AI Risks (Dan Hendrycks, Mantas Mazeika, Thomas Woodside, ~2 hours)
- Survey of AI Technologies and AI R&D Trajectories (Jeremie Harris, Edouard Harris, Mark Beall, ~2 hours)
- Foundational Challenges in Assuring Alignment and Safety of Large Language Models (Usman Anwar et al., ~6 hours)
Quick reads (under ~10 minutes)
- We must slow down the race to God-like AI (Ian Hogarth)
- Building safe artificial intelligence: specification, robustness, and assurance (DeepMind Safety Research)
- Nobody’s on the ball on AGI alignment (Leopold Aschenbrenner)
- Frequent arguments about alignment (John Schulman)
- AI safety (Wikipedia)
- Of Myths And Moonshine (Stuart Russell)
- Risks of artificial intelligence (PauseAI)
- Intro to AI Safety (Robert Huben)
- Will AI really cause a catastrophe? (Michigan AI Safety Initiative)
- Global risk from deep learning: The case for risk (Daniel Dewey)
- Explore Your AI Risk Perspectives: An Interactive Walkthrough of Researchers' Most Frequent Interview Responses (AI Risk Discussions)
- What is the alignment problem? (Jan Leike)
- AI is Not an Arms Race (Katja Grace)
- Complex Systems are Hard to Control (Jacob Steinhardt)
- a casual intro to AI doom and alignment (Tamsin Leake)
- To Predict What Happens, Ask What Happens (Zvi Mowshowitz)
- Marius alignment pitch (Marius Hobbhahn)
- This Changes Everything (Ezra Klein)
- How AI could accidentally extinguish humankind (Émile Torres)
- My current summary of the state of AI risk (Eli Tyre)
- AI doom from an LLM-plateau-ist perspective (Steve Byrnes)
- Why Uncontrollable AI Looks More Likely Than Ever (Otto Barten, Roman Yampolskiy)
Short(ish) introductions
- The case for taking AI seriously (or a similar argument in 500 words) (Kelsey Piper)
- A Field Guide to AI Safety (Kelsey Piper)
- The alignment problem from a deep learning perspective (Richard Ngo, Lawrence Chan, Sören Mindermann)
- More Is Different for AI blog post series (Jacob Steinhardt)
- Why Might Misaligned, Advanced AI Cause Catastrophe? (Compilation) (AI Safety Fundamentals Team)
- Why I Think More NLP Researchers Should Engage with AI Safety Concerns (Sam Bowman)
- Artificial Intelligence: Arguments for Catastrophic Risk (Adam Bales, William D'Alessandro, Cameron Domenico Kirk-Giannini)
- How Rogue AIs may Arise (Yoshua Bengio)
- FAQ on Catastrophic AI Risks (Yoshua Bengio)
- AI x-risk, approximately ordered by embarrassment (Alex Lawsen)
- AI Alignment, Explained in Five Points (Daniel Eth)
- Why alignment could be hard with modern deep learning (Ajeya Cotra)
- Altruists Should Prioritize Artificial Intelligence (Lukas Gloor)
- Clarifying AI X-risk and Threat Model Literature Review (DeepMind's AGI safety team)
- AI experts are increasingly afraid of what they’re creating (Kelsey Piper)
- Why worry about future AI? (Gavin Leech)
- How to navigate the AI apocalypse as a sane person (Eric Hoel)
- AGI Ruin: A list of lethalities (Eliezer Yudkowsky); also see Where I agree and disagree with Eliezer (Paul Christiano)
- No Time Like The Present For AI Safety Work (Scott Alexander)
- Ethical Issues in Advanced Artificial Intelligence (Nick Bostrom)
- Benefits & Risks of Artificial Intelligence (Ariel Conn)
- The basic reasons I expect AGI ruin + An artificially structured argument for expecting AGI ruin (Rob Bensinger)
- The case for how and why AI might kill us all (Loz Blain)
- Potential Risks from Advanced Artificial Intelligence: The Philanthropic Opportunity (Holden Karnofsky)
- A newcomer’s guide to the technical AI safety field (Chin Ze Shen)
- Q & A: The future of artificial intelligence (Stuart Russell)
- The Basic AI Drives (Steve Omohundro)
- AI Risk Intro 1: Advanced AI Might Be Very Bad (TheMcDouglas, LRudL)
- Distinguishing AI takeover scenarios + Investigating AI takeover scenarios (Sam Clarke, Samuel Martin)
- Artificial intelligence is transforming our world — it is on all of us to make sure that it goes well (Max Roser)
- Intelligence Explosion: Evidence and Import (Luke Muehlhauser, Anna Salamon)
- The Value Learning Problem (Nate Soares)
- Uncontrollable AI as an Existential Risk (Karl von Wendt)
- Current and Near-Term AI as a Potential Existential Risk Factor (Benjamin S. Bucknall, Shiri Dori-Hacohen)
- AI Risk for Epistemic Minimalists (Alex Flint)
Longer introductions
- The Compendium (Connor Leahy, Gabriel Alfour, Chris Scammell, Andrea Miotti, Adam Shimi)
- Situational Awareness (Leopold Aschenbrenner)
- TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI (Andrew Critch, Stuart Russell)
- The “most important century” blog post series summary and the “implications of most important century” posts, like AI could defeat all of us combined, Why would AI “aim” to defeat humanity?, and How we could stumble into AI catastrophe (Holden Karnofsky)
- AI Safety for Fleshy Humans (Nicky Case, Hack Club)
- Current work in AI alignment (Paul Christiano)
- A gentle introduction to why AI might end the human race (Michael Tontchev)
- Natural Selection Favors AIs over Humans (Dan Hendrycks)
- Unsolved Problems in ML Safety (Dan Hendrycks)
- X-Risk Analysis for AI Research (Dan Hendrycks, Mantas Mazeika)
- Is Power-Seeking AI an Existential Risk? + shortened version + presentation (Joseph Carlsmith)
- AGI safety from first principles (Richard Ngo)
- Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover + presentation (Ajeya Cotra)
- Concrete Problems in AI Safety (Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané)
- The AI Revolution: The Road to Superintelligence and The AI Revolution: Our Immortality or Extinction + some corrections from Luke Muehlhauser (Tim Urban)
- AI as a Positive and Negative Factor in Global Risk (Eliezer Yudkowsky)
- Extinction Risk from Artificial Intelligence (Michael Cohen)
- Set Sail For Fail? On AI risk (José Luis Ricón Fernández de la Puente)
- A shift in arguments for AI risk (Tom Adamczewski)
- Uncontrollability of AI (Roman Yampolskiy)
- Thoughts on AGI safety from the top (jylin04)
- Disjunctive Scenarios of Catastrophic AI Risk (Kaj Sotala; see these highlights)
- AI alignment: The enormous problem facing humanity. (Tim Bakker)
- Modeling Transformative AI Risks (MTAIR) Project -- Summary Report (Sam Clarke et al.)
Overviews of various research areas
- Shallow review of live agendas in alignment & safety (technicalities, Stag)
- Transformative AI Governance: A Literature Review (draft by Matthijs Maas)
- My Overview of the AI Alignment Landscape: A Bird's Eye View (Neel Nanda)
- (My understanding of) What Everyone in Technical Alignment is Doing and Why (Thomas Larsen, Eli Lifland) + Alignment Org Cheat Sheet (Thomas Larsen, Akash Wasil)
- Framing AI strategy (Zach Stein-Perlman)
- “What you can do concretely to help” section of “Preventing an AI-related catastrophe” (Benjamin Hilton)
- The longtermist AI governance landscape: a basic overview (Sam Clarke)
- A Brief Overview of AI Safety/Alignment Orgs, Fields, Researchers, and Resources for ML Researchers (Austin Witte)
- AI Governance: A Research Agenda (Allan Dafoe)
- Racing through a minefield: the AI deployment problem (Holden Karnofsky)
- An overview of 11 proposals for building safe advanced AI (Evan Hubinger)
- 2021 Alignment Literature Review and Charity Comparison (Larks)
- AI Research Considerations for Human Existential Safety (ARCHES) (Andrew Critch, David Krueger)
- A descriptive, not prescriptive, overview of current AI Alignment Research (Jan Hendrik Kirchner, Logan Riggs Smith, Jacques Thibodeau, janus)
- AGI Safety Literature Review (Tom Everitt, Gary Lea, Marcus Hutter)
- AI Alignment Research Overview (Jacob Steinhardt)
- On how various plans miss the hard bits of the alignment challenge (Nate Soares)
- Some AI research areas and their relevance to existential safety (Andrew Critch)
- Open Problems in AI X-Risk [PAIS #5] (Dan Hendrycks, Thomas Woodside)
- Anti-Literature Review from “AI X-risk >35% mostly based on a recent peer-reviewed argument” (Michael Cohen)
- AI Governance & Strategy: Priorities, talent gaps, & opportunities (Akash Wasil)
Podcasts and videos
See https://aisafety.video for more videos, and here for more podcasts.
- Rohin Shah or Ben Garfinkel on the 80,000 Hours Podcast
- Paul Christiano or Richard Ngo on the AI X-risk Research Podcast
- Ajeya Cotra or Neel Nanda on the Future of Life Institute Podcast
- Carl Shulman on the Dwarkesh Podcast
- Ensuring smarter-than-human intelligence has a positive outcome (Nate Soares)
- AI Alignment: Why It's Hard, and Where to Start (Eliezer Yudkowsky)
- Some audio recordings of the readings above (e.g. Cold Takes Audio, reading of 80k intro, EA Forum posts, EA Radio, Astral Codex Ten Podcast, Less Wrong Curated Podcast, Nonlinear Library)
Courses
- AI Safety, Ethics, and Society textbook and virtual course from the Center for AI Safety
- AI Safety Fundamentals from BlueDot Impact
- Intro to ML Safety lectures and online course from the Center for AI Safety
- “Key Phenomena in AI Risk” Reading Curriculum (see the course announcement)
- STS 10SI: Intro to AI Alignment Syllabus from Stanford’s AI safety student group
- HAIST/MAIA Summer 2023 Curriculum from Harvard and MIT’s AI safety student groups; also see this fall 2023 curriculum.
- AISST Policy Fellowship Syllabus from Harvard’s AI safety student group
- Safety and Control for Artificial General Intelligence (Fall 2018) from UC Berkeley
- CS 362: Research in AI Alignment from Stanford
Similar lists
- Introductory Resources on AI Risks (Will Jones, Future of Life Institute)
- Overviews of AI risk (Shakeel Hashim)
- Non-Technical Introduction to AI Safety (Harvard AI Safety Team)
- Why Might Misaligned, Advanced AI Cause Catastrophe? (BlueDot Impact)
- Resources I sent to AI researchers about AI safety (Vael Gates)
- Resources (AI Risk Discussions)
- x-risk spiel (Jaan Tallinn)
- Introductory Resources on AI Extinction Risks (Siméon Campos)
Books
- Uncontrollable by Darren McKee
- The Alignment Problem by Brian Christian
- Human Compatible by Stuart Russell
- Superintelligence by Nick Bostrom
- Life 3.0 by Max Tegmark
- Smarter Than Us by Stuart Armstrong
- AI sections in The Precipice by Toby Ord
Misc
- This post has a fairly comprehensive list of non-fiction, non-technical books about AI.
- Stuart Russell's collection of research and media appearances
- Zvi Mowshowitz’s Substack has excellent AI coverage
- It Looks Like You’re Trying To Take Over The World (Gwern Branwen)
- AI alignment resources (Victoria Krakovna)
- Wait But Why: Part 1 - Part 2 - Reply from Luke Muehlhauser
- 2021 MIRI Conversations: Ngo-Yudkowsky (ACX summary), Christiano-Yudkowsky (ACX summary), others
- A Response to Steven Pinker on AI (Rob Miles)
- A central AI alignment problem: capabilities generalization, and the sharp left turn (Nate Soares)
- What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) (Andrew Critch)
- Scheming AIs: Will AIs fake alignment during training in order to get power? (Joseph Carlsmith)
- Nine Things You Should Know About AI (Stuart Russell)
- Cold Takes on AI (Holden Karnofsky)
- FAQs: superintelligence (Scott Alexander), intelligence explosion (Luke Muehlhauser)
- Paths to failure (Karl von Wendt et al.)
- AI Alignment Is Turning from Alchemy Into Chemistry (Alexey Guzey)