What is a path to AGI being an existential risk?

Here is a conjunctive path¹ to AI takeover², inspired by Joe Carlsmith’s report on power-seeking AI. Each step is uncertain and is conditional on the previous steps having occurred. We do not assign probabilities to the individual steps, since people’s estimates vary widely, but we argue that the end result is probable enough to warrant attention. You can find a more abstract version of this argument here.

The path goes like this:

1. Building human-level AGI is possible in principle. In this context, AGI refers to AI that can do things like think strategically, do independent scientific research, design new computer systems, persuade people at a high level, and make and carry out plans.³
2. Within the foreseeable future, humanity will have the technological capability to construct AGI.
3. Once it is feasible, humanity will likely build agentic AGI to perform tasks autonomously, because doing so will be profitable.
4. A singleton⁴ AGI is deployed by a well-intentioned actor but is misaligned⁵ and gains a decisive strategic advantage. Such an AI can outmaneuver humanity and pursue its misaligned aims either by acting directly in the physical world or by influencing intermediaries, such as humans, to act in ways that harm humans or humanity.
5. Humanity is disempowered. We don’t know exactly what will happen, but human extinction seems likely.

Carlsmith arrives at a roughly 5% chance of existential catastrophe by 2070 following a similar path. You can put your own probabilities into a detailed model here⁶ to derive your own probability of existential catastrophe according to this model.
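To make the arithmetic of a conjunctive path concrete, here is a minimal sketch of how conditional step probabilities combine. It is not the detailed model linked above; the function name `conjunctive_probability` and the step labels are illustrative assumptions, and the value 0.5 assigned to every step is a pure placeholder, not an estimate from this article or from Carlsmith's report.

```python
from math import prod


def conjunctive_probability(conditional_probs):
    """Return the probability that every step of a conjunctive path occurs.

    Each entry is the probability of a step *given* that all earlier
    steps have already occurred, so the overall probability is their product.
    """
    probs = list(conditional_probs)
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return prod(probs)


# Placeholder values only (assumption): replace each 0.5 with your own estimate.
steps = {
    "AGI is possible in principle": 0.5,
    "AGI becomes technologically feasible": 0.5,
    "agentic AGI is built": 0.5,
    "a misaligned AGI gains a decisive strategic advantage": 0.5,
    "humanity is disempowered": 0.5,
}

if __name__ == "__main__":
    total = conjunctive_probability(steps.values())
    print(f"Probability of the whole path: {total:.3%}")
```

Because the step probabilities are multiplied, the overall figure can never exceed the smallest one, and it falls quickly as steps are added. This is one reason people who disagree modestly about individual steps can end up with very different final estimates.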
