Can fictional stories teach us about AI misalignment?

Many fictional stories depict unaligned non-human entities in ways that are relevant to understanding AI misalignment. Examples include:

Some stories (King Midas, The Sorcerer’s Apprentice) point to the complexity and fragility of human values: a genie-like superintelligence that does not understand what we really want (or actively subverts it) could lead to catastrophic consequences. Some have argued that LLMs’ apparent understanding of humans makes this less likely, but these stories remain relevant for illustrating the importance of getting these questions right.

Some stories (M:I, HAL 9000, Ex Machina) feature AIs that were tasked with beneficial or morally neutral aims but are misaligned with their creators in important ways, which is a realistic threat.

Some stories (M:I, The Terminator) feature an AI that very explicitly attempts to take over the world, whereas others (Ex Machina, Upgrade, HAL 9000) feature AIs with more restricted or ambiguous aims that still lead to the downfall of their creators or users. Even in the former case, world takeover is not necessarily the final goal; it can be seen as an instrumentally convergent subgoal.

Asimov’s characters attempt to codify the behavior of robots with his Three Laws of Robotics, but the stories illustrate that these laws are insufficient to guarantee a good outcome.

The Terminator movies are often used as a pop-culture reference for AI risk. Many in the field dislike this analogy, since it suggests an “evil” AI as well as android killer robots, both of which are unlikely. Matt Yglesias pushes back and argues that there are some relevant aspects in these movies, including:

  • Skynet (the AI that goes rogue) correctly understands that humans are a threat to its goals and attempts to exterminate them as an instrumental goal

  • Competitive dynamics make the rise of Skynet inevitable, mirroring the AI arms race

  • Skynet uses its knowledge of global politics to initiate a nuclear exchange

It’s worth noting that we should avoid generalizing from fictional evidence and take these comparisons with a grain of salt. Some of these scenarios are unlikely, and some possible misalignment scenarios could be very dangerous yet would not make for interesting media, and thus receive little coverage in such stories.