Why can’t we just “put the AI in a box” so that it can’t influence the outside world?

One possible way to ensure the safety of a powerful AI system is to keep it contained in a secure software environment, a strategy referred to as “boxing” the AI. There is nothing intrinsically wrong with this approach: keeping an AI system in a secure software environment would make it safer than letting it roam free. However, even an AI system inside a software environment might not be safe enough.
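To make the idea concrete, here is a minimal, Unix-only Python sketch of the crudest kind of “box”: running a program in a child process with hard limits on CPU time and memory. Everything here is illustrative rather than a real containment design; an actual box would also have to cut off network and filesystem access, side channels, and the human factors discussed below.

```python
import resource
import subprocess

def run_boxed(code: str, timeout_s: int = 5) -> str:
    """Run untrusted code in a separate process with crude resource limits.

    This is only a toy box: it caps CPU time and memory, but does nothing
    about network access, side channels, or a system that manipulates its
    operators -- the failure modes discussed in the rest of this answer.
    """
    def limit_resources():
        # Runs in the child process before exec (Unix only).
        resource.setrlimit(resource.RLIMIT_CPU, (timeout_s, timeout_s))
        resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20, 256 * 2**20))

    result = subprocess.run(
        ["python3", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        preexec_fn=limit_resources,
    )
    return result.stdout

if __name__ == "__main__":
    print(run_boxed("print(2 + 2)"))  # prints "4"
```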

We sometimes put dangerous humans inside “boxes” (prisons) to limit their ability to influence the external world. Sometimes, these humans escape their boxes. The security of a prison depends on certain assumptions, and those assumptions can be violated: Yoshie Shiratori reportedly escaped prison by weakening the door frame with miso soup and dislocating his shoulders.

Human-written software has a high defect rate, so we should expect a perfectly secure system to be difficult to create. If humans construct a software system they think is secure, it is possible that its security relies on a false assumption. A powerful AI system could potentially learn how its hardware works and manipulate bits in memory to send radio signals (security researchers have demonstrated attacks along these lines against air-gapped computers), or it could fake a malfunction and attempt to manipulate the engineers who look at its code. As the saying goes: for someone to do something we had imagined was impossible, they need only have a better imagination.
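To see how security can rest on a false assumption, consider a classic toy example (not from the source text, and with a hypothetical directory name): a Python file-access check that assumes any path beginning with the sandbox directory must stay inside it. Paths containing `..` violate that assumption.

```python
import os

SANDBOX_DIR = "/srv/sandbox"  # hypothetical directory for this illustration

def read_sandboxed_file_naive(relative_path: str) -> str:
    """Intended to serve only files under SANDBOX_DIR.

    The check assumes any path starting with SANDBOX_DIR stays inside
    it -- a false assumption. "../../etc/passwd" joins to
    "/srv/sandbox/../../etc/passwd", which passes the prefix check yet
    resolves outside the sandbox. ("/srv/sandbox_evil/x" passes too.)
    """
    full_path = os.path.join(SANDBOX_DIR, relative_path)
    if not full_path.startswith(SANDBOX_DIR):
        raise PermissionError("escaped the sandbox")
    with open(full_path) as f:
        return f.read()

def read_sandboxed_file(relative_path: str) -> str:
    """Safer version: resolve '..' and symlinks *before* checking containment."""
    root = os.path.realpath(SANDBOX_DIR)
    full_path = os.path.realpath(os.path.join(root, relative_path))
    if os.path.commonpath([root, full_path]) != root:
        raise PermissionError("escaped the sandbox")
    with open(full_path) as f:
        return f.read()
```

Humans routinely write checks like the first one and believe them secure; the worry is that a sufficiently capable adversary only needs to find one assumption that fails.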

Experimentally, humans have convinced other humans to let them out of the box: in informal “AI box experiments”, a human role-playing the AI has talked the human “gatekeeper” into releasing it. Spooky.