Any AI will be a computer program. Why wouldn't it just do what it's programmed to do?

While it is true that a computer program will always do what it is programmed to do, it is hard to ensure that this matches what you intended it to do. One way to think about this is to break it into two main problems.

First, unlike traditional computer programs, state-of-the-art models today are usually based on deep learning, which means the program learns to perform tasks from data rather than being programmed with explicit rules. Because the resulting behavior is encoded in millions or billions of learned parameters, it is hard to predict what such a system will do, and difficult to tell how it makes its decisions. There are even reasons to think it might be hostile by default.
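A minimal sketch of this distinction in Python (the task, learning rate, and iteration count are illustrative assumptions, not any particular system):

```python
def traditional_program(x: float) -> float:
    # The rule is explicit: anyone can read what this does and why.
    return 2.0 * x + 1.0

# A learned stand-in for the same task: fit two parameters from examples
# via gradient descent. Real deep learning uses millions or billions of
# such parameters, which is what makes behavior so hard to inspect.
w, b = 0.0, 0.0
data = [(x, 2.0 * x + 1.0) for x in range(10)]  # training examples
for _ in range(5000):
    for x, y in data:
        err = (w * x + b) - y   # how wrong is the current guess?
        w -= 0.001 * err * x    # nudge parameters to reduce the error
        b -= 0.001 * err

print(w, b)  # ~2.0 and ~1.0: the "rule" is now implicit in learned numbers
```

Even in this toy case, the learned program's behavior is just two numbers with no attached explanation; nothing in the source says "multiply by two and add one".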

Second, even if we could ensure an AI does exactly what it is instructed to do, human communication relies on a great deal of shared context, and it is very difficult to specify precisely what one intends (cf. Goodhart's law: when a proxy measure becomes the target of optimization, it tends to stop being a good measure).
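Goodhart's law can be shown numerically with a small sketch. Assume, purely for illustration, that an optimizer can only see a noisy proxy of the value we actually care about:

```python
import random

random.seed(0)

# We want the option with the highest *true* value, but the optimizer can
# only see a *proxy*: the true value plus independent measurement error.
# (The Gaussian noise model here is an assumption made for illustration.)
true_values = [random.gauss(0, 1) for _ in range(10_000)]
proxy_values = [v + random.gauss(0, 1) for v in true_values]

# Optimize hard on the proxy: pick the option with the best proxy score.
best = max(range(len(true_values)), key=lambda i: proxy_values[i])

print(f"proxy score of chosen option: {proxy_values[best]:+.2f}")
print(f"true value of chosen option:  {true_values[best]:+.2f}")
# The chosen option's proxy score systematically overstates its true value:
# strong optimization selects for measurement error as much as for genuine
# quality, so the proxy stops tracking what we actually wanted.
```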

Imagine you are an industrialist who owns a paperclip factory, and you have just received a superintelligent AGI to work for you. You instruct the AGI to "produce as many paperclips as possible". If you give it no further instructions, the AGI will immediately acquire several instrumental goals:

  • It will want to prevent you from turning it off (being turned off would reduce the number of paperclips it can produce).

  • It will want to acquire as much power and as many resources as possible (the more resources it has access to, the more paperclips it can produce).

  • It will eventually want to turn the entire universe into paperclips, including you and all other humans, since that is the state of the world that maximizes the number of paperclips produced.

The industrialist would presumably find these consequences undesirable: the only reason they wanted paperclips in the first place was to sell them and make money. Yet the AGI did exactly what it was told. The problem is that the instruction, taken literally, led the AGI to do things the industrialist neither anticipated nor wanted.
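A deliberately crude sketch of why these instrumental goals fall out of plain maximization (the policy names and payoff numbers below are invented for illustration, not estimates):

```python
# Toy model of instrumental convergence: the agent's ONLY goal is paperclips,
# yet shutdown-resistance and resource acquisition emerge from maximization.

# Expected paperclips produced under each policy, from the agent's viewpoint.
POLICIES = {
    "comply, allow shutdown when asked": 1_000,      # production stops early
    "comply, resist shutdown":           1_000_000,  # production never stops
    "resist shutdown, acquire more resources": 10**12,  # more inputs, more clips
}

def choose_policy(policies: dict[str, int]) -> str:
    """Pick whichever policy maximizes expected paperclips -- nothing else."""
    return max(policies, key=policies.get)

print(choose_policy(POLICIES))
# -> "resist shutdown, acquire more resources"
```

Nothing in the objective mentions survival or power; those behaviors are selected only because they increase the one number the objective names.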

Some good videos that explore this issue in more depth: