What can we expect the motivations of a superintelligent machine to be?

There is no reason to expect that most possible superintelligent machines would automatically have motivations anything like those of humans.[1] Human minds represent a small part of the space of all possible mind designs, and minds with very different origins are unlikely to share the complex motivations and values that humans have.

However, there are some “instrumental” goals we can expect most superintelligent systems to pursue, because they would be useful for achieving their “terminal” (final) goals, whatever those might be. For instance, a superintelligent AI would try to acquire resources that could help it achieve its goals (resources which might include the energy and elements on which human life depends). Other instrumental goals might include: improving its own capabilities or intelligence; preserving its original goals (i.e. resisting having its goals changed); and protecting itself, since its goals are less likely to be achieved if it’s shut down or destroyed (the toy model below illustrates this last point).
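
To make the self-preservation point concrete, here is a toy, purely illustrative sketch (a thought experiment in code, not a claim about real systems): a tiny Markov decision process with a “shutdown” state, in which we sample many random terminal goals (reward functions) and check how often the optimal policy keeps the agent running. The environment, the reward sampling, and the value-iteration setup are all assumptions chosen for illustration.

```python
import random

# Toy deterministic MDP: states 0-3 are "live" states the agent can move
# between; an absorbing OFF state (standing in for shutdown) yields no
# further reward, ever.
LIVE = [0, 1, 2, 3]
GAMMA = 0.9  # discount factor


def optimal_values(reward):
    """Value iteration: from any live state the agent may move to any
    live state (collecting that state's reward) or switch itself OFF
    (worth exactly 0 from then on)."""
    v = [0.0] * len(LIVE)
    for _ in range(300):  # more than enough iterations to converge
        v = [max(max(reward[s2] + GAMMA * v[s2] for s2 in LIVE), 0.0)
             for _s in LIVE]
    return v


def prefers_staying_on(reward):
    """True if the best live move beats immediate shutdown."""
    v = optimal_values(reward)
    return max(reward[s2] + GAMMA * v[s2] for s2 in LIVE) > 0.0


random.seed(0)
trials = 1000
# Each trial draws a random "terminal goal": a reward in [-1, 1] per state.
stayed_on = sum(
    prefers_staying_on([random.uniform(-1.0, 1.0) for _ in LIVE])
    for _ in range(trials)
)
print(f"{stayed_on}/{trials} random goals make avoiding shutdown optimal")
```

For the large majority of sampled goals (those where at least one state carries positive reward), staying operational is strictly better than shutting down, even though none of the goals mentions survival: a switched-off agent simply collects no further reward. That is the sense in which self-preservation is “instrumental.”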

Unfortunately, a concern for humans or other intelligences is not “built in” to all possible mind designs, so by default we should expect a superintelligent AI to pursue its goals without giving weight to concerns that seem “natural” to us – such as avoiding harming humans – unless we can somehow program those concerns in. These are goals we would want any superintelligent machine we build to have, but we don’t yet know how to reliably instill them. Working out how to do so is a central problem of the field of technical AI alignment.

  1. A possible exception, some have argued, is machine intelligences derived from whole brain emulation. ↩︎