Is the worry that AI will become malevolent or conscious?
Concern about existential risk from misaligned AI is not based on AI systems becoming conscious or developing motivations like revenge or hatred. Rather, it’s based on them pursuing their goals competently, but with indifference to human well-being or morality.
Fiction sometimes depicts AI as “waking up” to sentience, going from being an obedient tool to being a conscious mind, maybe one that now hates its creators. The scenarios AI safety researchers worry about do bear some vague resemblance to this picture:
- AI systems might become much more intelligent, in the sense of solving harder problems and making more effective decisions, within a short time. Currently, a leap like that couldn’t happen within a single system: an LLM like GPT-5.1 is a fixed structure that gets applied to many different inputs but doesn’t learn from those inputs after its initial training. A newly trained system, however, could turn out to be much smarter than its predecessor. And in the future we might change how we build and use AI, so that a leap in intelligence could happen within one system, or the line between AIs learning and AIs helping create their successors could blur.
- Similarly, AI systems might become more “agentic,” in the sense of autonomously pursuing outcomes in the world rather than passively answering human queries. (As of 2025, there’s already a strong push to build more autonomous AI agents.)
- AI systems might increase in situational awareness or self-awareness, learning more about what kind of thing they are, who their creators are, how their outputs affect the world, how they’re trained, how they work, where they’re used, and so on. (Situational awareness is also increasing as of 2025.)
- In a “treacherous turn,” an AI that had goals incompatible with ours, but that was acting nice for strategic reasons (“scheming”), could see an opportunity to seize power and suddenly “take the mask off.”
But consciousness, in the sense of subjective experience, isn’t in itself dangerous. It may be morally important in its own right, but it’s not what causes AI to make plans that harm us.
Nor is there a clear connection between consciousness and the risky AI dynamics listed above. For example, if you turn a non-agent into an agent by putting it in a loop and repeatedly asking it to find and implement the most effective decision, it’s not clear how that would give it subjective consciousness. And situational awareness seems like it’s just a kind of knowledge a system can learn, separate from any leap in subjective consciousness.
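To make the “putting it in a loop” idea concrete, here’s a minimal sketch in Python. The functions `query_model` and `execute_action` are hypothetical placeholders standing in for a model API and an environment interface, not any particular product’s implementation:

```python
def query_model(prompt: str) -> str:
    """Hypothetical call to a fixed, non-agentic language model."""
    raise NotImplementedError  # placeholder: an API call would go here


def execute_action(action: str) -> str:
    """Hypothetical interface that carries out an action in the world
    (run code, send a request, etc.) and returns an observation."""
    raise NotImplementedError  # placeholder


def agent_loop(goal: str, max_steps: int = 10) -> None:
    """Wrap a passive question-answering model into a goal-pursuing agent."""
    history: list[str] = []
    for _ in range(max_steps):
        # Each iteration just asks the same fixed model one more question:
        # given the goal and what has happened so far, what should be done next?
        prompt = (
            f"Goal: {goal}\n"
            "History so far:\n" + "\n".join(history) + "\n"
            "What is the single most effective next action? "
            "Reply DONE if the goal is achieved."
        )
        action = query_model(prompt)
        if action.strip() == "DONE":
            break
        observation = execute_action(action)
        history.append(f"Action: {action}\nResult: {observation}")
```

The wrapper changes how the model is used, from answering one question to repeatedly choosing and carrying out actions, but it doesn’t obviously change anything about the model’s inner experience.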
And although safety researchers sometimes worry about such changes in AI happening quickly, there’s no reason to expect them to all be packaged together with consciousness in a single existential revelation.
Just as consciousness is less central to the risk than competence, malevolence is less central than indifference. The worry is not that AI will value our downfall for its own sake, or because it feels hate for us. The worry is that it will end up with some set of goals mostly unrelated to humans and their values, and harm us for instrumental reasons when we get in the way. If AI kills us, it will probably be for reasons that are “not personal” and “just business,” in the same way that many human actions result in the mass death of insects we don’t have strong feelings about.