How can progress in non-agentic LLMs lead to capable AI agents?
AutoGPT is an example of an agent built on top of GPT-4: scaffolding code gives the model a goal, asks it to propose the next action, executes that action (e.g., a web search or file operation), and feeds the result back into the next prompt.
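As a rough illustration (not AutoGPT's actual code), the sketch below shows the general pattern such scaffolds follow. The `call_llm` and `execute` functions are hypothetical stand-ins for a model API call and a tool invocation, stubbed out so the loop runs on its own:

```python
# Minimal sketch of turning a non-agentic language model into an agent:
# the model only predicts text, but a loop around it turns those
# predictions into actions and feeds the results back in.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a language model API."""
    # A real scaffold would send `prompt` to a model such as GPT-4;
    # here we return a fixed action so the example is self-contained.
    return 'ACTION: search("current weather in Paris")'


def execute(action: str) -> str:
    """Hypothetical stand-in for running a tool (web search, code, etc.)."""
    return f"Result of {action}: (observation text)"


def run_agent(goal: str, max_steps: int = 3) -> None:
    history = f"Goal: {goal}\n"
    for step in range(max_steps):
        # Agency comes from this loop, not from the model itself.
        action = call_llm(history + "What should I do next?\n")
        observation = execute(action)
        history += f"{action}\n{observation}\n"
        print(f"step {step}: {action}")


if __name__ == "__main__":
    run_agent("Find out whether it will rain in Paris tomorrow")
```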
One threat model which includes a GPT-style AI is the “misaligned model-based [reinforcement learning] agent”. It suggests that a reinforcement learner attached to a GPT-style world model could pose an existential risk: the reinforcement learning agent acts as an optimizer that uses the world model to predict the consequences of its actions, making it far more effective at achieving its goals.
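As a rough sketch of this optimizer-plus-world-model structure (the function names and scoring below are illustrative, not taken from the threat model write-up), the agent evaluates candidate actions by asking a world model what each would lead to and picking whichever outcome its objective scores highest:

```python
# Illustrative sketch: a reinforcement learner that plans by querying a
# learned world model. The better the world model predicts consequences,
# the more effectively this loop optimizes its reward, aligned or not.

import random


def world_model(state: str, action: str) -> str:
    """Hypothetical stand-in for a GPT-style predictive model of outcomes."""
    return f"{state} -> after {action}"


def reward(predicted_state: str) -> float:
    """The agent's (possibly misaligned) objective over predicted outcomes."""
    return random.random()  # placeholder scoring for the sketch


def choose_action(state: str, candidate_actions: list[str]) -> str:
    # Pick the action whose predicted outcome scores highest.
    return max(candidate_actions,
               key=lambda a: reward(world_model(state, a)))


if __name__ == "__main__":
    actions = ["send email", "buy stock", "write code"]
    print(choose_action("initial state", actions))
```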
Another, more speculative possibility is that a sufficiently powerful world model may develop mesa-optimizers, which could influence the world via the model's outputs in order to achieve the mesa-objective (perhaps by causing an optimizer to be created whose goals are aligned with its own).