How can progress in non-agentic LLMs lead to capable AI agents?
AutoGPT is an example of an agent built on top of GPT-4: scaffolding code gives the model a goal, asks it to propose the next action, executes that action (e.g., a web search or file operation), and feeds the result back into the next prompt.
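As a rough illustration (not AutoGPT's actual code), the sketch below shows the general pattern such scaffolds follow. The `call_llm` and `execute` functions are hypothetical stand-ins for a model API call and a tool invocation, stubbed out so the loop runs on its own:

```python
# Minimal sketch of turning a non-agentic language model into an agent:
# the model only predicts text, but a loop around it turns those
# predictions into actions and feeds the results back in.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a language model API."""
    # A real scaffold would send `prompt` to a model such as GPT-4;
    # here we return a fixed action so the example is self-contained.
    return 'ACTION: search("current weather in Paris")'


def execute(action: str) -> str:
    """Hypothetical stand-in for running a tool (web search, code, etc.)."""
    return f"Result of {action}: (observation text)"


def run_agent(goal: str, max_steps: int = 3) -> None:
    history = f"Goal: {goal}\n"
    for step in range(max_steps):
        # Agency comes from this loop, not from the model itself.
        action = call_llm(history + "What should I do next?\n")
        observation = execute(action)
        history += f"{action}\n{observation}\n"
        print(f"step {step}: {action}")


if __name__ == "__main__":
    run_agent("Find out whether it will rain in Paris tomorrow")
```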
One threat model which includes a GPT-style AI is the “misaligned model-based [reinforcement learning] agent”. It suggests that a reinforcement learner attached to a GPT-style world model could pose an existential risk: the reinforcement learning agent acts as an optimizer that uses the world model to predict the consequences of its actions, making it far more effective at achieving its goals.
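As a rough sketch of this optimizer-plus-world-model structure (the function names and scoring below are illustrative, not taken from the threat model write-up), the agent evaluates candidate actions by asking a world model what each would lead to and picking whichever outcome its objective scores highest:

```python
# Illustrative sketch: a reinforcement learner that plans by querying a
# learned world model. The better the world model predicts consequences,
# the more effectively this loop optimizes its reward, aligned or not.

import random


def world_model(state: str, action: str) -> str:
    """Hypothetical stand-in for a GPT-style predictive model of outcomes."""
    return f"{state} -> after {action}"


def reward(predicted_state: str) -> float:
    """The agent's (possibly misaligned) objective over predicted outcomes."""
    return random.random()  # placeholder scoring for the sketch


def choose_action(state: str, candidate_actions: list[str]) -> str:
    # Pick the action whose predicted outcome scores highest.
    return max(candidate_actions,
               key=lambda a: reward(world_model(state, a)))


if __name__ == "__main__":
    actions = ["send email", "buy stock", "write code"]
    print(choose_action("initial state", actions))
```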
Another, more speculative possibility is that a sufficiently powerful world model may develop mesa-optimizers, which could influence the world via the model's outputs in order to achieve the mesa-objective (perhaps by causing an optimizer to be created whose goals are aligned with its own).