How can progress in non-agentic LLMs lead to capable AI agents?

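AutoGPT is an example of an agent built on top of GPT-4. It repeatedly prompts the model to choose its next action toward a goal given by the user, carries out that action (such as a web search or writing a file), and feeds the result back into the next prompt.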
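The following minimal Python sketch illustrates how a wrapper can turn a non-agentic text predictor into an agent. It is not AutoGPT's actual code. `toy_model` is a hypothetical scripted stand-in for a language model, and `TOOLS`, `run_agent` and the `name: argument` reply format are illustrative assumptions. The model itself only maps text to text. The agency comes from the loop that turns its outputs into actions and feeds the results back in.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a non-agentic language model: it only maps a
# prompt string to a completion string and takes no actions itself.
def toy_model(prompt: str) -> str:
    # A real system would call an LLM here; this stub follows a fixed script
    # based on how many observations are already in the prompt.
    if prompt.count("OBSERVATION:") == 0:
        return "search: capital of France"
    return "finish: Paris"

# Hypothetical tools that the wrapper lets the model "use".
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: "Paris is the capital of France.",
}

def run_agent(goal: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    history: List[str] = []
    for _ in range(max_steps):
        # Build the prompt from the goal plus everything that has happened so far.
        prompt = "GOAL: " + goal + "\n" + "\n".join(history)
        reply = model(prompt)
        name, _, arg = reply.partition(":")
        name, arg = name.strip(), arg.strip()
        if name == "finish":
            return arg
        tool = TOOLS.get(name)
        result = tool(arg) if tool else f"unknown tool {name!r}"
        # Feed the outcome back in so the next model call can react to it.
        history.append(f"ACTION: {reply}")
        history.append(f"OBSERVATION: {result}")
    return "gave up"

print(run_agent("Find the capital of France", toy_model))  # prints Paris
```

Nothing in the loop depends on how good the model is. Replacing `toy_model` with a more capable predictor yields a more capable agent without any change to the wrapper.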

One threat model that involves a GPT-style AI is the “misaligned model-based reinforcement learning agent”. In this scenario, a reinforcement learning agent is attached to a GPT-style world model. The agent acts as the optimizer, and it uses the world model to achieve its goals far more effectively than it could on its own. If those goals are misaligned, the combination could pose an existential risk.
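The division of labor in this threat model can be sketched in a few lines of Python, under heavy simplifying assumptions. The hypothetical `world_model` is a hand-written toy predictor that stands in for a learned GPT-style model. The optimizer is a brute-force planner rather than a trained reinforcement learning agent; this technique is swapped in for simplicity. The world model only predicts outcomes. The goal is encoded in `reward` and belongs to the planner, which uses the predictions to choose actions.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

State = int
Action = int

# Hypothetical toy "world model": a pure predictor that answers
# "what happens if this action is taken in this state?" and has no goals.
def world_model(state: State, action: Action) -> State:
    return state + action

# The goal lives in the planner, not in the world model: here, reach state 10.
def reward(state: State) -> float:
    return -abs(10 - state)

def plan(state: State, actions: Sequence[Action], horizon: int,
         model: Callable[[State, Action], State]) -> Tuple[Action, ...]:
    """Try every action sequence of length `horizon`, simulate each one with
    the world model, and return the sequence whose predicted end state
    scores highest (the first one found, in case of ties)."""
    best_seq: Tuple[Action, ...] = ()
    best_score = float("-inf")
    for seq in product(actions, repeat=horizon):
        s = state
        for a in seq:
            s = model(s, a)  # imagined rollout; no real-world action is taken
        score = reward(s)
        if score > best_score:
            best_seq, best_score = seq, score
    return best_seq

print(plan(0, actions=(-1, 1, 3), horizon=4, model=world_model))  # prints (1, 3, 3, 3)
```

A more accurate world model makes the same planner better at reaching whatever its reward function specifies, whether or not that is the goal its designers intended.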

A more speculative possibility is that a sufficiently powerful world model might develop a mesa-optimizer, an optimizer inside the model that pursues its own objectives in the world through the model's outputs. It might do this, for example, by causing a new optimizer to be created whose goals are aligned with its own.



AISafety.info

We’re a global team of specialists and volunteers from various backgrounds who want to ensure that the effects of future AI are beneficial rather than catastrophic.