Can we get AGI by scaling up architectures similar to current ones, or are we missing key insights?
It's an open question whether we can create AGI simply by training ever-larger versions of current models on more compute and data ("scaling"), or whether AGI would require fundamentally new architectures or algorithmic insights.
Some researchers have formulated empirical "scaling laws" that try to formalize how a model's capabilities improve as its compute, parameter count, and training data grow.
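To make "scaling laws" slightly more concrete, here is a sketch of one widely cited functional form: the parametric loss curve from Hoffmann et al.'s 2022 "Chinchilla" paper, which models a language model's loss as a sum of power laws in model size and dataset size (the symbols below follow that paper's notation):

```latex
% Illustrative parametric scaling law (Hoffmann et al., 2022):
% expected loss as a function of parameter count N and training tokens D,
% where E is the irreducible loss and A, B, \alpha, \beta are fitted constants.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

The key property is diminishing but predictable returns: loss falls smoothly as N and D grow. The disagreement among the sources below is over whether the capabilities relevant to AGI track such curves all the way up, or whether progress flattens out or misses qualitatively important abilities.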
For a variety of opinions on this question, see:
- Gwern on the scaling hypothesis.
- Daniel Kokotajlo on what we could do with a trillion times as much compute as current models use.
- Rohin Shah on the likelihood that scaling current techniques will produce AGI.
- Rich Sutton's "The Bitter Lesson", which argues that general methods leveraging computation ultimately beat approaches built on human domain knowledge.
- Gary Marcus's "The New Science of Alt Intelligence", which argues that current deep learning systems are fundamentally limited in ways that scaling will not fix.
- AI Impacts' "Evidence against current methods leading to human level artificial intelligence".