Intro to AI safety

In recent years, AI has exceeded people’s expectations in a wide variety of domains — including playing Go, composing human-like text, writing code, and modeling protein folding. It may not be long until we create AI systems that are much more capable than humans at solving most cognitive problems. Advanced AI could provide great benefits, but it could also cause unprecedented disasters, and even human extinction, unless difficult technical safety problems are solved and humanity cooperates to deploy advanced AI wisely.

Rapid progress in the capabilities of AI systems has recently pushed the topic of existential risk from AI into the mainstream. Not long ago, the abilities of systems like GPT-4 seemed out of reach for the foreseeable future. The leading AI labs today are aiming to create “artificial general intelligence” in the not-too-distant future, and many top researchers are warning about its dangers.

Current AI already outperforms most humans at some tasks, such as board games and many kinds of language processing. There are good reasons to expect that future AI could outperform us, perhaps vastly so, in other areas, like science, technology, economic competition, and strategy. Once AI becomes capable of replacing humans in most of the work involved in AI research, it will accelerate that research, potentially resulting in a “superintelligence” within a short time.

Superintelligent AI systems could greatly improve everyone’s lives if their actions are in line with human values. But it’s not guaranteed that they will be. A central concern of AI safety is making sure that AI systems try to do what we want, and that they keep doing so even if their circumstances change fundamentally – for example, if their cognitive capabilities exceed those of humans. The problem of ensuring that AI systems pursue the goals we want them to is called the “AI alignment problem”, and it’s widely regarded as unsolved and difficult.

AI alignment researchers haven’t figured out how to take an objective and ensure that a powerful AI will reliably pursue that exact objective. The way the most capable systems are trained today makes it hard to understand how they even work. The research community has been working on these problems, trying to invent techniques and concepts for building safe systems.

It’s unclear whether these problems can be solved before a misaligned system causes an irreversible catastrophe. However, success becomes more likely if more people make well-informed efforts to help. We made this site to help people understand the challenges at hand and the solutions being worked on. The related questions below are a good place to start learning more, or you can enter your questions into the search bar.