What is the Center for AI Safety (CAIS)'s research agenda?
The Center for AI Safety (CAIS)[1] is a San Francisco-based non-profit directed by Dan Hendrycks that "focuses on mitigating high-consequence, societal-scale risks posed by AI". They pursue both technical and conceptual research alongside work on expanding and supporting the field of AI safety.
Their technical research focuses on improving the safety of existing AI systems and often involves building benchmarks and testing models against them. It includes work on:
- Robustness, for example their analysis of distribution shift, their evaluation of LLM rule-following, and their proposed data processing method for improving robustness.
- Transparency, where they have presented representation engineering (RepE), an emerging approach to reading and steering a model's internal representations (sketched after this list).
- Machine ethics, where their best-known work includes the ETHICS dataset and the MACHIAVELLI benchmark for evaluating language models.
- Anomaly detection, where they established a baseline for detecting out-of-distribution examples, and proposed outlier exposure (OE), a method that improves anomaly detectors by training them on an auxiliary dataset of known outliers (also sketched after this list).
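The core move in RepE's "reading" approach is to find a linear direction in a model's activation space that tracks a concept, using hidden states collected from contrastive prompt pairs. Below is a minimal sketch of that idea, assuming the reader has already extracted such hidden states; the function and variable names are illustrative stand-ins, not CAIS's actual code.

```python
# Minimal sketch of a RepE-style "reading vector", assuming hidden states
# have already been extracted for contrastive prompt pairs (e.g., honest
# vs. dishonest completions). Names here are hypothetical.
import numpy as np

def reading_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Take the top principal component of the differences between paired
    activations as a candidate direction for the concept."""
    diffs = pos_acts - neg_acts            # shape: (n_pairs, hidden_dim)
    diffs = diffs - diffs.mean(axis=0)     # center before PCA
    # First right singular vector = first principal component of the diffs
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]

def concept_score(activation: np.ndarray, direction: np.ndarray) -> float:
    """'Read' how strongly the concept is represented in a new activation
    by projecting it onto the direction."""
    return float(activation @ direction)
```

The same direction can, in principle, be added to or subtracted from activations to steer behavior, which is the "control" side of RepE.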
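For a sense of how outlier exposure works, here is a hedged sketch of its training objective in PyTorch: ordinary cross-entropy on in-distribution data, plus a term that pushes the model's predictions on known outliers toward the uniform distribution, so that low confidence flags anomalies at test time. The scoring function uses the maximum softmax probability, the baseline from their earlier out-of-distribution detection work. The model and data are assumed to be supplied by the reader; this illustrates the published objective rather than reproducing CAIS's implementation.

```python
# Sketch of the outlier-exposure (OE) training objective, assuming a
# PyTorch classifier `model` and in-distribution / outlier batches
# supplied by the reader. All names are illustrative.
import torch
import torch.nn.functional as F

def oe_loss(model, x_in, y_in, x_out, lam=0.5):
    """Cross-entropy on in-distribution data, plus a penalty that pushes
    predictions on known outliers toward the uniform distribution."""
    ce = F.cross_entropy(model(x_in), y_in)
    log_probs_out = F.log_softmax(model(x_out), dim=1)
    # Cross-entropy against the uniform distribution over classes:
    uniform_ce = -log_probs_out.mean(dim=1).mean()
    return ce + lam * uniform_ce

def anomaly_score(model, x):
    """Score inputs by negative maximum softmax probability (MSP):
    higher score = more anomalous."""
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
    return -probs.max(dim=1).values
```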
Their conceptual research has included:
- Surveys of the field: “Unsolved Problems in ML Safety” (2022), “X-Risk Analysis for AI Research” (2022), “An Overview of Catastrophic AI Risks” (2023), and “AI Deception: A Survey of Examples, Risks, and Potential Solutions” (2023)
Their field-building projects include:
- The May 2023 Statement on AI Risk, signed by many AI scientists and other notable figures
- The CAIS Compute Cluster, which offers compute for AI safety research
- Prize incentives for safety-relevant research, such as improving ML safety benchmarks, moral uncertainty detection by ML systems, and forecasting by ML systems
- An ML Safety course and scholarships for ML students doing safety-related research
[1] Not to be confused with Comprehensive AI Services, a conceptual model of artificial general intelligence proposed by Eric Drexler, which is also abbreviated CAIS.