What is Goodhart's law?
Goodhart’s law states that when a measure becomes a target, it ceases to be a good measure. This happens all over the place:
- One way to measure the quality of an online article might be by counting how many people click on it. However, if click count determines how much authors are paid or how high articles are ranked in search results, authors will be incentivized to write in a way that maximizes clicks, perhaps by choosing sensational titles. When they do so, click count may stop correlating well with the quality of articles.
- When funding is allocated to school districts based on test scores, teachers are incentivized to "teach to the test," and the tests may stop being good measures of knowledge of the material.[^1]
- IBM used to pay its programmers per line of code produced. This incentivized them to write bloated programs and punished simplicity, ultimately reducing the quality of the programmers’ work.
Scott Garrabrant identifies four forms of Goodhart’s law:
- Regressional Goodhart — When selecting for a measure that is a proxy for your goal, you select not only for your goal but also for the difference between the proxy and your goal. For example, being tall is correlated with being good at basketball, but if you exclusively pick exceptionally tall people to form a team, you end up selecting taller people who are worse players over slightly shorter people who are better players. This problem is unavoidable whenever your data is noisy, so you have to work around it, for example by combining multiple independent proxies (see the simulation after this list).
- Causal Goodhart — When the correlation between the proxy and the goal is not causal, intervening on the proxy may fail to affect the goal. For example, giving basketball players stilts won't make them better players (height is merely correlated with basketball skill), and filling up your rain gauge won't help your crops grow (the gauge reading is a proxy for rainfall, not a cause of it).
- Extremal Goodhart — Situations in which the proxy takes an extreme value may be very different from the ordinary situations in which the correlation between the proxy and the goal was observed. For example, the very tallest people tend to have health problems caused by their extreme height and may therefore be worse basketball players; the correlation between height and skill that holds in the normal range breaks down at the extreme.
- Adversarial Goodhart — When you optimize for a proxy, you give adversaries an incentive to take actions that decorrelate the proxy from your goal so that their performance looks better according to the proxy. For example, if good grades are used as a proxy for ability, this can incentivize cheating, since grades are easier to fake than ability.
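To make the regressional case concrete, here is a minimal Python sketch; the height/skill model, the second "game stats" proxy, and all the numbers are invented for illustration. Skill is a latent variable we care about, each proxy is skill plus independent noise, and a team selected for extreme height has lower true skill than a team selected on skill directly — while averaging two independent proxies recovers part of the gap.

```python
import numpy as np

rng = np.random.default_rng(0)
n, team_size = 100_000, 10

# Latent variable we actually care about, and two noisy proxies for it.
skill = rng.normal(0, 1, n)               # the goal (not directly observable)
height = skill + rng.normal(0, 1, n)      # proxy 1: height
stats = skill + rng.normal(0, 1, n)       # proxy 2: e.g. amateur-game stats
combined = (height + stats) / 2           # average of independent proxies

def mean_skill_of_top(proxy):
    """True mean skill of the team picked by taking the top scorers on `proxy`."""
    team = np.argsort(proxy)[-team_size:]
    return skill[team].mean()

print("selected by height alone:    ", mean_skill_of_top(height))
print("selected by combined proxies:", mean_skill_of_top(combined))
print("selected by true skill:      ", mean_skill_of_top(skill))
# Selecting on a single noisy proxy also selects on its noise, so the
# height-picked team's true skill regresses toward the mean. Averaging
# independent proxies cancels some of the noise and closes part of the gap.
```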
Goodhart’s law is a major problem for AI alignment, that is, for making sure that AI tries to do what we want it to do. We usually train AI systems to optimize a measurable objective that is only a proxy for what we actually want, and Goodhart’s law predicts that optimizing hard on such a proxy will make it come apart from our true goal. Mesa-optimization, in which the trained model is itself an optimizer pursuing its own objective, can also be understood as an example of Goodhart’s law; one possible outcome is deceptive alignment, a case where the AI acts aligned while in training but turns out not to be aligned when deployed.
One attempt to help solve this problem is to use milder forms of optimization, such as quantilization (sketched below).
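As a rough illustration of that idea — a toy sketch, not a faithful implementation of any particular quantilizer proposal; `proxy_score` and the base distribution below are invented for the example — a quantilizer samples an action at random from the top q fraction of a base distribution of actions, instead of taking the single proxy-maximizing action:

```python
import numpy as np

def quantilize(actions, proxy_score, q=0.1, rng=None):
    """Pick an action uniformly at random from the top q-fraction of a base
    distribution of actions, ranked by the proxy, instead of taking the argmax."""
    rng = rng or np.random.default_rng()
    scores = np.array([proxy_score(a) for a in actions])
    cutoff = np.quantile(scores, 1 - q)
    top = [a for a, s in zip(actions, scores) if s >= cutoff]
    return top[rng.integers(len(top))]

# Hypothetical usage: 1,000 candidate actions drawn from a safe base
# distribution, scored by a proxy we only partially trust.
base_actions = np.random.default_rng(1).normal(size=1000)
print(quantilize(base_actions, proxy_score=lambda a: a, q=0.05))
```

Because the chosen action only needs to be good rather than optimal under the proxy, it has less opportunity to exploit the places where the proxy and the true goal come apart.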
[^1]: There is a possibly fictional story of Soviet nail factories which, when tasked with producing a high number of nails, produced many tiny, useless nails, and when tasked with producing a certain weight of nails, produced fewer, giant nails.