What is Regressional Goodhart?

Regressional Goodhart is a form of Goodhart’s law that occurs whenever we select on a proxy variable that is not perfectly correlated with what we actually want.

Imagine you are in a physical education class and have to pick a player for your basketball team. You don’t know how good your classmates are at basketball, but you do know that tall people tend to be better at basketball in general. So you pick the tallest person you see.

This is a good strategy, but the tallest person is still probably not the best player. Why not? Several factors determine someone’s basketball skill: height, practice, talent, and so on. Height is only one of them. By picking the tallest person, we are also implicitly selecting for the gap between height and basketball skill: the person we pick is more likely than a randomly chosen classmate to be taller than their basketball skill would suggest.
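The effect of picking the tallest classmate can be checked with a small simulation. The sketch below is illustrative only and rests on assumptions that are not in the text: height and "everything else" (practice, talent) are independent standard normal variables, skill is their rescaled sum, and the function name `simulate` and its parameters are made up for this example.

```python
import random
import statistics


def simulate(trials=2000, class_size=30, seed=0):
    """Repeatedly pick the tallest of `class_size` players and compare
    that pick with the player who is actually the most skilled."""
    rng = random.Random(seed)
    tallest_height, tallest_skill, best_skill = [], [], []
    tallest_is_best = 0

    for _ in range(trials):
        players = []
        for _ in range(class_size):
            # Assumption: both factors are standard normal and independent.
            height = rng.gauss(0, 1)
            other = rng.gauss(0, 1)  # practice, talent, ...
            # Dividing by sqrt(2) keeps skill on the same scale as height.
            skill = (height + other) / 2 ** 0.5
            players.append((height, skill))

        pick = max(players, key=lambda p: p[0])  # best by proxy (height)
        best = max(players, key=lambda p: p[1])  # best by true goal (skill)

        tallest_height.append(pick[0])
        tallest_skill.append(pick[1])
        best_skill.append(best[1])
        tallest_is_best += pick is best

    return {
        "mean_height_of_tallest": statistics.mean(tallest_height),
        "mean_skill_of_tallest": statistics.mean(tallest_skill),
        "mean_skill_of_best": statistics.mean(best_skill),
        "share_tallest_is_best": tallest_is_best / trials,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, the tallest player should come out above average in skill, which is why the strategy is a good one. The same player should also be less extreme in skill than in height, and less skilled on average than the truly best player, which is the Regressional Goodhart gap.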

This phenomenon is called Regressional Goodhart: when we can’t measure what we want directly and select on a proxy instead, we predictably get a worse outcome than we would by selecting on what we actually want.

The Regressional Goodhart effect is illustrated in the following image: if the proxy is not perfectly correlated with what we want, the point that is best according to the proxy is predictably worse than the point that is actually best:

(image source)

Regressional Goodhart gets its name from being a failure of regression: we are estimating basketball skill from height, which is a regression (in this case a linear one). However, the estimate is biased for the person we picked, because by choosing the tallest person we implicitly selected for the difference between height and basketball skill.
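One way to make this precise is the following sketch, which relies on assumptions that go beyond the text: height $H$ and skill $S$ are both measured in standard deviations from the class average and are jointly normal with correlation $\rho$. These symbols are introduced here purely for illustration.

```latex
\[
  \mathbb{E}[S \mid H = h] = \rho\, h,
  \qquad\text{so for } |\rho| < 1 \text{ and } h \neq 0:\quad
  \bigl|\mathbb{E}[S \mid H = h]\bigr| < |h| .
\]
```

With an illustrative $\rho = 0.7$, someone who is 2 standard deviations taller than average would be expected to be only $0.7 \times 2 = 1.4$ standard deviations better than average at basketball. This pull toward the average is commonly known as regression toward the mean, and it grows with the height of the person we select.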

Regressional Goodhart is one form of the more general phenomenon known as Goodhart’s law, which states:

As soon as a measure becomes a target, it ceases to be a good measure.

In the case of Regressional Goodhart, the proxy ceases to be a good measure because of an imperfect correlation between the proxy and what we want.

However, a proxy can also become a bad measure under optimization in other ways. The other types of Goodhart are Causal Goodhart, Extremal Goodhart, and Adversarial Goodhart.

Relation to Extremal Goodhart

See also: What is Extremal Goodhart?

Extremal Goodhart is particularly similar to Regressional Goodhart. It becomes a problem when driving a proxy to extreme values breaks the correlation between the proxy and what we want. For example, if you chose the tallest person in history for your basketball team, you would end up with someone who had difficulty walking due to a pituitary disorder. Regressional Goodhart occurs whenever the correlation is not perfect. In contrast, Extremal Goodhart only occurs when there are outliers that have a high proxy value but are nonetheless not what we want, as depicted in this image:

Further Reading

You can read more in the Goodhart Taxonomy by Scott Garrabrant, which introduces these four types of Goodhart.