What is Extremal Goodhart?

Extremal Goodhart is a variant of Goodhart's law in which optimizing a proxy causes it to break down, because the relationship between the proxy and the target is different at the extremes.

Imagine you are trying to find a player for your basketball team. You know that basketball skill is correlated with height, so you go find the tallest person in the world to join your team. However, it turns out the tallest person in history had a pituitary disorder and had difficulty walking, so finding the tallest person on earth is a terrible strategy for finding great basketball players!

This failure mode is called Extremal Goodhart: Sometimes we can’t easily measure what we want (basketball skill) directly and need to use a proxy (height). But for extreme values (tallest person on earth), the proxy is not related to the target anymore.
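
The basketball example can be sketched as a toy model. The sketch below is only an illustration: the skill function, the 215 cm cutoff, and all other numbers are invented assumptions, not real data. It shows a proxy (height) that tracks the target (skill) across the typical range but points the wrong way once it is pushed to the extreme.

```python
# Toy model of Extremal Goodhart. All names and numbers are hypothetical.

def skill(height_cm):
    """Invented 'true target': rises with height, then collapses at the extreme."""
    if height_cm <= 215:
        return height_cm - 150           # taller is better in the typical range
    return 65 - 5 * (height_cm - 215)    # health problems dominate beyond it

typical = range(160, 216, 5)    # heights where the proxy was observed to work
everyone = range(160, 276, 5)   # same population plus extreme outliers

# Within the typical range, picking the tallest also picks the most skilled.
assert skill(max(typical)) == max(skill(h) for h in typical)

# Optimizing the proxy over everyone picks the tallest, who is far from the best.
tallest = max(everyone)
most_skilled = max(everyone, key=skill)
print("tallest:", tallest, "skill:", skill(tallest))
print("most skilled:", most_skilled, "skill:", skill(most_skilled))
```

In this sketch, selecting on height works as long as candidates come from the range where the proxy was validated, and fails only when the search is powerful enough to reach the outliers.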

Extremal Goodhart is important for AI alignment because we may end up building AIs which act as powerful optimizers. If the optimization target is not perfectly aligned with what we want, then powerful optimization may lead to extreme worlds in which the imperfection of the optimization target becomes important.

For example, someone might ask an AI to maximize the balance in their bank account, expecting it to do things like start a content-creation or algorithmic-trading business. However, a sufficiently capable AI might instead hack the bank and set the account balance to an extremely high number, which gets the user arrested. In this scenario the proxy (account balance) is extremely high, but it is no longer related to what the user actually wants (the ability to spend money).

You can read more in the Goodhart Taxonomy by Scott Garrabrant, which introduces four types of Goodhart's law: regressional, extremal, causal, and adversarial.