What is an AI's "time horizon length"?

A 2025 paper from METR introduces “time horizon length” as a measure of AI capability. The paper evaluates AIs on a diverse set of software tasks, each with an associated “length”: the amount of time it would take a human expert to do the task. An AI’s “50% time horizon” is then defined as the length of tasks that the AI can complete half the time, averaged across all tasks of that length.[1] For example, if an AI succeeds on half of the tasks that would take a human expert an hour, then it has a “50% time horizon” of one hour.
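To make the definition concrete, here is a minimal sketch of how a 50% time horizon could be read off from success rates measured at several task lengths. It is not the paper's procedure: it uses simple interpolation on a log scale of task length, and the function name `fifty_percent_horizon` and the success rates are made up for illustration.

```python
import math

def fifty_percent_horizon(buckets):
    """Estimate the task length (in minutes) where success crosses 50%.

    buckets: list of (human_minutes, success_rate) pairs (hypothetical data).
    Interpolates linearly in log2(task length) between the two neighbouring
    buckets whose success rates straddle 50%.
    """
    pts = sorted(buckets)
    for (t0, p0), (t1, p1) in zip(pts, pts[1:]):
        if p0 >= 0.5 >= p1 and p0 != p1:
            frac = (p0 - 0.5) / (p0 - p1)
            log_t = math.log2(t0) + frac * (math.log2(t1) - math.log2(t0))
            return 2 ** log_t
    return None  # success never crosses 50% in the measured range

# Made-up success rates for one imaginary model, NOT data from the paper.
example = [(1, 0.95), (4, 0.85), (15, 0.70), (60, 0.55), (240, 0.25), (960, 0.05)]
horizon = fifty_percent_horizon(example)
print(f"Estimated 50% time horizon: {horizon:.0f} minutes")
```

With these invented numbers, the model succeeds slightly more than half the time on one-hour tasks and much less often on four-hour tasks, so the estimated horizon lands somewhat above an hour.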

The authors benchmarked 13 frontier AI models released between 2019 and 2025 and found that their time horizons have been doubling approximately every 7 months.

The latest models, like Claude 3.7 Sonnet, can complete 50-minute tasks with a 50% success rate. A given model's success rate decreases by a roughly constant amount for each doubling of the task's length (see the following chart):

Figure source: Kwa et al., 2025

Better reasoning abilities, tool use, and the ability to recover from mistakes have led to performance improvements. However, models still struggle with "messier" real-world scenarios that lack clear feedback loops. As the figure shows, current models' success rates drop sharply on nearly all tasks once task length reaches around an hour.

If the trend of time horizons doubling every 7 months holds (which, as METR cautions, it may not), AI systems could automate one-month-long software engineering tasks sometime between 2028 and 2031.
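The arithmetic behind this kind of extrapolation can be sketched in a few lines. This is a naive point estimate, not the paper's forecast: it takes the 50-minute horizon and 7-month doubling time quoted above and assumes that a "one-month" task means about 167 working hours of human time. That figure, like the names used here, is an assumption made for illustration.

```python
import math

DOUBLING_MONTHS = 7          # doubling time quoted in the article
CURRENT_HORIZON_MIN = 50     # approximate 50% horizon of Claude 3.7 Sonnet
WORK_HOURS_PER_MONTH = 167   # ASSUMPTION: human working hours in one month

def months_until(target_minutes,
                 current_minutes=CURRENT_HORIZON_MIN,
                 doubling_months=DOUBLING_MONTHS):
    """Months until the horizon reaches target_minutes if the trend holds."""
    doublings = math.log2(target_minutes / current_minutes)
    return doublings * doubling_months

target = WORK_HOURS_PER_MONTH * 60   # one work month, in minutes
months = months_until(target)
print(f"{months:.1f} months, or about {months / 12:.1f} years")
```

Under these assumptions the horizon needs a little under eight doublings, which takes about four and a half years. Counted from early 2025, that falls inside the 2028 to 2031 range. The width of that range reflects uncertainty about the doubling time and the starting point, which this sketch ignores.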


  1. This is similar to what Richard Ngo refers to as t-AGI, and has been explored in other prior work, such as Ajeya Cotra’s Bio Anchors report.


