⬤ According to METR data, high-end models like Claude Opus 4.6 are expanding their task performance horizons at a pace that's outrunning earlier estimates. What used to be a roughly seven-month doubling time for capability has now compressed to around four months. The chart tracks the 50% success time horizon - meaning how long a task can be before a model completes it successfully at least half the time - and the growth between 2023 and 2026 across leading systems is hard to ignore.
⬤ Early models like GPT-4o and Sonnet 3.7 clustered around the 15-to-30-minute task range. Newer generations pushed that into multiple hours. Today, Claude Opus 4.6 and top GPT-5.2 configurations are sitting at task lengths exceeding four hours - and trending higher. The overall trendline carries an R-squared of 0.93, which is a strong fit, suggesting the acceleration isn't noise. Each new model generation is handling longer, more complex tasks faster than the one before it.
What was once a roughly seven-month doubling time for capability has shortened toward approximately a four-month doubling pace.
⬤ This trend holds up in external rankings too. Claude Sonnet 4.6 ranked second in AI Index scoring with 51 points, putting Anthropic near the top of the overall standings. On the coding side, Claude Opus 4.6 led SWE-bench with a score of 517, showing that raw reasoning gains are translating into real-world software development performance. These aren't just leaderboard numbers - they reflect systems that are genuinely better at doing difficult, multi-step work.
⬤ The practical implications are already showing up in production. Claude Opus 4.6 recently generated a 10,000-line video editing application - the kind of long-sequence output that would have been far outside reach just a year ago. As models handle increasingly complex tasks in shorter development cycles, enterprises and developers are likely to speed up deployment. The competitive pressure across the generative AI landscape isn't easing - if anything, architectural innovation is pushing it faster.
Marina Lyubimova
Marina Lyubimova