One of AI's most exciting frontiers is "computer use": teaching systems to navigate software and complete digital tasks the way humans do. Results on the OSWorld benchmark show these capabilities improving at an extraordinary rate, bringing us closer to AI that can operate our digital tools independently.
Recent Progress
According to Behnam Neyshabur and the Anthropic team, model performance on OSWorld jumped from around 8% in October 2024 to approximately 61% by October 2025, against a human-level baseline of 72.36%. Along the way, Claude 3.5 Sonnet started at roughly 20% in early 2025, Claude 3.7 and OpenAI's preview pushed toward 30%, UI-TARS and Claude 4 crossed 40% by mid-2025, and Claude Sonnet 4.5 recently hit 61%, the closest to human ability yet.

What This Means
The data show rapid, so far roughly exponential growth: the top score rose from about 8% to about 61% in twelve months, nearly an eightfold increase, or a doubling roughly every four months on average. If progress continues at anything close to this pace, AI could match human-level performance on practical computer tasks within a year or two. That would have major implications for automation, scientific research, and business productivity, with AI able to run experiments, operate software, and complete complex digital workflows independently.
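To make the arithmetic behind the growth claim concrete, here is a minimal Python sketch that uses only the figures quoted in this post (about 8%, about 61%, and the 72.36% human baseline). The function names are hypothetical, and both extrapolations assume the past rate simply continues, which is a simplification: benchmark scores are capped at 100% and need not follow either curve.

```python
import math

# Figures quoted in the post (OSWorld success rates, in percent).
START_SCORE = 8.0        # around October 2024
END_SCORE = 61.0         # around October 2025
HUMAN_BASELINE = 72.36   # human-level performance
MONTHS_ELAPSED = 12.0


def doubling_time_months(start: float, end: float, months: float) -> float:
    """Average doubling time implied by growing from start to end over `months`."""
    doublings = math.log2(end / start)
    return months / doublings


def exponential_months_to_reach(target: float, current: float, doubling_months: float) -> float:
    """Months to reach target if the score kept doubling every `doubling_months` (assumption)."""
    return doubling_months * math.log2(target / current)


def linear_months_to_reach(target: float, start: float, end: float, months: float) -> float:
    """Months to reach target if the score kept gaining the same points per month (assumption)."""
    points_per_month = (end - start) / months
    return (target - end) / points_per_month


if __name__ == "__main__":
    t_double = doubling_time_months(START_SCORE, END_SCORE, MONTHS_ELAPSED)
    gap = HUMAN_BASELINE - END_SCORE
    print(f"Growth factor over the period: {END_SCORE / START_SCORE:.2f}x")
    print(f"Implied average doubling time: {t_double:.1f} months")
    print(f"Remaining gap to human baseline: {gap:.2f} points")
    print(f"Exponential extrapolation: "
          f"{exponential_months_to_reach(HUMAN_BASELINE, END_SCORE, t_double):.1f} months")
    print(f"Linear extrapolation: "
          f"{linear_months_to_reach(HUMAN_BASELINE, START_SCORE, END_SCORE, MONTHS_ELAPSED):.1f} months")
```

Both naive extrapolations land well inside the "year or two" horizon, which leaves room for progress to slow as scores approach the human baseline.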
Despite this progress, challenges remain in reliability and nuanced decision-making, though the rapid pace suggests these limitations may be temporary. Computer-use agents are moving from experiments to practical tools that could soon become as essential as search engines, and organizations that prepare now will be best positioned to take advantage of them.