The Race to Human-Level Computer Use: How AI Models Are Closing the Gap

Recent OSWorld benchmark results show that the success rate of AI models at operating computers autonomously jumped from about 8% to 61% in one year, approaching the human baseline of roughly 72%.

One of AI's most exciting frontiers is "computer use": teaching systems to navigate software and complete digital tasks the way humans do. Results on the OSWorld benchmark show these capabilities improving at an extraordinary rate, bringing us closer to AI that can operate our digital tools independently.

Recent Progress

According to Behnam Neyshabur and the Anthropic team, the top model score rose from around 8% in October 2024 to approximately 61% by October 2025, against a human baseline of 72.36%. Along the way, Claude 3.5 Sonnet stood at roughly 20% in early 2025, Claude 3.7 and OpenAI's preview pushed toward 30%, UI-TARS and Claude 4 crossed 40% by mid-2025, and Claude Sonnet 4.5 recently hit 61%, the closest to human ability yet.

What This Means

The scores show rapid, compounding growth, with performance roughly doubling within months. If the trend continues, AI could match the human baseline of about 72% on practical computer tasks within a year or two. This has major implications for automation, scientific research, and business productivity, since it would let AI run experiments, operate software, and complete complex digital workflows independently.
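To make the growth arithmetic concrete, here is a minimal sketch that uses only the three figures quoted above (8%, 61%, and the 72.36% human baseline). The function names and the two growth models (constant exponential and constant linear) are illustrative assumptions, not a method from the source, and a naive extrapolation like this says nothing about how progress behaves as scores approach a ceiling.

```python
import math

# Figures quoted in the article (OSWorld success rates, in percent).
START_SCORE = 8.0      # around October 2024
END_SCORE = 61.0       # around October 2025
HUMAN_SCORE = 72.36    # reported human baseline
MONTHS_ELAPSED = 12    # October 2024 to October 2025


def doubling_time_months(start, end, months):
    """Months per doubling, ASSUMING constant exponential growth."""
    return months / math.log2(end / start)


def months_to_target_exponential(current, target, doubling_months):
    """Months until `target` if the same doubling time kept holding."""
    return doubling_months * math.log2(target / current)


def months_to_target_linear(start, end, months, target):
    """Months until `target` if scores rose by a fixed number of points per month."""
    points_per_month = (end - start) / months
    return (target - end) / points_per_month


doubling = doubling_time_months(START_SCORE, END_SCORE, MONTHS_ELAPSED)
share_of_human = END_SCORE / HUMAN_SCORE
exp_months = months_to_target_exponential(END_SCORE, HUMAN_SCORE, doubling)
lin_months = months_to_target_linear(START_SCORE, END_SCORE, MONTHS_ELAPSED, HUMAN_SCORE)

print(f"Implied doubling time: {doubling:.1f} months")
print(f"Share of human baseline reached: {share_of_human:.0%}")
print(f"Naive exponential extrapolation: {exp_months:.1f} more months")
print(f"Naive linear extrapolation: {lin_months:.1f} more months")
```

Under these assumptions the year-over-year jump works out to a doubling time of about four months. Both naive extrapolations reach the human baseline within a few months, so they are best read as an optimistic bound: the remaining tasks may be harder than the ones already solved, which is one reason a more cautious estimate of a year or two is plausible.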

Despite impressive progress, challenges remain in reliability and nuanced decision-making. However, the rapid pace suggests these limitations may be temporary. Computer-use agents are transitioning from experiments to practical tools that could soon become as essential as search engines, and organizations preparing now will be best positioned to leverage these capabilities.

Peter Smith - web3.0 projects expert and writer exploring the intersection of blockchain, AI, and online entertainment.