Anthropic just dropped Claude Opus 4.5, its newest flagship model, and the numbers look solid. The company says it beats previous versions in coding, tool use, and reasoning while costing roughly one-third as much per token as its predecessor. That's a significant jump in both performance and affordability for a high-end AI model.
The benchmark results tell an interesting story. On SWE-bench Verified, which tests how well AI handles real-world coding tasks, Opus 4.5 scored 80.9%, beating Sonnet 4.5 (77.2%), Opus 4.1 (74.5%), and Gemini 3 Pro (76.2%). On Terminal-bench 2.0, which measures coding in a terminal environment, Opus 4.5 hit 59.3%, leaving Sonnet 4.5 at 50.0% and Opus 4.1 at 46.5% in the dust, while staying competitive with Gemini 3 Pro.
Tool use and reasoning metrics show similar strength. On τ²-bench, which measures agentic tool use, Opus 4.5 nailed 88.9% on retail scenarios and 98.2% on telecom scenarios. For novel problem solving (ARC-AGI-2 Verified), it reached 37.6%, ahead of both Sonnet 4.5 and Gemini 3 Pro. Graduate-level reasoning on GPQA Diamond came in at 87.0%, which trails Gemini 3 Pro's 91.9% but still holds its own against other top models.
Beyond raw performance, Anthropic added some practical features: an adjustable "effort" setting that controls how deeply the model thinks, automatic summarization of lengthy conversations, and better long-context handling. Combined with those lower token costs, Opus 4.5 reflects where the industry is heading: models that deliver strong results without breaking the bank, reshaping how companies think about deploying AI at scale.
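To make that effort setting concrete, here's a minimal sketch of what a request might look like through Anthropic's Python SDK. The `effort` parameter name, its value, and the model ID are assumptions based on Anthropic's description of the feature rather than a verified API signature, which is why the sketch passes it through the SDK's generic `extra_body` escape hatch; check the current API reference before relying on it.

```python
# Minimal sketch: dialing up the new "effort" setting via Anthropic's
# Python SDK (pip install anthropic). The parameter name, its values,
# and the model ID below are assumptions, not verified API details.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-5",        # assumed model ID; confirm in the docs
    max_tokens=2048,
    extra_body={"effort": "high"},  # hypothetical knob: higher effort, deeper thinking
    messages=[
        {"role": "user", "content": "Refactor this function and explain the tradeoffs."},
    ],
)

print(response.content[0].text)  # first content block holds the text reply
```

The appeal of a knob like this is that lower effort trades some depth for speed and cost, while higher effort buys more deliberate reasoning on hard problems, so teams can tune one model per task instead of switching models.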
Saad Ullah