Fresh rating data circulating in AI research circles shows a massive performance jump between Anthropic's newest Claude Opus 4.5 Thinking and earlier versions. According to estimates shared by @scaling01 using Glicko-2 or Elo-style rating calculations, the latest model would theoretically win more than 99% of head-to-head matchups against Claude 3 Opus. The data is also tied to something called the "Lisan Index", essentially an attempt to create one unified ranking system for AI models.
Tables accompanying the post place Claude Opus 4.5 Thinking at the top with a 1960 rating, followed by Claude Opus 4.5 at 1842 and Claude Sonnet 4.5 Thinking at 1836. Older releases like Claude 3.5 Sonnet, Claude 3 Opus, and the October 2024 Sonnet variant sit noticeably lower. A separate chart tracking these conservative scores over time shows steady upward momentum as new Claude versions rolled out through 2024, 2025, and into early 2026.
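To sanity-check the headline >99% figure, here is a minimal sketch of the standard Elo expected-score formula (Glicko-2 uses a related logistic model with rating deviations, which the "conservative" scores fold in). The function names are illustrative, and the plain Elo formula is an assumption; the post doesn't publish its exact calculation or a current rating for Claude 3 Opus, so the example only compares the models whose ratings are listed above.

```python
import math

def elo_win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))

def rating_gap_for_win_probability(p: float, scale: float = 400.0) -> float:
    """Rating difference A needs over B for an expected score of p."""
    return scale * math.log10(p / (1.0 - p))

# Ratings quoted in the post.
opus_45_thinking = 1960
opus_45 = 1842
sonnet_45_thinking = 1836

print(elo_win_probability(opus_45_thinking, sonnet_45_thinking))  # ~0.67
print(elo_win_probability(opus_45_thinking, opus_45))             # ~0.66
print(rating_gap_for_win_probability(0.99))                       # ~798-point gap
```

Under this model, a win rate above 99% corresponds to a rating gap of roughly 800 points, which gives a rough sense of how far below the 1960 mark Claude 3 Opus would have to sit for the claim to hold.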
The post suggests we're watching a "benchmark monopoly" slowly form, with standardized evaluation systems gaining traction across the industry. It touches on how competing models stack up in this landscape while highlighting the clear capability leap from the Claude 3 series to the 4.5 series, especially the "Thinking" variants built for tougher reasoning tasks.
What makes this interesting? It's a concrete illustration of how quickly Claude has improved across successive releases. With newer models posting dramatically higher ratings, these benchmarks aren't just numbers: they're shaping the whole conversation around AI competitiveness, technical maturity, and whether the industry needs unified evaluation standards.
Sergey Diakov