Claude Opus 4.6 Scores 78.3% on MRCR v2 With 1M Token Context Window

Claude Opus 4.6 tops the MRCR v2 long-context benchmark with a 78.3% match ratio at 1M tokens, a massive leap from the previous 18.5%.

⬤ Claude Opus 4.6 now supports a 1 million token context window, letting it process entire codebases, long document sets, or extended multi-session workflows in a single request. No chunking, no summarization workarounds, just one very large prompt. The competitive context is shifting fast: Gemini 3.1 Pro jumped 13 points above Gemini 3 Pro on the ArenaAI leaderboard, signaling just how heated the long-context race has become.

⬤ On the MRCR v2 long-context retrieval benchmark, the model hit a 78.3% mean match ratio at 1M tokens. The test is unforgiving: it hides specific facts across a million tokens and demands the exact correct answer, not a close one. That puts Claude Opus 4.6 at the top of the frontier model rankings at this context scale.

⬤ The jump from earlier versions is stark. Previous iterations of the same system scored around 18.5% on MRCR v2, illustrating just how hard long-context retrieval actually is. The comparison chart also shows Claude Sonnet 4.6, GPT-5.4, and Gemini 3.1 Pro all trailing off as input size grows. Not every rival is keeping pace on accuracy either: Gemini 3 Pro shows lower error rates but still trails Claude Sonnet 4.5 at 15%.

⬤ The broader industry is moving fast on multiple fronts at once. Context length, retrieval accuracy, and reasoning depth are all advancing in parallel. A clear example: OpenAI rolled out GPT-5.4 Thinking with 92.8% on GPQA Diamond, raising the bar across the board and pushing every lab to respond.

News Source

#AI #Claude Opus 4.6 #MRCR v2

Peter Smith E-mail

Peter Smith - web3.0 projects expert and writer exploring the intersection of blockchain, AI, and online entertainment.