OpenAI's GPT-5.2 xhigh just bombed a major physics test, landing at 0% on the CritPt benchmark - a specialized evaluation that measures real theoretical physics reasoning. GPT-5.2 xhigh wasn't alone, either: several other advanced AI models also failed to crack this particularly challenging assessment.
CritPt isn't your typical AI benchmark. It's specifically built to separate genuine expert-level physics reasoning from basic pattern matching. The current leaderboard shows just how brutal this test really is - even the top performer, Gemini 3 Pro, only managed 9.1%. Most models barely register any score at all.
The benchmark doesn't mess around when it comes to grading. It demands formal reasoning, mathematical precision, and airtight logic. Submit an incomplete derivation or slip up in your reasoning? You're penalized heavily. This strict approach reveals weaknesses that typically fly under the radar in broader AI evaluations.
What this really shows is that there's still a massive gap between AI systems crushing general tasks and actually handling specialized scientific work. Sure, GPT-5.2 xhigh performs well across many language and reasoning challenges, but when it comes to advanced theoretical physics, we're clearly not there yet. As competition heats up between AI labs, benchmarks like CritPt are becoming crucial reality checks for measuring where genuine progress still needs to happen.
Peter Smith