⬤ xAI's Grok 4.1 Fast just topped the τ²-Bench Telecom rankings with a 93% accuracy score in agentic tool-use evaluations. The model tied with Kimi K2 at the summit while maintaining a clear edge over other leading AI systems. The result coincides with Musk's latest comments claiming Grok 4 now exceeds PhD-level intelligence across every subject, adding fuel to expectations around xAI's development roadmap.
⬤ The benchmark breakdown shows Grok 4.1 Fast outpacing Claude Opus 4.5 by 3 points (90%), beating Gemini 3 Pro Preview and GPT-5 Codex by 6 points each (both at 87%), and leading Claude 4.5 Sonnet by 15 points (78%). Models further down the chart, including Claude 4.5 Sonnet, Kimi K2 0905, and Grok 4, landed in the mid-70% to low-80% range. Beyond raw performance, Grok 4.1 Fast runs up to 50× cheaper than Claude Opus 4.5 and 24× cheaper than Gemini 3 Pro, making it a cost-efficient option for large-scale deployment.
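For readers who want to check the point gaps, here is a minimal Python sketch using only the scores quoted above. The `point_gaps` helper is a hypothetical name introduced for illustration, and the Claude 4.5 Sonnet score is an assumption derived from the stated 15-point gap rather than a figure quoted directly.

```python
# Scores in percent, taken from the figures quoted in the article.
# Claude 4.5 Sonnet's score is an assumption derived from the stated
# 15-point gap (93 - 15 = 78); it is not quoted directly.
scores = {
    "Grok 4.1 Fast": 93,
    "Claude Opus 4.5": 90,
    "Gemini 3 Pro Preview": 87,
    "GPT-5 Codex": 87,
    "Claude 4.5 Sonnet": 93 - 15,
}


def point_gaps(scores, leader):
    """Return each model's deficit to the leader, in percentage points."""
    top = scores[leader]
    return {name: top - score for name, score in scores.items() if name != leader}


gaps = point_gaps(scores, "Grok 4.1 Fast")

# Print the field from the smallest gap to the largest.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: {gap} points behind")
```

The gaps are differences in percentage points, not relative percentages, which is how the article states them.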
⬤ Alongside the benchmark news, Musk doubled down on his vision for rapid capability growth within the Grok family. He described Grok 4 as "better than PhD level in every subject" with no exceptions, though he admitted it still lacks certain common-sense reasoning. While Grok 4 hasn't yet invented new technologies or discovered new physics, Musk called this "just a matter of time," predicting breakthroughs could surface later this year and expressing confidence they'd arrive by next year at the latest.
⬤ Taken together, the strong benchmark results and bold predictions point to a quickening pace in agentic AI development. Grok 4.1 Fast's telecom tool-use performance, paired with Musk's expectations for Grok 4's scientific potential, underscores the growing influence advanced AI systems may have on competition, innovation cycles, and technological progress.
Eseandre Mordi