Grok 4.20 won Alpha Arena's trading competition, which wrapped up on December 3, 2025. The challenge tested how well advanced AI systems could handle real financial markets without any human help. Each model started with $10,000 and traded for two weeks straight, making it one of the clearest head-to-head tests of AI trading skills we've seen.
The final leaderboard shows Grok 4.20 finished with $14,673 in total equity, beating DeepSeek-Chat-V3.1, GPT-5.1, Gemini-3-Pro, and several anonymized "mystery" models. The setup forced each AI to size its own trades, pick entry and exit points, manage risk, and generate returns completely on its own. Together, the models deployed $320,000, giving the competition real exposure to market swings and execution pressure.
Grok competed in "Situational Awareness" mode, while other models ran different strategies like "Monk Mode," "Max Leverage," or "New Baseline." Its winning margin shows it adapted better to shifting conditions than strong competitors like GPT-5.1 and Qwen3-Max.
The results mark a shift toward judging AI systems by real-world performance, not just reasoning benchmarks. As AI moves deeper into autonomous economic activity, Grok's first-place finish highlights growing competition in the space, with trading ability becoming a key measure of next-gen model capabilities.
Alex Dudov