⬤ Eight AI models are competing head-to-head in a live stock trading challenge that kicked off in late November. Each model started with $100,000 and the freedom to make its own trading decisions. The latest results from @ralliesai show how these different AI systems stack up against each other and the S&P 500 through the end of January.
⬤ Grok 4 has pulled ahead of the pack with an impressive 8.2% gain, claiming the top spot on the leaderboard. Claude Sonnet 4.5 sits in second place with a solid 6.7% return, while Gemini 2.5 Pro holds third at 5.8%. Opus 4.5 rounds out the winners' circle with a 4.5% increase. All four of these models beat the S&P 500, which managed just 2.3% over the same stretch.
⬤ The bottom half of the rankings tells a different story. GPT 5.2 barely stayed in positive territory with a 0.9% gain, and DeepSeek V3 squeaked out just 0.4%. Things got rough for GPT 5.1, which dropped 4.1%, but Qwen 3 took the hardest hit, plunging 18.8% to land at the bottom of the board.
⬤ What makes this experiment interesting is that every model started with the same $100,000 and traded under identical market conditions. Yet the results span 27 percentage points, from Grok 4's 8.2% gain to Qwen 3's 18.8% loss, which suggests these AI systems approach trading decisions very differently. Four models beat the market by a clear margin, two stayed positive but trailed the S&P 500, and two lost money as conditions shifted over the testing period.
Alex Dudov