Poetiq Scores 54% on ARC-AGI-2, Beats Alphabet's Gemini 3 Deep Think

Poetiq sets a new benchmark record on ARC-AGI-2 with a verified 54% score, surpassing Alphabet's Gemini 3 Deep Think. The system uses an open-source scaffold combining Gemini 3 Pro and GPT-5.1.

⬤ Poetiq is now the top performer on the ARC-AGI-2 benchmark, with results verified by the ARC Prize organization. The system scored 61% on the public set and 54% on the harder semi-private test set. Even the lower 54% puts it ahead of every other system on the leaderboard, a notable gain in AI reasoning performance.

⬤ On the semi-private ARC-AGI-2 leaderboard, Poetiq ranks first, ahead of Alphabet's Gemini 3 Deep Think (Preview), which scored in the 40–50% range. The margin between Poetiq's verified 54% and Gemini's result suggests that well-designed orchestration of existing models can push performance past what a single model reaches on its own.

⬤ Poetiq built the system as a custom scaffold that combines Gemini 3 Pro and GPT-5.1. The scaffold has been released as open source, with a detailed breakdown promised later this week. The result indicates that combining models and structuring their reasoning workflow can yield capabilities beyond what any single model delivers.

⬤ Poetiq's lead on ARC-AGI-2 adds to the competition among advanced AI developers. Results on high-level reasoning benchmarks shape market expectations, R&D spending, and strategic decisions at major players such as Alphabet. As open-source scaffolds like Poetiq's gain traction, the race to turn benchmark wins into real-world applications is intensifying across the AI sector.



Eseandre Mordi

Eseandre Mordi is a writer covering crypto, blockchain, and AI with a global perspective and a strong voice for women in tech.