⬤ Google has reached a landmark result: Gemini 3 Flash now matches the scores of GPT-5.2 and Claude 4.5 Sonnet on Vending-Bench 2, a test that measures how well an AI model plans and manages money over many turns. Gemini 3 Flash placed at the top alongside its two best-known rivals.
⬤ During each simulation, the benchmark records the remaining cash balance. Gemini 3 Flash tracked GPT-5.2 and Claude 4.5 Sonnet point for point. Older systems (Gemini 2.5 Pro, GPT-5.1, Grok-4.1 Fast) ended with clearly lower balances, a sign that they lose steam in long tasks.
⬤ Vending-Bench 2 values steady gains more than sudden jumps. Gemini 3 Flash did not merely reach the leaders; it also left both its own predecessor and other models well behind. Its balance rose in an even line, evidence of dependable output rather than occasional lucky bursts.
⬤ For Google, as well as for the wider market, the result carries weight. Parity with the top two models lets Gemini 3 Flash compete for large-scale business jobs and for tasks that stretch over many steps. Because buyers now look to benchmarks to decide which models are ready for production, this outcome may steer contracts or reshape how the field ranks its leaders.
Saad Ullah