Gemini 3.1 Pro is getting some heat. Throughput on Google Vertex has slipped to 50 tokens per second, down from roughly 92 TPS when the model first launched. That's a drop of nearly 46%. A provider dashboard shared in the post shows Google AI Studio running even lower, at 46 TPS. The numbers tell a clear story: something on the infrastructure side isn't keeping up with demand.
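The "nearly 46%" figure follows directly from the two Vertex throughput numbers quoted above. A minimal sketch of that arithmetic, assuming the approximate 92 TPS launch figure and the 50 TPS current figure are directly comparable (the function name `percent_drop` is illustrative only):

```python
def percent_drop(before: float, after: float) -> float:
    """Return the relative decrease from `before` to `after`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Figures quoted in the article; the launch value is approximate.
launch_tps = 92.0
vertex_tps = 50.0

print(f"Vertex throughput is down {percent_drop(launch_tps, vertex_tps):.1f}%")
```

With these inputs the drop works out to about 45.7%, which matches the rounded figure in the report.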
Simple Tasks Taking Longer as Vertex Throughput Slides to 50 TPS
Users aren't just seeing numbers change on a dashboard. They're feeling it. Simple tasks are taking noticeably longer, and the overall responsiveness has become harder to predict. The core issue here isn't model quality. Most people running Gemini 3.1 Pro agree the intelligence is still sharp. What's breaking down is the infrastructure behind it, specifically how fast and consistently the model can deliver output at scale.
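To get a feel for why lower throughput makes simple tasks drag, here is a rough sketch of how long a response takes to stream at each rate. It assumes a steady decode rate and ignores time to first token, queuing, and network overhead; the 1,000-token response length is a hypothetical example, not a number from the post.

```python
def generation_seconds(output_tokens: int, tps: float) -> float:
    """Estimate seconds needed to stream `output_tokens` at a constant `tps`."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return output_tokens / tps


output_tokens = 1000  # hypothetical response length, chosen for illustration

# Throughput figures quoted in the article (launch value is approximate).
for label, tps in [("launch, ~92 TPS", 92.0), ("Vertex now, 50 TPS", 50.0)]:
    print(f"{label}: {generation_seconds(output_tokens, tps):.1f} s")
```

Under these assumptions the same response goes from roughly 11 seconds to 20 seconds, which is the kind of difference users would notice on everyday requests.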
This isn't an entirely unfamiliar situation. The post notes that similar performance fluctuations showed up during earlier Google AI rollouts. This release was also one of the more anticipated in recent months, since Gemini 3.1 Pro jumped 13 points above Gemini 3 Pro on the ArenaAI leaderboard. A rough launch isn't new territory for Google.
Infrastructure Strain Raises Questions About Enterprise Readiness
The report doesn't frame this as a permanent downgrade. Nobody is saying the model got worse overnight. What it does point to is a backend under strain, one that wasn't fully prepared for the traffic spike that followed launch. Google's Gemini 3.1 Pro launched with a 77.1% ARC-AGI-2 benchmark score, which generated a lot of interest. That kind of attention puts real pressure on serving infrastructure fast.
For enterprise users in particular, throughput isn't a secondary metric. When you're running production workloads, a roughly 46% drop in TPS has a direct impact on latency, cost efficiency, and user experience. The author behind the original post remains optimistic about the long-term picture, and there's no suggestion of pricing or context limit changes. Even so, Google's broader Gemini 3 infrastructure updates are clearly still being worked through. The current slowdown is a reminder that launching a capable model is only half the job. Keeping it fast and stable under real-world load is the other half.
Usman Salis