Gemini 3.1 Pro Throughput Drops 46% to 50 TPS on Google Vertex

Google's Gemini 3.1 Pro is reportedly experiencing a sharp throughput decline on Google Vertex, falling to 50 TPS from 92 TPS at launch. Users are reporting slower response times and inconsistent performance across deployments.

Gemini 3.1 Pro is getting some heat. Throughput on Google Vertex has slipped to 50 tokens per second (TPS), down from roughly 92 TPS when the model first launched. That's a drop of nearly 46%. A provider dashboard shared in the original post shows Google AI Studio running even lower, at 46 TPS. The numbers suggest that something on the infrastructure side isn't keeping up with demand.
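For readers who want to check the headline figure, here is a minimal Python sketch of the arithmetic. It uses only the two Vertex numbers reported above; the function name `percent_drop` is illustrative and not taken from any dashboard or API.

```python
def percent_drop(before: float, after: float) -> float:
    """Return the relative decline from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Figures reported in the article for Google Vertex.
launch_tps = 92.0   # approximate throughput at launch
current_tps = 50.0  # throughput reported now

drop = percent_drop(launch_tps, current_tps)
print(f"Vertex throughput drop: {drop:.1f}%")  # prints "Vertex throughput drop: 45.7%"
```

The result rounds to the "nearly 46%" cited in the report.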

Simple Tasks Taking Longer as Vertex Throughput Slides to 50 TPS

Users aren't just seeing numbers change on a dashboard. They're feeling it. Simple tasks are taking noticeably longer, and the overall responsiveness has become harder to predict. The core issue here isn't model quality. Most people running Gemini 3.1 Pro agree the intelligence is still sharp. What's breaking down is the infrastructure behind it, specifically how fast and consistently the model can deliver output at scale.

Gemini 3.1 Pro jumped 13 points above Gemini 3 Pro on the ArenaAI leaderboard, which made it one of the more closely watched releases in recent months. The slowdown isn't an entirely unfamiliar situation, though. The post notes that similar performance fluctuations showed up during earlier Google AI rollouts too. A rough launch isn't new territory for Google.

Infrastructure Strain Raises Questions About Enterprise Readiness

The report doesn't frame this as a permanent downgrade. Nobody is saying the model got worse overnight. What it does point to is a backend under strain, one that wasn't fully prepared for the traffic spike that followed launch. Google's Gemini 3.1 Pro launched with a 77.1% score on the ARC-AGI-2 benchmark, which generated a lot of interest. That kind of attention puts real pressure on serving infrastructure fast.

For enterprise users in particular, throughput isn't a secondary metric. When you're running production workloads, a 46% drop in TPS has a direct impact on latency, cost efficiency, and user experience. The author behind the original post remains optimistic about the long-term picture, and there's no suggestion of pricing or context limit changes. Still, Google's broader Gemini 3 infrastructure updates are clearly still being worked through. The current slowdown is a reminder that launching a capable model is only half the job. Keeping it fast and stable under real-world load is the other half.
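To make the latency point concrete, here is a rough Python sketch of how long a single response takes to stream at the two reported Vertex rates. The 1,000-token response size is a hypothetical example, not a figure from the report, and the model assumes a constant output rate while ignoring time to first token, queuing, and network overhead.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Estimate streaming time, assuming a constant output rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


response_tokens = 1000  # hypothetical response size, chosen for illustration

at_launch = generation_seconds(response_tokens, 92.0)  # reported launch rate
at_present = generation_seconds(response_tokens, 50.0)  # reported current rate

print(f"At 92 TPS: {at_launch:.1f} s")               # prints "At 92 TPS: 10.9 s"
print(f"At 50 TPS: {at_present:.1f} s")              # prints "At 50 TPS: 20.0 s"
print(f"Slowdown:  {at_present / at_launch:.2f}x")   # prints "Slowdown:  1.84x"
```

Under these simplifying assumptions, the same response takes roughly 84% longer to finish, which is the kind of difference users would notice in interactive workloads.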

Usman Salis

Usman has been in the blockchain space for 9 years and written dozens of articles about crypto in his career. He wants to put crypto on the global map.