Gemini 3.1 Flash Hits 95.9% in Speech AI Benchmark

Google's Gemini 3.1 Flash Live Preview ranks second in a key speech reasoning benchmark. The model introduces configurable thinking levels balancing performance and latency.

Contents

Gemini 3.1 Flash Outperforms Grok and Other Speech AI Rivals
Gemini 3.1 Flash Latency Drops to 0.96 Seconds on Minimal Thinking

As Artificial Analysis reported, Google has released Gemini 3.1 Flash Live Preview, a speech-to-speech AI model that scored 95.9% on the Big Bench Audio benchmark at its highest reasoning setting. The result places it second overall, behind Step-Audio R1.1 Realtime at 97.0%. The release introduces configurable thinking levels that let developers trade reasoning depth against latency. It comes as Google (GOOGL) continues to expand its AI ecosystem, alongside other industry developments such as GPT-5.4 mini scoring 72.1 on OSWorld.

The model's highest thinking level scores 95.9%, close to the benchmark leader, while switching to minimal reduces the score to 70.5%.

Gemini 3.1 Flash Outperforms Grok and Other Speech AI Rivals

Benchmark results confirm Gemini 3.1 Flash Live Preview outperforms competitors including Grok Voice Agent, which scored 92.9%, placing Google firmly among top-tier speech reasoning models.

The drop in accuracy at lower thinking settings comes with a significant improvement in speed, illustrating a clear trade-off between reasoning depth and latency. These capabilities align with broader advances in AI reasoning systems, such as Google DeepMind's Aletheia, which hit 91.9 on a math benchmark.

Gemini 3.1 Flash Latency Drops to 0.96 Seconds on Minimal Thinking

Latency metrics further highlight the model's flexibility. At the highest reasoning level, Time to First Audio (TTFA) is approximately 2.98 seconds, slower than Step-Audio R1.1 Realtime at 1.51 seconds and Grok Voice Agent at 0.78 seconds. With minimal thinking, TTFA drops to 0.96 seconds, faster than Step-Audio R1.1 Realtime and close to Grok Voice Agent, though the benchmark score falls to 70.5% at that setting. Pricing stays at $0.35 per hour for audio input and $1.38 per hour for audio output, consistent with previous Gemini audio models. The balance between speed and intelligence reflects a broader trend, also seen in NanoClaw AI's launch of a Claude-powered assistant.
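To make the per-hour pricing concrete, here is a minimal sketch that estimates the audio cost of a single session. Only the two rates come from the figures quoted above; the function name and the example durations are hypothetical, and the sketch assumes billing scales linearly with audio duration, which the article does not confirm.

```python
# Per-hour audio rates in USD, as quoted in the article.
INPUT_RATE_PER_HOUR = 0.35
OUTPUT_RATE_PER_HOUR = 1.38


def estimate_audio_cost(input_minutes: float, output_minutes: float) -> float:
    """Estimate session cost in USD, assuming cost scales linearly with duration."""
    if input_minutes < 0 or output_minutes < 0:
        raise ValueError("durations must be non-negative")
    input_cost = (input_minutes / 60) * INPUT_RATE_PER_HOUR
    output_cost = (output_minutes / 60) * OUTPUT_RATE_PER_HOUR
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical session: 30 minutes of user audio, 30 minutes of model audio.
    print(f"${estimate_audio_cost(30, 30):.4f}")
```

Under these assumptions, a full hour of input plus a full hour of output would cost $1.73, with audio output accounting for most of the total.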


The release highlights a shift toward customizable AI performance, where developers can dynamically balance latency and reasoning depending on application requirements. As Google (GOOGL) advances its position in real-time voice AI, features like adjustable thinking levels may influence deployment strategies across industries relying on speech interfaces.
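One way to picture that balancing act is a simple selection rule: given a latency budget, pick the highest-scoring thinking level whose Time to First Audio fits. The sketch below uses only the two data points reported in the article. The level labels "high" and "minimal" and the function name are illustrative assumptions, not values from any official API, and real deployments may expose more levels than these two.

```python
from typing import NamedTuple, Optional


class LevelStats(NamedTuple):
    name: str         # illustrative label, not an official API value
    score_pct: float  # Big Bench Audio score reported in the article
    ttfa_s: float     # Time to First Audio reported in the article, in seconds


# The two configurations the article reports figures for.
REPORTED_LEVELS = [
    LevelStats("high", 95.9, 2.98),
    LevelStats("minimal", 70.5, 0.96),
]


def pick_level(max_ttfa_s: float) -> Optional[LevelStats]:
    """Return the highest-scoring level whose TTFA fits the budget, or None."""
    candidates = [lvl for lvl in REPORTED_LEVELS if lvl.ttfa_s <= max_ttfa_s]
    return max(candidates, key=lambda lvl: lvl.score_pct, default=None)


if __name__ == "__main__":
    for budget in (3.0, 1.0, 0.5):
        choice = pick_level(budget)
        print(budget, choice.name if choice else "no level fits")
```

With a 3-second budget the rule selects the highest level, with a 1-second budget it falls back to minimal, and below 0.96 seconds neither reported configuration qualifies.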

Gemini 3.1 Flash Live Preview scores 95.9% on Big Bench Audio benchmark
Highest thinking level: TTFA of 2.98 seconds; minimal thinking: 0.96 seconds
Grok Voice Agent scored 92.9%; Step-Audio R1.1 Realtime leads at 97.0%
Pricing: $0.35/hr audio input, $1.38/hr audio output

This evolution underscores increasing competition in multimodal AI, where performance, responsiveness, and cost efficiency are becoming central to adoption and long-term market positioning.



Eseandre Mordi

Eseandre Mordi - writer covering crypto, blockchain, and AI with a global perspective and a strong voice for women in tech.