Mistral AI's New Speech Model Beats GPT-4o mini with 4% Word Error Rate

Mistral AI has unveiled Voxtral Transcribe 2, a speech-to-text suite with a reported 4% word error rate for batch transcription and sub-200ms latency for streaming. The release pairs a batch model with a real-time model, and Mistral says the batch model outperforms competing transcription systems on the FLEURS benchmark.

⬤ Mistral AI released Voxtral Transcribe 2, a speech-to-text suite designed for transcription workflows and live audio applications. The update introduces two models, Voxtral Mini Transcribe V2 and Voxtral Realtime, along with a new audio playground inside Mistral Studio for real-time testing and experimentation.

⬤ Voxtral Mini Transcribe V2 handles batch transcription and supports speaker diarization, context biasing, and 13 languages. Mistral priced it at $0.003 per minute, and the model achieved a 4% word error rate on the FLEURS benchmark. According to the release, that result beats competing systems, including GPT-4o mini and ElevenLabs Scribe v2, under identical benchmark conditions.

⬤ Voxtral Realtime handles streaming transcription with latency under 200 milliseconds, targeting voice agents, media workflows, and contact center applications. The audio playground in Mistral Studio lets users test live transcription interactively without additional setup.

⬤ The launch highlights intensifying competition in AI voice processing as platforms push into real-time and multimodal interaction. Lower latency, easier access, and better benchmark accuracy show how quickly speech interfaces are maturing for enterprise communication systems.
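
To put the two headline numbers in concrete terms, here is a minimal Python sketch of how a word error rate is commonly computed (word-level edit distance divided by the number of reference words) and what the quoted $0.003 per minute works out to for a batch job. The function names are hypothetical and this is not Mistral's API. Benchmark evaluations such as FLEURS typically also normalize text (casing, punctuation) before scoring, which this sketch does not do.

```python
PRICE_PER_MINUTE_USD = 0.003  # batch price quoted in the article


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: real benchmark scoring usually normalizes text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing = i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def batch_cost_usd(audio_minutes: float) -> float:
    """Cost of transcribing `audio_minutes` of audio at the quoted price."""
    return audio_minutes * PRICE_PER_MINUTE_USD


if __name__ == "__main__":
    # One wrong word in a 25-word reference is a 4% word error rate.
    reference = " ".join(f"w{i}" for i in range(25))
    hypothesis = reference.replace("w7 ", "oops ")
    print(f"WER: {word_error_rate(reference, hypothesis):.2%}")

    # 1,000 hours of audio at the quoted per-minute price.
    print(f"Cost: ${batch_cost_usd(1000 * 60):.2f}")
```

Read this way, a 4% word error rate means roughly one word in 25 is substituted, dropped, or inserted relative to the reference transcript, and 1,000 hours of audio would cost about $180 at the quoted price.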



Peter Smith is a Web3 projects expert and writer exploring the intersection of blockchain, AI, and online entertainment.