⬤ Mistral AI released Voxtral Transcribe 2, a next-generation speech-to-text suite designed for transcription workflows and live audio applications. The update introduces two models—Voxtral Mini Transcribe V2 and Voxtral Realtime—along with a new audio playground inside Mistral Studio for real-time testing and experimentation.
⬤ Voxtral Mini Transcribe V2 handles batch transcription and supports speaker diarization, context biasing, and 13 languages. It is priced at $0.003 per minute and, according to the release, achieves a 4% word error rate on the FLEURS benchmark, outperforming competing systems including GPT-4o mini and ElevenLabs Scribe v2 under identical benchmark conditions.
⬤ Voxtral Realtime focuses on streaming transcription with sub-200 millisecond latency, targeting voice agents, media workflows, and contact center applications. The audio playground in Mistral Studio lets users test live transcription behavior interactively without additional setup.
⬤ The launch highlights intensifying competition in AI voice processing as platforms push into real-time and multimodal interaction. Gains in latency, accessibility, and benchmark accuracy point to the continued maturation of speech interfaces in enterprise communication systems.
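To put the two headline figures in context, the following is a minimal sketch, not Mistral's own tooling or evaluation code. It estimates cost from the quoted $0.003 per minute rate and computes word error rate in the standard way: word-level edit distance divided by the number of reference words. The function names `transcription_cost` and `word_error_rate` are hypothetical, and the sketch assumes flat per-minute billing and simple whitespace tokenization, neither of which the release confirms.

```python
from decimal import Decimal

# Quoted price for Voxtral Mini Transcribe V2; flat per-minute billing is an assumption.
PRICE_PER_MINUTE_USD = Decimal("0.003")


def transcription_cost(minutes):
    """Estimated cost in USD for a given number of audio minutes."""
    return PRICE_PER_MINUTE_USD * Decimal(str(minutes))


def word_error_rate(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words.

    Uses plain whitespace tokenization (an assumption; benchmark harnesses
    usually normalize case and punctuation first).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(transcription_cost(1000))
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

At the quoted rate, 1,000 minutes of audio would cost about $3.00 under these assumptions. A 4% word error rate corresponds to roughly four word-level errors (substitutions, insertions, or deletions) per 100 reference words.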
Peter Smith