
Synthszr Charts: the major AI brands competing for the podium

MAI-Transcribe-1

#16 in Transcription (STT)

microsoft · v1 · since April 2, 2026 · 16× · most recently June 30, 2026


MAI-Transcribe-1 is Microsoft's first in-house automatic speech recognition (ASR) model. Built by the MAI Superintelligence team, it transcribes speech to text in 25 languages. Microsoft states that it achieves the lowest Word Error Rate (WER, ~3.9%) on the FLEURS benchmark, outperforming Whisper-large-V3, GPT-Transcribe, ElevenLabs Scribe v2, and Gemini 3.1 Flash-Lite. According to Microsoft, it runs about 2.5x faster than Azure Fast Transcription at roughly 50% lower GPU cost, with pricing starting at $0.36 per audio hour. The model is available in public preview via Microsoft Foundry and Azure Speech. It does not yet support real-time transcription, speaker diarization, or keyword/context biasing; Microsoft states these are planned for a future update.
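WER is the metric behind the ~3.9% figure: the number of word substitutions (S), deletions (D), and insertions (I) needed to turn the model's transcript into the reference, divided by the number of reference words (N), i.e. WER = (S + D + I) / N. At ~3.9%, that is roughly 39 word errors per 1,000 reference words. The sketch below computes WER with a word-level Levenshtein distance. It assumes simple lowercasing and whitespace tokenization; it is an illustration of the metric, not the official FLEURS scoring or Microsoft's evaluation pipeline, which typically apply additional text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed as word-level Levenshtein distance.

    Assumption: lowercase + whitespace tokenization only; official benchmark
    scoring usually normalizes text (punctuation, numbers) more aggressively.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev holds the previous DP row: edit distances from the first i-1
    # reference words to every prefix of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One deleted word out of six reference words
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # 0.16666666666666666
```

Because WER is normalized by reference length, insertions can push it above 1.0 for very poor transcripts.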

Momentum trend (chart not reproduced)

Features

Real-Time Streaming	Not supported (batch only); Microsoft says real-time transcription is planned for a future update
Latency	Batch transcription 2.5x faster than Azure Fast Transcription (per Microsoft); ~69x faster than real time according to Artificial Analysis
Platform	Microsoft Foundry / Azure Speech (LLM Speech API); integrated into Copilot, Teams, Bing, PowerPoint
Price	From $0.36 per audio hour
Release Date	April 2, 2026 (Public Preview)
Languages	25 languages (incl. English, German, French, Spanish, Hindi, Japanese, Korean, Chinese, Arabic)
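The price and speed figures in the table support a quick back-of-the-envelope estimate for a batch job. The sketch below assumes flat billing at the entry price of $0.36 per audio hour (the "From" suggests actual pricing may vary) and a constant ~69x real-time factor (the Artificial Analysis measurement); it ignores upload, queueing, and any per-request overhead. The function name `estimate_batch_job` is hypothetical and not part of any Azure SDK.

```python
# Approximate figures from the feature table above (assumptions, not a quote).
PRICE_PER_AUDIO_HOUR_USD = 0.36  # entry price: "From $0.36 per audio hour"
REAL_TIME_FACTOR = 69.0          # "~69x real-time" (Artificial Analysis)


def estimate_batch_job(audio_minutes: float) -> tuple[float, float]:
    """Hypothetical helper: return (estimated cost in USD, processing time in seconds).

    Assumes linear pricing and constant throughput; real jobs add overhead.
    """
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    audio_hours = audio_minutes / 60
    cost_usd = audio_hours * PRICE_PER_AUDIO_HOUR_USD
    # Processing time = audio duration divided by the speed-up factor
    processing_seconds = audio_minutes * 60 / REAL_TIME_FACTOR
    return cost_usd, processing_seconds


cost, seconds = estimate_batch_job(60)
print(f"${cost:.2f}, ~{seconds:.0f} s")  # $0.36, ~52 s
```

Under these assumptions, one hour of audio costs about $0.36 and finishes in under a minute of processing time.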


