

MAI-Transcribe-1
#16 in Transcription (STT) · microsoft · v1 · since April 2, 2026 · 16× · most recently June 30, 2026
MAI-Transcribe-1 is Microsoft's first in-house automatic speech recognition (ASR) model. Built by the MAI Superintelligence team, it converts speech to text in 25 languages. Microsoft states that it achieves the lowest Word Error Rate (WER, ~3.9%) on the FLEURS benchmark, outperforming Whisper-large-V3, GPT-Transcribe, ElevenLabs Scribe v2, and Gemini 3.1 Flash-Lite. It runs about 2.5x faster than Azure Fast Transcription at roughly 50% lower GPU cost, and pricing starts at $0.36 per audio hour. The model is available in public preview via Microsoft Foundry and Azure Speech. It does not yet support real-time transcription, speaker diarization, or keyword/context biasing; Microsoft states these are planned for a future update.
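For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name `word_error_rate` is illustrative, and the text normalization (casing, punctuation) behind the FLEURS figure quoted above is not specified here, so this sketch simply splits on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A WER of about 3.9% therefore corresponds to roughly four word-level errors (substitutions, deletions, or insertions) per 100 reference words.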
Features
| Feature | Details |
| --- | --- |
| Real-Time Streaming | Not supported (batch model); real-time transcription reportedly in development by Microsoft |
| Latency | Batch transcription 2.5x faster than Azure Fast Transcription; ~69x real-time according to Artificial Analysis |
| Platform | Microsoft Foundry / Azure Speech (LLM Speech API); integrated into Copilot, Teams, Bing, PowerPoint |
| Price | From $0.36 per audio hour |
| Release Date | April 2, 2026 (Public Preview) |
| Languages | 25 languages (incl. English, German, French, Spanish, Hindi, Japanese, Korean, Chinese, Arabic) |
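
The price and speed figures in the table can be turned into a rough budget estimate. The sketch below assumes the listed starting price of $0.36 per audio hour applies to the whole workload and that the ~69x real-time figure holds for it; both are simplifications, the helper names are illustrative, and queueing, upload time, and any tiered pricing are ignored.

```python
PRICE_PER_AUDIO_HOUR_USD = 0.36  # listed starting price; actual billing may differ
REAL_TIME_FACTOR = 69.0          # ~69x real-time, as reported by Artificial Analysis


def estimate_cost_usd(audio_hours: float) -> float:
    """Approximate transcription cost at the listed starting price."""
    return audio_hours * PRICE_PER_AUDIO_HOUR_USD


def estimate_processing_minutes(audio_hours: float) -> float:
    """Approximate compute time if audio is processed at REAL_TIME_FACTOR."""
    return audio_hours * 60.0 / REAL_TIME_FACTOR


if __name__ == "__main__":
    hours = 100.0
    print(f"{hours:.0f} audio hours: about ${estimate_cost_usd(hours):.2f}, "
          f"about {estimate_processing_minutes(hours):.0f} minutes of processing")
```

Under these assumptions, 100 hours of audio would cost about $36 and take under an hour and a half of processing time.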