Synthszr Charts: the major AI brands competing for the podium

MAI-Transcribe-1

#16 in Transcription (STT)

microsoft · v1 · since April 2, 2026 · 16× · most recently June 30, 2026

Momentum: 8

MAI-Transcribe-1 is Microsoft's first in-house automatic speech recognition (ASR) model, built by the MAI Superintelligence team, converting speech into text across 25 languages. Microsoft states it achieves the lowest Word Error Rate (WER, ~3.9%) on the FLEURS benchmark, outperforming Whisper-large-V3, GPT-Transcribe, ElevenLabs Scribe v2, and Gemini 3.1 Flash-Lite. It runs about 2.5x faster than Azure Fast Transcription at roughly 50% lower GPU cost, starting at $0.36 per audio hour. The model is available in public preview via Microsoft Foundry and Azure Speech, but does not yet support real-time transcription, speaker diarization, or keyword/context biasing (Microsoft states these are planned for a future update).
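To make the WER figure above concrete, here is a minimal sketch of how Word Error Rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model output, divided by the number of reference words. This is a generic illustration of the metric, not Microsoft's or the FLEURS benchmark's evaluation code. The function name `word_error_rate` is hypothetical, and the sketch skips the text normalization (casing, punctuation) that real evaluations typically apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition, a WER of about 3.9% corresponds to roughly four word errors per hundred reference words.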

Momentum trend (chart; date axis from April 4 to July 3)

Features

Real-Time Streaming: Not supported (batch model); real-time transcription reportedly in development by Microsoft
Latency: Batch transcription 2.5x faster than Azure Fast Transcription; ~69x real-time according to Artificial Analysis
Platform: Microsoft Foundry / Azure Speech (LLM Speech API); integrated into Copilot, Teams, Bing, PowerPoint
Price: From $0.36 per audio hour
Release Date: April 2, 2026 (Public Preview)
Languages: 25 languages (incl. English, German, French, Spanish, Hindi, Japanese, Korean, Chinese, Arabic)
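
As a rough back-of-the-envelope illustration of the price and speed figures listed here, the sketch below estimates cost and processing time for a batch job. It assumes the listed "from" price of $0.36 per audio hour applies flat to the whole job and that the ~69x real-time figure holds constantly for sequential processing, neither of which the listing guarantees. The function and constant names are hypothetical.

```python
# Figures taken from the listing; treating them as flat rates is an assumption.
PRICE_PER_AUDIO_HOUR_USD = 0.36  # listed "from" price
REAL_TIME_FACTOR = 69.0          # ~69x real-time, as attributed to Artificial Analysis


def estimate_batch_job(audio_hours: float) -> tuple[float, float]:
    """Return (estimated cost in USD, estimated processing time in minutes)."""
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    cost_usd = audio_hours * PRICE_PER_AUDIO_HOUR_USD
    processing_minutes = audio_hours * 60 / REAL_TIME_FACTOR
    return cost_usd, processing_minutes


if __name__ == "__main__":
    cost, minutes = estimate_batch_job(100)
    print(f"cost: ${cost:.2f}, processing time: {minutes:.0f} min")
```

For 100 hours of audio this works out to about $36 and roughly 87 minutes of processing, ignoring queueing, upload time, and any pricing tiers.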

Sources (16)
