

Sonic-3.5
#2 in Text-to-Speech (TTS) · Cartesia · v3.5 · since 2026-06-16 · 9× · most recently June 30, 2026
Momentum: 69
Cartesia Sonic-3.5 is a real-time text-to-speech model released on June 16, 2026, alongside the speech-recognition model Ink-2. It is built on State Space Models (SSMs), and Cartesia states a time-to-first-audio latency of under 90 ms. Sonic-3.5 ranks #1 on the Artificial Analysis Speech Arena Leaderboard with an Elo score of 1,218 and natively supports 42 languages, including 9 Indian languages. Cartesia offers cloud, on-premise, and on-device deployment.
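
Time-to-first-audio is the delay between sending a request and receiving the first playable audio chunk. The following is a minimal sketch of how that delay could be measured against any streaming source. The function `fake_tts_stream` is a hypothetical stand-in that only simulates a delay; it is not part of any Cartesia SDK, and a real measurement would also include network and client overhead.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_audio(chunks: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (elapsed milliseconds, first non-empty chunk) for a chunk stream."""
    start = time.perf_counter()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks, if any
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            return elapsed_ms, chunk
    raise RuntimeError("stream ended before any audio arrived")


def fake_tts_stream(delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS client (assumption, not a real API)."""
    time.sleep(delay_s)       # simulated synthesis and network delay
    yield b"\x00\x01" * 160   # first chunk of dummy audio bytes
    yield b"\x00\x01" * 160   # second chunk, never reached by the timer


ttfa_ms, first_chunk = time_to_first_audio(fake_tts_stream())
print(f"time to first audio: {ttfa_ms:.1f} ms, first chunk: {len(first_chunk)} bytes")
```

Because the generator body does not run until the first chunk is requested, the simulated delay falls inside the timed region, which mirrors how a lazily opened stream would behave.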
[Figure: Momentum trend]
Features
| Feature | Details |
| --- | --- |
| Latency (ms) | < 90 ms time-to-first-audio (standard); approx. 82 ms end-to-end per Cartesia/Artificial Analysis; Turbo variant approx. 40 ms TTFB |
| Multilingual support (dialects) | Accent localization available (e.g., Irish, New Zealand, South African, Belgian); 2026 changelog lists 94 new voices across 17 locales; automatic language adaptation to input text |
| On-device execution | Yes. Cartesia supports cloud, on-premise, and on-device deployment; Sonic On-Device (private beta) targets real-time streaming on mobile devices and embedded hardware via the SSM architecture |
| Languages | 42 languages natively (including English, Hindi, Spanish, French, German, Japanese, Hebrew, and 35 more), including 9 Indian languages |
| TTS/STT quality (score) | Elo 1,218 on the Artificial Analysis Speech Arena Leaderboard (rank 1; based on 1,144 arena comparisons, ±16) |
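
To give the Elo figure some intuition, the sketch below converts a rating gap into an expected preference rate. It assumes the conventional logistic Elo formula with a 400-point scale, which the source does not confirm for this leaderboard, and the rival rating of 1,150 is purely hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of pairwise comparisons won by A under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


SONIC_ELO = 1218.0               # rating quoted in the table above
SONIC_MARGIN = 16.0              # the quoted +/- 16 interval
HYPOTHETICAL_RIVAL_ELO = 1150.0  # assumption for illustration only

low = elo_expected_score(SONIC_ELO - SONIC_MARGIN, HYPOTHETICAL_RIVAL_ELO)
mid = elo_expected_score(SONIC_ELO, HYPOTHETICAL_RIVAL_ELO)
high = elo_expected_score(SONIC_ELO + SONIC_MARGIN, HYPOTHETICAL_RIVAL_ELO)

print(f"expected preference rate: {mid:.3f} (range {low:.3f} to {high:.3f})")
```

Two models with equal ratings yield 0.5 under this formula, so the output shows how far a given rating gap moves the expected outcome away from a coin flip.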