

Gemini Omni
#6 in Multimodal Models · Google · since May 2026 · 125× · last updated 2 July 2026
Momentum: 88
Gemini Omni is a natively multimodal video generation and editing model from Google DeepMind that accepts text, images, audio, and existing video as inputs and produces video with synchronized audio as output. It supports iterative, conversational video editing across multiple turns. The first model, Gemini Omni Flash, launched on May 19, 2026, at Google I/O and is available through the Gemini app, Google Flow, and YouTube Shorts.
Momentum trend (chart, 4 April to 3 July)
Features
| Feature | Details |
| --- | --- |
| Key Benchmark | #1 in overall preference and instruction following on MovieGenBench (Meta, 1,003 prompts); #1 in text-to-video and image-to-video on internal benchmarks (human side-by-side, 504 examples). |
| License | Proprietary (Google). Use subject to Gemini API Additional Terms of Service & Gen AI Prohibited Use Policy. |
| Multimodality | Input: text, image, audio, video (simultaneous). Output: video with native audio (up to 10 sec, 720p). Architecture: transformer-based, natively multimodal. |
| Platform | Gemini App, Google Flow (AI Plus/Pro/Ultra); YouTube Shorts & YouTube Create (free, 18+); API via Gemini API / Vertex AI (preview) |
| Price | Free: YouTube Shorts / YouTube Create (18+). Gemini App: AI Plus from ~$7.99/mo, AI Pro $19.99/mo, AI Ultra $100–$200/mo. API pricing not yet officially released. |
| Price per 1M Tokens | API pricing not officially released (as of mid-June 2026). Vertex AI: video input and output are counted at 5,792 tokens per second of 720p video; the $/1M-token rate is still pending. |
| Release Date | May 19, 2026 (Google I/O 2026) – first model: Gemini Omni Flash |
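
The table quotes 5,792 tokens/sec for 720p video on Vertex AI and a 10-second output limit. The sketch below works through the resulting arithmetic, assuming the figure means tokens counted per second of video (an interpretation, not something the table confirms). The function names and the `usd_per_million_tokens` parameter are hypothetical; no API price has been published, so the rate is a caller-supplied placeholder.

```python
# Sketch only: assumes "5,792 tokens/sec" means tokens counted per second of 720p video.
TOKENS_PER_VIDEO_SECOND = 5_792  # figure quoted in the table for Vertex AI (720p)
MAX_CLIP_SECONDS = 10            # output length limit quoted in the table


def video_tokens(seconds: float) -> int:
    """Estimate the token count for a clip of the given length (hypothetical helper)."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return round(seconds * TOKENS_PER_VIDEO_SECOND)


def estimated_cost(seconds: float, usd_per_million_tokens: float) -> float:
    """Estimate cost in USD. The rate is a placeholder, not a published price."""
    return video_tokens(seconds) * usd_per_million_tokens / 1_000_000


if __name__ == "__main__":
    # A maximum-length 10-second clip: 10 * 5,792 tokens.
    print(video_tokens(MAX_CLIP_SECONDS))  # 57920
```

Under this assumption a maximum-length clip comes to 57,920 tokens, so roughly 17 such clips would fit in one million tokens once a per-token rate is announced.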