Synthszr Charts: the major AI brands competing for the podium

Gemini Omni

#6 in Multimodal Models

Google · since May 2026 · 125× · most recently July 2, 2026

Momentum

Gemini Omni is a natively multimodal video generation and editing model from Google DeepMind that accepts text, images, audio, and existing video as inputs and produces videos with synchronized audio as output. The model enables iterative, conversation-based video editing through multiple turns. Gemini Omni Flash launched on May 19, 2026, at Google I/O and is available through the Gemini app, Google Flow, and YouTube Shorts.

Momentum trend (chart; x-axis from 04.04. to 03.07.)

Features

Key Benchmark (%)	#1 Overall Preference & Instruction Following (MovieGenBench, 1,003 prompts, Meta); #1 Text-to-Video & Image-to-Video (internal benchmarks, human side-by-side, 504 examples).
License	Proprietary (Google). Use subject to Gemini API Additional Terms of Service & Gen AI Prohibited Use Policy.
Multimodality	Input: text, image, audio, video (simultaneous). Output: video with native audio (up to 10 sec, 720p). Architecture: transformer-based, natively multimodal.
Platform	Gemini App, Google Flow (AI Plus/Pro/Ultra); YouTube Shorts & YouTube Create (free, 18+); API via Gemini API / Vertex AI (preview)
Price	Free: YouTube Shorts / YouTube Create (18+). Gemini App: AI Plus from ~$7.99/mo, AI Pro $19.99/mo, AI Ultra $100–$200/mo. API pricing not yet officially released.
Price per 1M Tokens	API pricing not officially released (as of mid-June 2026). Vertex AI: 5,792 tokens/sec for video input & output (720p); the exact price per 1M tokens is still pending.
Release Date	May 19, 2026 (Google I/O 2026) – first model: Gemini Omni Flash
