
Synthszr Charts: the big AI brands competing for the podium

GLM-4.5V

#36 in Multimodal Models

zhipu · v4.5v · since 2025-08-11 · 2× · most recently June 29, 2026

Momentum

GLM-4.5V is a multimodal vision-language model by Zhipu AI (Z.ai) built on the GLM-4.5-Air architecture (106B total parameters, 12B active, MoE). Released on August 11, 2025 as open-source under the MIT license, it accepts image, video, and text inputs. It achieves state-of-the-art results on 42 public vision-language benchmarks among open-source models of comparable size, and features a switchable "Thinking Mode" for deep reasoning.

Momentum trend (chart omitted; axis labels: 04.04., 03.07.)

Features

Context Window (Tokens)	65,536 token context window (OpenRouter); SiliconFlow lists 66K; max output 16,384 tokens
Multimodal Inputs	Text, images (native resolution/aspect ratio), videos; tool use; supported tasks: image Q&A, OCR, document parsing, GUI agents, visual grounding, video understanding, frontend coding
On-Device vs. Cloud	Cloud API (via Z.ai / bigmodel.cn, OpenRouter, Fireworks, Novita, and others); open-source (MIT), self-hostable with FP8/BF16 via Transformers, vLLM, SGLang
Price per Unit	$0.60 per 1M input tokens / $1.80 per 1M output tokens (via OpenRouter, TypingMind, developer.puter.com – as of Jun 2026)
Video Analysis Capability	Supports long-video segmentation and event detection; timestamp token encoding for temporal understanding; benchmarks: VideoMME, MMVU, MotionBench, MVBench, LVBench
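
The context and pricing figures in the table can be combined into a quick budget check. The following is a minimal sketch, not an official tool: the helper name `estimate_request` and the `RequestEstimate` type are hypothetical, the constants are copied from the table (OpenRouter figures as of Jun 2026), and it assumes that input and output tokens share the 65,536-token window, which the table does not state explicitly.

```python
from dataclasses import dataclass

# Figures copied from the Features table; providers may change them.
CONTEXT_WINDOW_TOKENS = 65_536
MAX_OUTPUT_TOKENS = 16_384
USD_PER_M_INPUT = 0.60
USD_PER_M_OUTPUT = 1.80


@dataclass(frozen=True)
class RequestEstimate:
    fits_context: bool
    cost_usd: float


def estimate_request(input_tokens: int, output_tokens: int) -> RequestEstimate:
    """Hypothetical helper: check the token budget and estimate the price."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Assumption: input and output tokens share the same context window.
    fits = (
        output_tokens <= MAX_OUTPUT_TOKENS
        and input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS
    )
    # Prices are quoted per one million tokens.
    cost = (
        input_tokens * USD_PER_M_INPUT + output_tokens * USD_PER_M_OUTPUT
    ) / 1_000_000
    return RequestEstimate(fits_context=fits, cost_usd=cost)


if __name__ == "__main__":
    est = estimate_request(input_tokens=50_000, output_tokens=4_000)
    print(est.fits_context, round(est.cost_usd, 4))
```

Under these assumptions, a request with 50,000 input tokens and 4,000 output tokens fits the window and costs roughly $0.037. Token counts for images and video depend on the provider's tokenization and are not modeled here.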
