

Qwen2.5-VL-7B
#34 in Multimodal Models
alibaba · v2.5 · vl 7b · since 2025-01-28 · 2× · most recently 29 June 2026
Momentum: 10
Qwen2.5-VL-7B is a vision-language model from Alibaba. It processes image sequences and can significantly reduce token consumption through optimized encoding strategies.
Momentum trend (chart; axis labels 04.04. and 03.07.)
Features
| Feature | Details |
| --- | --- |
| Price per unit | Open-source (weights free via Hugging Face / ModelScope); API via OpenRouter: $0.20 per million input tokens, $0.20 per million output tokens (list price as of 2025, third-party provider) |
| Vision-language benchmark scores | DocVQA: 95.7%; ChartQA: 87.3%; OCRBench: 86.4%; Android Control Low_EM: 91.4% (source: llm-stats.com); outperforms GPT-4o-mini on several tasks according to the official blog |
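
To make the per-token pricing concrete, here is a minimal sketch that estimates the cost of a single API request from its token counts. It assumes the third-party list prices quoted in the table ($0.20 per million input tokens and $0.20 per million output tokens) still apply. The function name `estimate_cost_usd` and the example token counts are hypothetical and chosen only for illustration. Actual billing may differ by provider, and how many tokens an image consumes is not covered here.

```python
# Illustrative cost estimate only. The rates are the third-party list prices
# quoted in the table (as of 2025); check the provider for current pricing.
INPUT_USD_PER_MILLION = 0.20
OUTPUT_USD_PER_MILLION = 0.20


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request.

    `estimate_cost_usd` is a hypothetical helper, not part of any provider SDK.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 50,000 input tokens (text plus encoded images)
    # and 2,000 output tokens.
    print(f"${estimate_cost_usd(50_000, 2_000):.4f}")
```

Keeping the input and output rates as separate constants means the sketch still works if a provider prices the two directions differently.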