Synthszr Charts: the major AI brands competing for the podium

vLLM

#1 in LLM Inference & Serving

vllm · since June 2023 (first official release) · 40× · most recently June 30, 2026

100

Momentum

vLLM is an open-source inference and serving engine for Large Language Models (LLMs), originally developed at UC Berkeley's Sky Computing Lab and maintained as a community project since 2023. Its core architecture is based on PagedAttention (virtual memory management of the KV cache) and continuous batching, delivering significantly higher throughput than naive serving approaches. vLLM supports 200+ model architectures from Hugging Face and runs on a broad range of hardware accelerators. The project is free to use (Apache 2.0) and is backed by an ecosystem of over 2,000 contributors and sponsors including NVIDIA, AMD, Google, AWS, and Intel.
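The paging idea behind PagedAttention can be illustrated with a toy allocator. This is a minimal sketch, not vLLM's implementation: the class `ToyPagedKVCache`, its methods, and the block size of 16 tokens are hypothetical names and values chosen for illustration. It shows only the bookkeeping, where each sequence receives fixed-size cache blocks on demand and returns them when it finishes, so memory is not reserved up front for the maximum sequence length.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV-cache block (illustrative value, an assumption)


@dataclass
class ToyPagedKVCache:
    """Toy block allocator: maps each sequence to a list of physical block ids."""
    num_blocks: int
    free_blocks: list = field(default_factory=list)
    block_tables: dict = field(default_factory=dict)  # seq_id -> [block ids]
    seq_lens: dict = field(default_factory=dict)      # seq_id -> token count

    def __post_init__(self):
        # All physical blocks start out free.
        self.free_blocks = list(range(self.num_blocks))

    def append_token(self, seq_id: str) -> None:
        """Reserve cache space for one more token of the given sequence."""
        length = self.seq_lens.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:  # last block is full, or no block yet
            if not self.free_blocks:
                raise MemoryError("no free KV-cache blocks")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1

    def release(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = ToyPagedKVCache(num_blocks=4)
for _ in range(20):   # 20 tokens need 2 blocks of 16
    cache.append_token("a")
for _ in range(5):    # 5 tokens need 1 block
    cache.append_token("b")
blocks_in_use = cache.num_blocks - len(cache.free_blocks)
cache.release("a")    # sequence "a" finishes; its blocks become reusable
```

Because blocks are handed out one at a time, a newly arriving request can start as soon as any block is free, which is what makes this kind of allocation a natural fit for continuous batching.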

Momentum trend (chart, 04.04. to 03.07.)

Features

License	Apache License 2.0
Price	Free / Open Source (no license fees; donations via GitHub & OpenCollective)
Release Date	June 2023 (first official release); currently v0.24.0 on PyPI (as of July 2026)

