Inference Library

Notes on building and serving large language models at scale.

Inference Serving Roadmap

The challenges of large-scale LLM inference serving, presented in idea-dependency order and grounded in vLLM's source code: from KV cache and continuous batching to disaggregation, routing, and autoscaling.

21 chapters · 5 parts

Inference Study Guides

Source-level guides to the open-source inference stack: where routing, batching, KV-cache management, and disaggregation live in real repositories.

vLLM · nano-vllm · SGLang · FlashInfer · XGrammar · Dynamo · llm-d · Gateway API Inference Extension