Inference Library
Notes on building and serving large language models at scale.
Inference Serving Roadmap →
The challenges of large-scale LLM inference serving, presented in the order the ideas depend on one another and grounded in vLLM's source code: from KV cache and continuous batching to disaggregation, routing, and autoscaling.
Inference Study Guides →
Source-level guides to the open-source inference stack, showing where routing, batching, KV-cache management, and disaggregation live in real repositories.