Preface

A companion to the Inference Serving Roadmap: eight source-level study guides that map the concepts of LLM inference serving onto the actual code of the open-source stack — vLLM, a minimal teaching engine (nano-vllm), SGLang, FlashInfer, XGrammar, NVIDIA Dynamo, llm-d, and the Gateway API Inference Extension.

Each guide is written for an engineer who already understands large-scale traffic systems and wants to find where routing, batching, KV-cache management, and disaggregation actually live in real repositories. Code is quoted from each project under its own (permissive) license with file-path attribution; research papers are cited as jumping-off points, not reproduced.

Read the vLLM guide first for the reference design, then the nano-vllm guide to see the same ideas in a few hundred lines, then branch out by interest.