Most LangGraph content stops at the notebook. You build a cute ReAct loop, it answers one question, and the article ends before the hard part: how do you actually serve this thing, swap models without a rewrite, and see what it's doing when it misbehaves?

This post walks through a small but production-shaped LangGraph deployment: a RAG ReAct agent that

exposes an OpenAI-compatible HTTP API, so any OpenAI client (Open WebUI, the openai SDK, LibreChat) can talk to it unchanged,

routes every model call through a gateway so switching from a hosted API to self-hosted vLLM is a config change, not a code change, and

gets full tracing — node transitions, tool calls, and LLM calls in one trace — by adding a single callback.