TL;DR: An API gateway manages HTTP traffic between services — auth, routing, rate limiting, load balancing for REST and gRPC. An AI gateway manages LLM workloads — token-based rate limiting, model routing, cost attribution, semantic caching, guardrails. Use an API gateway for your microservices. Use an AI gateway for your LLM traffic. Most production teams eventually need both, sitting at different layers. This post walks through exactly where each one fits.
When we started adding LLM features to our platform, we already had Kong running for our microservices. The instinct was natural: route the LLM traffic through Kong too. Same auth, same rate limiting, same observability stack. One gateway to rule them all.
It worked for about six months, and only in the sense that requests got through. What it didn't give us was anything useful for actually managing AI workloads. We had no idea what each team was spending on tokens. We had no way to set a budget cap that would fire before the bill arrived. Our rate limits were based on requests per minute, which meant a single request with a 50k-token prompt counted the same as one with a 200-token prompt. And when OpenAI had a partial outage, Kong had no concept of "try Anthropic instead"; we just served errors.
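The gap between requests-per-minute and token-aware limiting is easiest to see in code. The sketch below is a minimal illustration, not how Kong or any particular AI gateway implements it. It is a per-team token bucket whose unit is LLM tokens. Each request is charged its prompt tokens plus its worst-case output budget, so one 50k-token prompt drains far more capacity than a 200-token one. The class name `TokenRateLimiter`, its `allow` method, and the `tokens_per_minute` and `max_output_tokens` parameters are hypothetical names chosen for this example.

```python
import time
from typing import Callable, Dict, Tuple


class TokenRateLimiter:
    """Hypothetical per-team limiter that charges LLM tokens, not request count.

    Each team gets a token bucket holding up to `tokens_per_minute` tokens,
    refilled continuously at tokens_per_minute / 60 tokens per second.
    """

    def __init__(self, tokens_per_minute: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = float(tokens_per_minute)
        self.refill_per_sec = tokens_per_minute / 60.0
        self.clock = clock  # injectable so the behaviour can be tested deterministically
        # team -> (tokens currently available, time of last refill)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, team: str, prompt_tokens: int, max_output_tokens: int) -> bool:
        """Admit the request only if the team can afford its worst-case token cost."""
        now = self.clock()
        # A team seen for the first time starts with a full bucket.
        available, last = self.buckets.get(team, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        available = min(self.capacity, available + (now - last) * self.refill_per_sec)
        # Output length is unknown before the call, so reserve the maximum.
        cost = prompt_tokens + max_output_tokens
        if cost > available:
            self.buckets[team] = (available, now)
            return False
        self.buckets[team] = (available - cost, now)
        return True


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = TokenRateLimiter(tokens_per_minute=60_000, clock=lambda: fake_now[0])
    print(limiter.allow("search", prompt_tokens=50_000, max_output_tokens=1_000))
    print(limiter.allow("search", prompt_tokens=50_000, max_output_tokens=1_000))
    print(limiter.allow("search", prompt_tokens=200, max_output_tokens=500))
```

With a 60,000 tokens-per-minute budget, the first 50k-token request is admitted and leaves 9,000 tokens. A second one is rejected immediately, yet a 200-token request still fits. A request-count limiter would have treated all three identically. Reserving `max_output_tokens` up front is a conservative design choice. A real gateway would likely refund the unused portion once the provider reports actual usage.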






