TL;DR — I built a production-ready, distributed API gateway in Rust that routes traffic to OpenAI, Anthropic (Claude), and Ollama with auth, Redis rate limiting, PostgreSQL usage tracking, Prometheus metrics, and AWS Terraform deployment. Gateway overhead is ~1.2ms P99. Here's why and how.

The Problem I Kept Running Into

Six months ago, my team was using OpenAI directly in every service. Then we wanted to test Claude 3.5 Sonnet for some tasks. Then our compliance team asked: "Can you show me every AI request we've ever made?" Then someone ran a batch job and we got a surprise invoice.

Sound familiar?

The real problem wasn't any single API — it was that we had no control plane for our AI traffic. No unified auth, no rate limiting per team, no cost visibility, no ability to swap providers without touching application code.