I Built a Production-Grade AI Gateway in Rust — Here's What I Learned

TL;DR: I built a production-ready, distributed API gateway in Rust that routes traffic to OpenAI, Anthropic (Claude), and Ollama, with auth, Redis rate limiting, PostgreSQL usage tracking, Prometheus metrics, and AWS deployment via Terraform. The gateway adds about 1.2 ms of overhead at P99. Here's why and how.

The Problem I Kept Running Into

Six months ago, my team was using OpenAI directly in every service. Then we wanted to test Claude 3.5 Sonnet for some tasks. Then our compliance team asked: "Can you show me every AI request we've ever made?" Then someone ran a batch job and we got a surprise invoice.

Sound familiar?

The real problem wasn't any single API. It was that we had no control plane for our AI traffic: no unified auth, no per-team rate limiting, no cost visibility, and no way to swap providers without touching application code.

