At Inithouse — a studio running parallel product experiments — we built Be Recommended, a tool that checks how visible your brand is across ChatGPT, Perplexity, Claude, and Gemini. The idea sounded simple: query multiple AI models, score the results, show a report.
It was not simple. Here are four technical mistakes we made shipping v1 — and the fixes that actually survived production.
Mistake 1: Rate Limiting Was an Afterthought
We treated rate limits as edge cases. They were not. Every AI provider has different rate-limit headers, different backoff expectations, and different definitions of "too many requests." Our first architecture just retried on 429. That turned a rate limit into a cascade — one provider throttling triggered a retry storm that cascaded to the others.
The fix: Per-provider circuit breakers with exponential backoff. Each provider gets its own state machine. When a circuit opens, we serve cached results for that provider and mark the score as "partial" in the UI. Users see real data, not a spinner that never resolves.







