A few months back, I was building a content generation tool. The idea was simple: take a list of topics, hit the OpenAI API, and get SEO-optimized articles. My prototype worked great with 5 topics. Then I scaled to 50. Then 200.

That’s when the HTTP 429 (Too Many Requests) errors started flooding my logs. Rate limited. Overloaded. Blocked.

I was frustrated. Not because the API was unstable — it’s actually very reliable — but because I hadn’t thought about the pace of my requests. Every failed call meant lost time, wasted retries, and eventually a complete stall while I waited for the cooldown to end.

What didn’t work (and why I was dumb)

My first attempt was naïve: just wrap the call in a try/except and retry after a fixed 5-second wait.
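
For illustration, here is a minimal sketch of that kind of fixed-delay retry. It is not the exact code from the tool: `RateLimitError`, `naive_retry`, and the `call` argument are hypothetical stand-ins for whatever exception and request function your API client provides, and the cap of 3 attempts is an assumption added so the loop cannot run forever.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error a client raises on HTTP 429."""


def naive_retry(call, max_attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Run call(); on a rate-limit error, wait a fixed delay and try again.

    `call` is any zero-argument function that performs the request.
    `sleep` is injectable so the behavior can be checked without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(delay_seconds)  # same wait every time, whatever the load
```

The wait is identical on every attempt, no matter how many requests have already failed.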