Building a Real‑Time Notification System: Why a Simple Token Bucket Beats Fancy Alternatives
Quick context
Honestly, I still remember the night our notification service started dropping alerts. We were pushing millions of events per hour through a Kafka cluster, and every time a spike hit, the downstream workers would either get overwhelmed or throttle themselves too aggressively. The on-call pager was screaming, and I spent three hours digging through metrics only to realize we were attacking a traffic-shaping problem with a hammer when we needed a scalpel. That "aha" moment taught me that the real bottleneck isn't the message bus; it's how we guard the workers from bursty traffic. So let's talk about the one piece that made the difference: a rate limiter that actually works at scale.
The Insight
The critical insight is this: a rate limiter doesn’t need to be perfect; it just needs to be cheap, fast, and biased toward letting through some traffic rather than blocking everything. In a real‑time notification pipeline, losing a few events is far less costly than stalling the whole system because a limiter became a single point of failure or added latency that cascaded downstream.
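To make "cheap, fast, and biased toward letting traffic through" concrete, here is a minimal sketch of a token bucket in Python. It is an illustration under stated assumptions, not the production limiter from the incident above: the names TokenBucket, try_acquire, and allow, and the rate and capacity parameters, are hypothetical, and the bucket is single-process and not thread-safe.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative single-process token bucket (not thread-safe)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate          # tokens added per second (assumed parameter)
        self.capacity = capacity  # maximum burst size (assumed parameter)
        self.tokens = capacity    # start full so an initial burst is allowed
        self.clock = clock        # injectable clock, handy for tests
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Return True if the event may pass. Never blocks or sleeps."""
        now = self.clock()
        # Refill lazily from elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def allow(bucket: TokenBucket) -> bool:
    """Fail open: if the limiter itself errors, let the event through."""
    try:
        return bucket.try_acquire()
    except Exception:
        return True
```

Each check is a handful of arithmetic operations with no background refill thread and no waiting, which is what keeps it cheap. An over-limit event gets an immediate False so the caller can drop or defer it instead of stalling, and the allow wrapper shows one way to express the fail-open bias: a broken limiter lets traffic through rather than blocking the pipeline.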






