Treasure Hunt Engine: The Day We Realized the Event Bus Was Our Constraint

The Problem We Were Actually Solving

We weren't just chasing p99 latency; we were solving a fundamental mismatch between the event model and the treasure hunt logic. Each treasure hunt round emits thousands of micro-events: player joins, item picks, time updates, leaderboard recalculations, and real-time notifications. The Node.js event loop was choking under the backpressure. The BullMQ worker was blocked on Redis pub/sub, not because of network latency, but because Node.js's single-threaded event loop couldn't keep up with the rate of incoming events. The Redis server itself was fine: CPU at 12%, memory at 68%, no evictions. The bottleneck wasn't the queue or the data store. It was the runtime.
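
Why a single-threaded consumer "can't keep up" is easiest to see in an idealized queue. The sketch below is not our production code. It models the event loop as one server with evenly spaced arrivals and a fixed per-event handler cost (a D/D/1 queue). The 25 µs handler cost and the sample rates are hypothetical numbers chosen for illustration, and `simulate_waits` and `p99` are illustrative helpers, not part of Node.js or BullMQ.

```python
# D/D/1 queue: one server (the event loop), evenly spaced arrivals, fixed cost per event.
# Hypothetical assumption: every handler costs the same CPU time and nothing else runs on the loop.


def simulate_waits(arrival_rate: float, service_time: float, n_events: int) -> list[float]:
    """Seconds each event waits in the queue before its handler starts."""
    interarrival = 1.0 / arrival_rate
    loop_free_at = 0.0  # when the single thread finishes the handler it is running
    waits = []
    for i in range(n_events):
        arrival = i * interarrival
        start = max(arrival, loop_free_at)  # an event cannot start until the loop is idle
        waits.append(start - arrival)
        loop_free_at = start + service_time
    return waits


def p99(values: list[float]) -> float:
    """Nearest-rank 99th percentile."""
    ordered = sorted(values)
    rank = max(1, (99 * len(ordered) + 99) // 100)  # integer ceil(0.99 * n)
    return ordered[rank - 1]


if __name__ == "__main__":
    SERVICE_TIME = 25e-6  # hypothetical 25 µs of JS work per event: capacity 40,000 events/s
    for rate in (30_000, 38_000, 47_000):
        waits = simulate_waits(rate, SERVICE_TIME, n_events=rate)  # one second of traffic
        print(f"{rate:>6} events/s  utilization={rate * SERVICE_TIME:.2f}  p99 wait={p99(waits) * 1e3:.1f} ms")
```

In this idealized model the wait stays at zero while utilization is below 1. Once arrivals outpace the single thread, every event waits slightly longer than the one before it, so the backlog and the p99 grow for as long as the overload lasts. Real traffic is bursty, so real queues start building before utilization reaches 1.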

I profiled the process with 0x and found that 78% of CPU time was spent in uv__io_poll, libuv's epoll/select wrapper. The Node.js process was spending more time waiting for events than processing them. And because BullMQ uses Redis streams, every publish and consume was a network round trip. At 47,000 events per second, the 250-microsecond RTT from us-east-1 to the Redis cluster added up quickly. The p99 latency grew far faster than the number of concurrent players: at 5,000 players it was 80ms; at 10,000 players, 2.3 seconds. Doubling the player count multiplied p99 latency by roughly 29. The system wasn't scaling linearly. It was falling off a cliff.
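
A back-of-the-envelope check shows how a 250 µs round trip "adds up" at 47,000 events per second. This is a minimal sketch using Little's law (average requests in flight = arrival rate × time each request spends in flight). The helper names are hypothetical, and `round_trips_per_event=2` is an assumption that counts one round trip for the publish and one for the consume.

```python
def required_in_flight(events_per_second: int, rtt_us: int, round_trips_per_event: int = 1) -> float:
    """Little's law: average round trips that must be in flight to sustain the rate."""
    return events_per_second * round_trips_per_event * rtt_us / 1_000_000


def max_serial_throughput(rtt_us: int) -> float:
    """Ceiling on round trips per second for a client that awaits each one before the next."""
    return 1_000_000 / rtt_us


if __name__ == "__main__":
    RATE = 47_000  # events per second, from the incident
    RTT_US = 250   # us-east-1 to Redis round trip, in microseconds
    print(required_in_flight(RATE, RTT_US))
    print(required_in_flight(RATE, RTT_US, round_trips_per_event=2))
    print(max_serial_throughput(RTT_US))
```

At 47,000 events per second, a 250 µs round trip works out to 11.75 seconds of cumulative round-trip time per wall-clock second, or 23.5 if both the publish and the consume pay it. A client that awaits each round trip before starting the next tops out at 4,000 per second. Sustaining the observed rate therefore requires many requests in flight at once, and every one of their callbacks still has to be serviced by the same single-threaded event loop.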
