The Veltrix Treasure-Hunt Engine Litmus Test

The Problem We Were Actually Solving

In 2024 we shipped the treasure-hunt engine for Veltrix at 2,300 concurrent sessions running 180,000 packets per second across four AWS AZs, and all of it was perfectly fine until Black Friday weekend. On Friday at 14:01 UTC the multi-tenant orchestrator hit a 429 on every DescribeCacheNodes call to ElastiCache. The Redis cluster itself was humming along at <3 ms P99, but the AWS control plane simply could not keep up with the discovery loop we had hard-coded: every 5 seconds the orchestrator issued a DescribeCacheNodes against every shard, multiplied by the number of games, multiplied again by the number of players per game. By 14:12 UTC we had 1.2 million DescribeCacheNodes calls outstanding, each one costing us 328 ms and 4 KB of bandwidth. At that point the Redis control plane started throttling, and the latency on Lua script executions jumped from 6 ms to 1.8 seconds. Players started reporting "We couldn't find the chest on the map."
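To make the fan-out concrete, here is a rough back-of-the-envelope sketch of how the call rate of a loop like that scales. The function name and the shard, game, and player counts are hypothetical placeholders, not our production figures; only the 5-second interval and the shards × games × players multiplication come from the description above.

```python
def discovery_calls_per_second(shards: int, games: int,
                               players_per_game: int,
                               interval_s: float = 5.0) -> float:
    """Estimate control-plane calls per second for a discovery loop that
    issues one call per (shard, game, player) every `interval_s` seconds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    calls_per_tick = shards * games * players_per_game
    return calls_per_tick / interval_s


if __name__ == "__main__":
    # Hypothetical inputs chosen only to illustrate the multiplication.
    rate = discovery_calls_per_second(shards=12, games=50, players_per_game=46)
    print(f"{rate:.0f} calls/s")  # prints "5520 calls/s"
```

The point of the sketch is that the rate grows with the product of all three factors, so a busy weekend that raises the number of games and players multiplies control-plane load even when the shard count stays flat.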

What We Tried First (And Why It Fails)

Our first configuration file looked like this:

orchestrator:

Related reading

The Day the Treasure Hunt Engine Drowned in 300 ms Queries

The Day the Treasure Hunt Engine Buried Itself Alive

The Ghost in the Veltrix: Why Our Treasure Hunt Engine Was Sending Operators…

Your Treasure Hunt Engine Was Probably a Latency Minefield (And Here's the…

The Moment the JVM Tuning Knob Broke Our Treasure Hunt Engine

The Moment Veltrix Blew Up and We Had to Write Our Own Shard Router
