The Problem We Were Actually Solving

The Veltrix public cluster runs up to 1,200 concurrent worlds, each with an in-memory treasure-hunt engine that must atomically pick, lock, and drop loot within 20 ticks (400 ms, or 20 ms per tick); otherwise the spawn rules break and loot floats freely. Our first implementation offloaded pickup detection to a Lua sandbox per world, then queued loot drops to a single global allocator. At 400 worlds the allocator saturated and pods died with "runtime: out of memory" at 1.8 GB RSS each. The pprof flame graphs showed 32% of CPU time lost to mutex contention around sync.Map shards, rather than in the lock-heavy treasure math itself. The real problem was allocation rate, not the algorithm.
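To make "allocation rate" concrete, here is a minimal sketch of how heap allocations per drop could be counted from inside a Go process. It is not the profiling setup described above (that was pprof); it reads runtime.MemStats instead. The Loot type and the dropFresh, dropReused, and mallocsDuring helpers are hypothetical names invented for this illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// Loot is a hypothetical stand-in for one loot drop record.
type Loot struct {
	ID   uint64
	Name string
}

// sink is package-level so the compiler cannot keep the records on the stack.
var sink *Loot

// mallocsDuring reports how many heap objects were allocated while fn ran.
// Mallocs is a cumulative counter, so the difference is the count for fn
// plus any small amount of background runtime activity.
func mallocsDuring(fn func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	fn()
	runtime.ReadMemStats(&after)
	return after.Mallocs - before.Mallocs
}

// dropFresh allocates a new record for every drop.
func dropFresh(n int) {
	for i := 0; i < n; i++ {
		sink = &Loot{ID: uint64(i)}
	}
}

// dropReused overwrites a single record, allocating once up front.
func dropReused(n int) {
	scratch := &Loot{}
	for i := 0; i < n; i++ {
		*scratch = Loot{ID: uint64(i)}
		sink = scratch
	}
}

func main() {
	const n = 10000
	fmt.Println("fresh: ", mallocsDuring(func() { dropFresh(n) }))
	fmt.Println("reused:", mallocsDuring(func() { dropReused(n) }))
}
```

The exact counts vary slightly between runs because the runtime allocates a little on its own, but the fresh path should report at least one allocation per drop while the reused path stays near zero.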

What We Tried First (And Why It Failed)

We rewrote the engine in Go 1.20 and adopted a staged pipeline: an arena allocator per tick, garbage-free state machines, and go:embed for static loot tables. Latency dropped, until GC jitter arrived. Running go tool trace -c 1000 during a global drop revealed 4.2 ms mark-sweep pauses every 70 ms, coinciding with tooltip flicker at the 95th percentile. We tried GOGC=off, which pushed RSS to 3 GB and triggered the OOM killer. We tried debug.SetGCPercent(5) (from runtime/debug), which stabilized p99 at 200 ms but broke deterministic seeding: the global PRNG state became racy under the resulting GC pressure.
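The "arena allocator per tick" idea can be sketched roughly as follows. This is an assumption-laden illustration, not our production allocator: it uses a plain preallocated slice rather than Go 1.20's experimental arena package, and the Loot and TickArena types and their methods are hypothetical names.

```go
package main

import "fmt"

// Loot is a hypothetical loot record; the real engine's fields are not shown here.
type Loot struct {
	ID   uint64
	X, Y int32
}

// TickArena hands out Loot slots from one preallocated slab and is reset
// wholesale at the end of every tick, so steady-state ticks allocate nothing.
type TickArena struct {
	slab []Loot
	used int
}

// NewTickArena allocates the slab once, up front.
func NewTickArena(capacity int) *TickArena {
	return &TickArena{slab: make([]Loot, capacity)}
}

// Alloc returns a zeroed slot, or nil when the slab is exhausted, so the
// caller decides how to degrade instead of silently growing the heap.
func (a *TickArena) Alloc() *Loot {
	if a.used == len(a.slab) {
		return nil
	}
	l := &a.slab[a.used]
	*l = Loot{} // clear whatever the previous tick left behind
	a.used++
	return l
}

// Reset makes every slot reusable. Pointers handed out during the
// previous tick must not be used after this call.
func (a *TickArena) Reset() { a.used = 0 }

// Used reports how many slots the current tick has consumed.
func (a *TickArena) Used() int { return a.used }

func main() {
	arena := NewTickArena(2)
	for tick := 0; tick < 3; tick++ {
		first := arena.Alloc()
		first.ID = uint64(tick)
		arena.Alloc()
		overflow := arena.Alloc() // third request exceeds the two-slot slab
		fmt.Println(tick, arena.Used(), overflow == nil)
		arena.Reset()
	}
}
```

The trade-off in a design like this is that the slab stays live for the whole process, so it lowers allocation rate without lowering resident memory, and any pointer that outlives its tick becomes a correctness bug rather than a GC problem.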