The Problem We Were Actually Solving
It wasn't supposed to be a language swap. We had built the Treasure Hunt Engine on Go 1.20, using channels and sync.Pool to keep each player's state from adding GC pressure. At 100k concurrent hunters, p99 latency sat at 34 ms and the allocation rate was 42 MB/s. But the moment the load balancers pushed 500k connections through a single shard, the per-connection state that escape analysis had already sent to the heap at compile time overwhelmed the collector, and GC pauses spiked to 18 ms every 200 ms. Players reported rubber-banding whenever the GC ran, and the 50 ms p99 SLO became impossible to meet. The profiler showed the allocator spending 37% of its time in mcache_get rather than servicing real work.
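For readers who have not used sync.Pool this way, here is a minimal sketch of pooling per-player state. It is not the engine's actual code: PlayerState, its fields, AcquireState, and ReleaseState are hypothetical names chosen for illustration, and the initial inventory capacity of 16 is an arbitrary assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// PlayerState is a hypothetical per-player record; the fields are illustrative only.
type PlayerState struct {
	ID        uint64
	X, Y      float64
	Inventory []uint32
}

// Reset clears the record for reuse but keeps the Inventory backing array,
// so a recycled state does not need a fresh slice allocation.
func (p *PlayerState) Reset() {
	p.ID = 0
	p.X, p.Y = 0, 0
	p.Inventory = p.Inventory[:0]
}

// statePool hands out recycled PlayerState values and allocates only when empty.
var statePool = sync.Pool{
	New: func() any {
		return &PlayerState{Inventory: make([]uint32, 0, 16)}
	},
}

// AcquireState takes a state from the pool and tags it with the player's ID.
func AcquireState(id uint64) *PlayerState {
	p := statePool.Get().(*PlayerState)
	p.ID = id
	return p
}

// ReleaseState wipes the state and returns it to the pool.
// The caller must not touch p after this call.
func ReleaseState(p *PlayerState) {
	p.Reset()
	statePool.Put(p)
}

func main() {
	p := AcquireState(42)
	p.Inventory = append(p.Inventory, 7)
	fmt.Println(p.ID, len(p.Inventory)) // prints "42 1"
	ReleaseState(p)
}
```

One caveat worth keeping in mind: sync.Pool lowers the allocation rate, but pooled objects still live on the garbage-collected heap and the runtime may drop them during a collection, so pooling reduces GC pressure rather than removing it.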
What We Tried First (And Why It Failed)
We bolted on jemalloc via MALLOC=tcmalloc, which dropped the allocation rate to 22 MB/s and the GC pause to 9 ms. Next, we tuned GOGC=10, which cut GC time in half but introduced a 5% tail-latency regression from cold caches. We even wrote a custom arena-based allocator for player state, but run-queue contention in the Go scheduler meant we were still serializing context switches. After two weeks of profiling, we had gained 300 ms of headroom at 500k connections, but the growth curve still turned exponential at 1.2M connections. The language wasn't just a nuisance; it was the inflection point.
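A GOGC experiment like the one above can be reproduced in-process with debug.SetGCPercent, which is the programmatic equivalent of the GOGC environment variable. The sketch below is not the harness we used; MeasureGC, GCSample, and the churn workload are hypothetical stand-ins, and the workload is a synthetic allocation loop rather than real player traffic.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// GCSample summarizes collector activity observed while a workload ran.
type GCSample struct {
	Cycles     uint32        // completed GC cycles during the workload
	TotalPause time.Duration // cumulative stop-the-world pause time
}

// MeasureGC runs work with the given GC target percentage, then restores
// the previous setting and reports the change in GC counters.
func MeasureGC(percent int, work func()) GCSample {
	prev := debug.SetGCPercent(percent)
	defer debug.SetGCPercent(prev)

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	work()
	runtime.ReadMemStats(&after)

	return GCSample{
		Cycles:     after.NumGC - before.NumGC,
		TotalPause: time.Duration(after.PauseTotalNs - before.PauseTotalNs),
	}
}

// sink is package-level so every buffer assigned to it escapes to the heap.
var sink []byte

// churn is a synthetic workload: it allocates many short-lived 1 KiB buffers.
func churn() {
	for i := 0; i < 200_000; i++ {
		sink = make([]byte, 1024)
	}
}

func main() {
	for _, pct := range []int{100, 10} {
		s := MeasureGC(pct, churn)
		fmt.Printf("GOGC=%d cycles=%d total pause=%v\n", pct, s.Cycles, s.TotalPause)
	}
}
```

A lower percentage makes the collector run more often on a smaller heap, so comparing cycle counts and cumulative pause across settings shows the trade-off directly. The absolute numbers depend on the machine and the Go version, which is why none are quoted here.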






