The Day the GC Tuning Patch Broke the Leaderboard

The Problem We Were Actually Solving

We ran an in-memory leaderboard service for a competitive event platform, caching 400k leaderboard rows at 40 MB/s of write throughput. In week 5 the Go GC started pausing for 200 ms every 700 ms, and P99 latency jumped from 8 ms to 112 ms. The event was still two weeks out. Our Redis cluster wasn't the bottleneck; the Go runtime was.

What We Tried First (And Why It Failed)

We tried every GC tuning knob Go gave us: GOGC=50, GOMEMLIMIT=4GiB, even debug.SetGCPercent(-1) to disable collection entirely. With the GC off the pauses disappeared, but RSS ballooned to 12 GB on a 4-core box and the service started getting OOM-killed. The culprit wasn't the GC alone; it was the interaction with our 256-byte per-row allocation pattern. Each leaderboard update allocated a new slice header while the old slice lingered, so the GC would wake up to a heap that was 90% unreferenced yet not collected because of the lingering headers.
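For readers who have not used these knobs, here is a minimal sketch of what they look like when set from code rather than from the environment, assuming Go 1.19 or later (where debug.SetMemoryLimit exists). The row type, applyUpdates, and cyclesAt are hypothetical names made up for illustration, not the service's actual code; the only detail taken from the story above is the fresh 256-byte allocation on every update.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// row is a hypothetical stand-in for one leaderboard entry.
type row struct {
	payload []byte
}

// applyUpdates rewrites every row n times. Each update allocates a fresh
// 256-byte slice, so the previous payload immediately becomes garbage.
func applyUpdates(rows []row, n int) {
	for i := 0; i < n; i++ {
		for j := range rows {
			rows[j].payload = make([]byte, 256)
		}
	}
}

// cyclesAt runs the workload under the given GOGC value and reports how many
// GC cycles completed meanwhile. It restores the previous GOGC value on return.
func cyclesAt(percent int, rows []row, updates int) uint32 {
	prev := debug.SetGCPercent(percent) // same effect as the GOGC env var
	defer debug.SetGCPercent(prev)

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	applyUpdates(rows, updates)
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

func main() {
	// Programmatic form of GOMEMLIMIT=4GiB. This is a soft limit: the
	// runtime collects harder near it but does not refuse allocations.
	debug.SetMemoryLimit(4 << 30)

	rows := make([]row, 10_000)
	// 50 rounds over 10,000 rows allocates roughly 128 MB in total.
	fmt.Println("GC cycles at GOGC=50: ", cyclesAt(50, rows, 50))
	fmt.Println("GC cycles at GOGC=off:", cyclesAt(-1, rows, 50))
}
```

The cycle counts depend on the machine and Go version, so none are shown here. One thing the sketch makes easy to check is that a negative GOGC only removes the heap-growth trigger; a memory limit set with debug.SetMemoryLimit can still start collections once the heap gets close to it.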

We benchmarked with go test -bench=. -benchtime=10s -count=5 and got 24.3 ns/op with the GC enabled versus 18.7 ns/op with it disabled. The disabled run was faster only until it wasn't: with nothing being collected, the heap grew until the box crashed. We needed a different language.
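The comparison can be approximated without the original benchmark file. The sketch below is a hand-rolled timing loop, not the go test benchmark quoted above: it runs a fixed number of updates so that memory stays bounded even with the GC off. The names table and timeUpdates and the 1,024-row table size are assumptions made for the example.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// table is package-level so the allocated rows must live on the heap.
var table = make([][]byte, 1024)

// timeUpdates performs n row updates under the given GOGC value and returns
// the total elapsed time. Each update allocates a new 256-byte slice and
// drops the old one, mimicking the per-row pattern described in the text.
func timeUpdates(percent, n int) time.Duration {
	prev := debug.SetGCPercent(percent)
	defer debug.SetGCPercent(prev) // restore the caller's setting

	runtime.GC() // start each measurement from a freshly collected heap
	start := time.Now()
	for i := 0; i < n; i++ {
		table[i%len(table)] = make([]byte, 256)
	}
	return time.Since(start)
}

func main() {
	// 200,000 updates allocate about 51 MB, which is safe with the GC off.
	const n = 200_000
	for _, percent := range []int{100, -1} {
		elapsed := timeUpdates(percent, n)
		perOp := float64(elapsed.Nanoseconds()) / float64(n)
		fmt.Printf("GOGC=%d: %.1f ns/op\n", percent, perOp)
	}
}
```

The absolute numbers will not match the 24.3 and 18.7 ns/op figures, since those came from the real update path on production-like hardware. The point of the bounded loop is only to show the shape of the trade: the GC-off run avoids collection work but keeps every dead row resident.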

Related reading

The Moment the JVM Tuning Knob Broke Our Treasure Hunt Engine

The Moment the Config Parser Became the Bottleneck

The One Cache That Broke Our Treasure Hunt Engine

The Day the Language Became the Bottleneck

How the Events Table That Looked Right Killed Our Queue

I Was Wrong About Events for Three Years—Until I Learned What Async Runtime Was…
