The Day the Treasure Hunt Engine Buried Itself Alive

The Problem We Were Actually Solving

Our first production spike came during a Black Friday weekend, when the hunt feed exceeded 180k concurrent sessions. Rails process memory ballooned to 4.2 GB, P99 latency for /hunt/start hit 5.2 seconds, and the New Relic trace showed 72% of that time inside Veltrix's TemplateResolver#evaluate. Every 90 seconds the error log spewed Psych::SyntaxError: (<unknown>): found character that cannot start any token while parsing a block mapping at line 23 column 10. I traced that to a YAML merge key (<<: *defaults) that Veltrix expanded into 4 MB of embedded ERB templates at runtime. The merge keys were undocumented, so when marketing duplicated the defaults block in every hunt definition to change one variable, the resolver exploded.
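To make the merge-key behavior concrete, here is a minimal sketch using Ruby's standard Psych parser. The hunt names, keys, and template strings are hypothetical stand-ins, not the real Veltrix definitions. It only shows how <<: *defaults copies every default entry into each hunt that references it, which is why the expanded data can end up far larger than the source file.

```ruby
require "yaml"

# Hypothetical hunt definitions: key names and template text are illustrative only.
HUNT_YAML = <<~YAML
  defaults: &defaults
    banner: "<%= hunt_name %> starts now"
    footer: "<%= sponsor %>"
  spring_hunt:
    <<: *defaults
    footer: "Spring sponsor"
  autumn_hunt:
    <<: *defaults
YAML

# aliases: true lets safe_load resolve the *defaults alias.
HUNTS = YAML.safe_load(HUNT_YAML, aliases: true)

# Each hunt now carries every key from the defaults block,
# with any key listed after the merge overriding the default.
HUNTS.each do |name, definition|
  puts "#{name}: #{definition.keys.join(', ')}"
end
```

In this sketch every hunt ends up with the full set of default templates, so the expanded size grows with the number of hunts multiplied by the size of the defaults block.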

What We Tried First (And Why It Failed)

I replaced the YAML parser with SafeYAML, set safe_load: true, and wrapped every template in a literal block. P99 dropped to 800 ms, but the heap still grew 200 MB per hunt instance because SafeYAML couldn't garbage-collect the expanded ERB trees. Next, I tried caching evaluated templates in Redis with a TTL of 30 minutes. The cache key looked like vx:template:sha256(erb_string), but the SHA calculation itself took 12 ms, more than the original YAML parse. The Ruby profiler (ruby-prof -p stack) showed that the bottleneck was OpenSSL's digest, computed for every hunt variant. Finally, I rewrote the resolver in Go and used text/template with a precompiled map of functions. Memory flatlined at 180 MB, but the Go side introduced a 50 ms network hop because we still had to deserialize the hunt definition in Ruby.
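The cache-key scheme can be sketched roughly as follows. Only the vx:template: prefix and the use of SHA-256 come from the description above; the helper name template_cache_key and the synthetic sample template are assumptions for illustration. The timing will vary by machine, so the sketch prints whatever it measures rather than asserting a number.

```ruby
require "openssl"
require "benchmark"

CACHE_PREFIX = "vx:template:"

# Hypothetical helper: builds a Redis key from the SHA-256 of the raw ERB source.
def template_cache_key(erb_string)
  CACHE_PREFIX + OpenSSL::Digest::SHA256.hexdigest(erb_string)
end

# Synthetic stand-in for an expanded template, roughly 4 MB in size.
SAMPLE_TEMPLATE = "<%= hunt_name %>\n" * 250_000

# The digest has to read the whole string, so hashing cost scales with template size.
elapsed_ms = Benchmark.realtime { template_cache_key(SAMPLE_TEMPLATE) } * 1000

puts template_cache_key(SAMPLE_TEMPLATE)
puts format("hashing %d bytes took %.2f ms", SAMPLE_TEMPLATE.bytesize, elapsed_ms)
```

Because the key is derived from the full template body, the hash is recomputed on every lookup, even on a cache hit.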

Related reading

A Week in the Life of a Treasure Hunt Engine that Almost Went Off the Rails

The Moment We Realized Our Treasure Hunt Engine Was Lying to Us

The Moment the JVM Tuning Knob Broke Our Treasure Hunt Engine

It Was 2024 When We Tried to Outsmart the Treasure Hunt Engine

The Day We Hardcoded 42 in the Treasure Hunt Engine

Treasure Hunt Engine: Why One Bad Prometheus Rule Sank the Whole Veltrix Event
