The Problem We Were Actually Solving
Our first production spike came during a Black Friday weekend when the hunt feed exceeded 180k concurrent sessions. The Rails process memory ballooned to 4.2 GB, the P99 latency for /hunt/start hit 5.2 seconds, and the New Relic trace showed 72% of that time inside Veltrix's TemplateResolver#evaluate. Every 90 seconds the error log spewed "Psych::SyntaxError: (<unknown>): found character that cannot start any token while parsing a block mapping at line 23 column 10". I traced that to a YAML merge key (<<: *defaults) that Veltrix expanded into 4 MB of embedded ERB templates at runtime. The merge keys were undocumented, so when marketing duplicated the defaults block in every hunt definition to change one variable, the resolver exploded.
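To make the merge-key behaviour concrete, here is a minimal sketch using Ruby's standard YAML (Psych) loader. The hunt names, keys, and template string are hypothetical and stand in for the real definitions, which are not shown here. It only illustrates how <<: *defaults copies every key of the anchored block into each hunt that references it.

```ruby
require "yaml"

# Hypothetical hunt definition: names and values are illustrative only.
HUNT_YAML = <<~YAML
  defaults: &defaults
    template: "<h1><%= title %></h1>"
    locale: en
    reward: 100
  hunts:
    spring:
      <<: *defaults
      reward: 250
    summer:
      <<: *defaults
      locale: fr
YAML

# aliases: true is needed so safe_load accepts the *defaults alias.
config = YAML.safe_load(HUNT_YAML, aliases: true)

config["hunts"].each do |name, hunt|
  # Each hunt now carries every key from the defaults block,
  # with its own explicit keys overriding the merged ones.
  puts "#{name}: #{hunt.keys.sort.join(', ')} (reward=#{hunt['reward']})"
end
```

The merge key is placed first in each mapping on purpose: Psych applies it at the position where it appears, so explicit keys written after it take precedence over the merged defaults.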
What We Tried First (And Why It Failed)
I replaced the YAML parser with SafeYAML, set safe_load: true, and wrapped every template in a literal block. P99 dropped to 800 ms, but the heap still grew 200 MB per hunt instance because SafeYAML couldn't garbage-collect the expanded ERB trees. Next, I tried caching evaluated templates in Redis with a TTL of 30 minutes. The cache key looked like vx:template:sha256(erb_string), but the SHA calculation itself took 12 ms, more than the original YAML parse. The Ruby profiler (ruby-prof -p stack) showed the bottleneck was in OpenSSL's digest, which ran for every hunt variant. Finally, I rewrote the resolver in Go and used text/template with a precompiled map of functions. Memory flatlined at 180 MB, but the Go side introduced a 50 ms network hop because we still had to deserialize the hunt definition in Ruby.
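The cache-key step can be sketched as follows. This is an illustration under stated assumptions, not the production code: template_cache_key is a hypothetical helper name, the template body is synthetic filler sized to roughly 4 MB, and Ruby's standard Digest::SHA256 is used in place of whatever digest binding the app actually called. No timing figure is claimed; the measured value depends on the machine.

```ruby
require "digest"
require "benchmark"

# Hypothetical helper: builds a key in the vx:template:<sha256 hex> format.
def template_cache_key(erb_string)
  "vx:template:#{Digest::SHA256.hexdigest(erb_string)}"
end

# Synthetic stand-in for an expanded template (25 bytes * 160_000 = 4 MB).
erb_string = "<p><%= hunt.title %></p>\n" * 160_000

key = nil
seconds = Benchmark.realtime { key = template_cache_key(erb_string) }

puts key
puts format("hashing %d bytes took %.1f ms", erb_string.bytesize, seconds * 1000)
```

Timing the key derivation on representative input like this is a quick way to check whether hashing the full template costs more than the parse the cache is meant to avoid.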






