The Problem We Were Actually Solving
Veltrix is a treasure-hunt engine that spawns thousands of ephemeral actors, each simulating a player moving through a map, dropping clues, and triggering scoring events. In December 2025 the service ran on Node 20 with a single event loop and libuv's default thread pool size of 4. At 2 500 concurrent players we measured 82 % CPU steal from the cloud provider. p99 latency climbed to 420 ms, and flame graphs showed the same three frames (parser, crypto verify, and buffer slice) piling up on top of libuv. We expected the bottleneck to be the actor logic because that's where the business code lives. Instead, the runtime itself was the payload.
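One way to check whether the event loop itself is the bottleneck, rather than the actor logic, is to sample event-loop delay directly. The following is a minimal sketch using Node's built-in perf_hooks histogram. The helper names percentileMs and sampleLoopDelay, the 10 ms resolution, and the 200 ms sampling window are illustrative assumptions, not taken from the Veltrix codebase.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// The histogram reports delays in nanoseconds; convert one percentile to ms.
function percentileMs(histogram, pct) {
  return histogram.percentile(pct) / 1e6;
}

// Sample event-loop delay for `durationMs`, then hand p50 and p99 to `onDone`.
// (Hypothetical helper; the sampling window is an assumption.)
function sampleLoopDelay(durationMs, onDone) {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  setTimeout(() => {
    histogram.disable();
    onDone({
      p50: percentileMs(histogram, 50),
      p99: percentileMs(histogram, 99),
    });
  }, durationMs);
}

sampleLoopDelay(200, (stats) => {
  console.log('event-loop delay (ms):', stats);
});
```

If p99 loop delay tracks p99 request latency under load, the time is going to work on the loop thread rather than to waiting on I/O.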
What We Tried First (And Why It Failed)
We started by splitting actors into separate Node processes using the cluster module. At 3 000 players CPU dropped from 82 % to 68 %, but p99 jumped to 950 ms because every message was now serialized through the OS pipe buffer. Next we tried worker_threads with a SharedArrayBuffer to pass map tiles. This shaved 18 % off CPU, but GC pause times spiked from 3 ms to 37 ms every 120 ms because the V8 heap kept merging young generations. The real kicker was the error rate: during sharp load spikes we saw 30–40 ERR_IPC_CHANNEL_CLOSED errors per minute because a stray actor could tear down the isolate before responses flushed.
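For readers who have not used the second approach, here is a minimal sketch of sharing map tiles between the main thread and a worker through a SharedArrayBuffer. The tile layout (4x4 cells, one byte per cell) and the names createTileStore, tileView, and sumTilesInWorker are hypothetical; the post does not describe the real tile format.

```javascript
const { Worker } = require('node:worker_threads');

const TILE_SIZE = 4; // assumption: 4x4 cells per tile, one byte per cell

// Allocate one shared block that holds `tileCount` tiles.
function createTileStore(tileCount) {
  const shared = new SharedArrayBuffer(tileCount * TILE_SIZE * TILE_SIZE);
  return { shared, cells: new Uint8Array(shared) };
}

// Return a view over a single tile; no bytes are copied.
function tileView(cells, tileIndex) {
  const len = TILE_SIZE * TILE_SIZE;
  return cells.subarray(tileIndex * len, (tileIndex + 1) * len);
}

// Worker source, inlined so the sketch stays in one file.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  const cells = new Uint8Array(workerData.shared); // same memory, not a copy
  let sum = 0;
  for (const v of cells) sum += v;
  parentPort.postMessage(sum);
`;

// Start a worker over the shared buffer and resolve with the sum it computes.
function sumTilesInWorker(shared) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { shared },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

const store = createTileStore(2);
tileView(store.cells, 1).fill(3); // write into tile 1 on the main thread
sumTilesInWorker(store.shared).then((sum) => {
  console.log('sum seen by worker:', sum);
});
```

Only the small result message crosses the thread boundary here; the tile bytes stay in place. Concurrent writers would additionally need Atomics or some other coordination, which this sketch leaves out.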






