I Was Wrong About Events for Three Years—Until I Learned What Async Runtime Was Really Costing

Three years ago I inherited a Rust-based treasure-hunt engine that processed upwards of 2 million in-game events per second. Latency was fine on the bench (median 3 ms, p99 12 ms), but every time we hit 2.5 M events/sec the JVM control plane ground to a halt at 30 % CPU and 512 MB RSS. I blamed GC pauses, tuned G1, disabled safepoints, even rewrote the hot path in C. Nothing helped.

Then I ran the collector under perf for 30 seconds and saw a 3.4 ms pause during which 92 % of threads were blocked on syscalls inside tokio::park_timeout. The async runtime was blocking the executor thread on the event queue's io_uring submission path. In my head, Rust meant no GC, so performance was only about the algorithm. In reality, the runtime was the constraint.
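To make the cost of a blocked executor thread concrete, here is a toy model, not Tokio itself: one executor thread runs queued tasks to completion in FIFO order on a virtual clock. The function name and the task costs are made up for illustration; only the 3.4 ms figure echoes the pause described above.

```rust
/// Toy model (an assumption, not Tokio's scheduler): a single executor
/// thread runs each queued task to completion, in order.
/// Input: cost of each task in microseconds.
/// Output: the virtual time at which each task finishes.
pub fn completion_times_us(task_costs_us: &[u64]) -> Vec<u64> {
    let mut clock = 0u64;
    task_costs_us
        .iter()
        .map(|cost| {
            // The thread cannot start the next task until this one returns,
            // so every later task inherits this task's full cost.
            clock += cost;
            clock
        })
        .collect()
}

/// Extra delay a task at `index` suffers compared with running alone.
pub fn queueing_delay_us(task_costs_us: &[u64], index: usize) -> Option<u64> {
    let done = completion_times_us(task_costs_us);
    Some(done.get(index)? - task_costs_us[index])
}
```

With costs of 10, 3400, 10 and 10 µs, the two cheap tasks queued behind the 3.4 ms blocking call each wait more than 3.4 ms before they even start, although they need only 10 µs of work themselves.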

The Problem We Were Actually Solving

We were building a real-time open-world game where every player action—move, rotate, shoot, loot—generated an event. Those events streamed into a Kafka topic partitioned by shard (hot zones used 64 partitions). A cluster of Rust services consumed the topic, fed an in-memory state machine, then published updates to a second topic for the physics and rendering workers.
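As a rough sketch of the routing described above, the snippet below maps each event to a partition by its shard. The type names, fields, and the use of Rust's DefaultHasher are assumptions for illustration; a real Kafka producer applies its own partitioner to the message key, so the partition numbers here would not match a live cluster.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// The player actions named in the text. Payloads are omitted for brevity.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Action {
    Move,
    Rotate,
    Shoot,
    Loot,
}

/// Hypothetical event shape; the field names are not from the original system.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GameEvent {
    pub shard_id: u32,
    pub player_id: u64,
    pub action: Action,
}

/// Partition count the text gives for hot zones.
pub const HOT_ZONE_PARTITIONS: u32 = 64;

/// Map a shard to a partition. The hash is a stand-in for whatever
/// partitioner the producer actually uses.
pub fn partition_for(shard_id: u32, partitions: u32) -> u32 {
    assert!(partitions > 0, "need at least one partition");
    let mut hasher = DefaultHasher::new();
    shard_id.hash(&mut hasher);
    (hasher.finish() % u64::from(partitions)) as u32
}

/// Group events into per-partition buckets, preserving arrival order
/// within each bucket.
pub fn route(events: &[GameEvent], partitions: u32) -> Vec<Vec<GameEvent>> {
    let mut buckets = vec![Vec::new(); partitions as usize];
    for event in events {
        let p = partition_for(event.shard_id, partitions) as usize;
        buckets[p].push(event.clone());
    }
    buckets
}
```

Keying on the shard keeps all events for one shard in a single partition, so a consumer sees them in order. The trade-off is that a busy shard cannot spread its load across partitions.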

The system promised 2 ms end-to-end latency at 2 M events/sec. Early load tests with 1.5 M events/sec showed median 2.8 ms and p99 14 ms. When we pushed to 2.2 M events/sec the metrics inverted: median 1.9 ms but p99 exploded to 124 ms and 5 % of responses timed out. Collectors on the consumer pods showed CPU flat at 65 %, but RSS climbed 1.2 GB every minute and the JVM control-plane nodes rebooted with OutOfMemoryError.
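The median can improve while the p99 explodes because the two percentiles look at different parts of the distribution. The sketch below uses the nearest-rank definition of a percentile; the function name is hypothetical, and the original dashboards may well have used a different estimator.

```rust
/// Nearest-rank percentile: the smallest sample such that at least `pct`
/// percent of all samples are less than or equal to it.
/// Returns None for an empty input or a percentile outside 1..=100.
pub fn percentile(samples_ms: &[f64], pct: u32) -> Option<f64> {
    if samples_ms.is_empty() || pct == 0 || pct > 100 {
        return None;
    }
    let mut sorted = samples_ms.to_vec();
    // total_cmp gives a total order on f64, so sorting cannot panic on NaN.
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Integer ceiling of pct * n / 100, which avoids float rounding in the rank.
    let rank = (pct as usize * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}
```

On a synthetic set of 100 samples, 95 at 1.9 ms and 5 at 124 ms, percentile(&samples, 50) returns 1.9 and percentile(&samples, 99) returns 124.0. A small fraction of slow requests is invisible to the median yet fully determines the tail.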

Related reading

This Rewrite Isn't the Constraint: How a 300ms Tail Latency Hunt Led to a New…

The Day We Realized Events Were the Bottleneck (And Why We Moved to Rust)

The Moment the Config Parser Became the Bottleneck

The Moment the JVM Tuning Knob Broke Our Treasure Hunt Engine

Why I Ditched Go for Rust in Our Real-Time Event Processing Pipeline

When the Runtime Was the Wall: How Rust Broke a 50 ms SLA and Saved the Day
