We were burning 400ms in p99 tail latency on a core event-processing path in Veltrix. The upstream teams kept blaming the network, but the numbers didnt lie—64% of the time was spent inside the JVM, specifically in sun.misc.Unsafe.park during GC pauses. Every time we hit 80% heap pressure, the throughput collapsed and we lost 300k events per minute. That was the exact moment I stopped believing in the JVM as the runtime and started looking at the system boundary.

The first attempt was aggressively tuned HotSpot with G1GC and pinning the critical threads to their own NUMA nodes. We set -XX:MaxGCPauseMillis=20, -XX:+UseNUMA, and even migrated to Azul Zulu Prime because its handling of large heaps was supposedly better. The p99 dropped to 280ms, but the GC telemetry still showed a sawtooth pattern of 30–40ms spikes every 230ms on a 16GB heap. Profiling with JDK Flight Recorder told us 18% of CPU time was spent in card-table scanning. At that point I knew we were fighting the runtime, not the problem. The event pipeline was small—just JSON parsing, enrichment, and a single RocksDB write—but the JVMs generational collector couldnt stop moving objects.

The architecture decision came during a four-day blackout window after a failed Blue-Green deploy. Three of us sat in a war room with a single Grafana dashboard showing 100% CPU steal time on the Kubernetes nodes. We had two choices: squeeze more life out of the JVM by manually balancing the heap or rewrite the critical hot path in Rust and give the compiler full control over memory layout. The Rust option meant losing the JVM ecosystem (no more async-profiler, no more one-liner heap dumps) but gave us stackless futures, zero-cost abstractions, and compile-time memory safety. We chose Rust. We forked the Cargo.toml wed used in a sidecar for metrics and started porting the event collector.