The Problem We Were Actually Solving
We needed to support thousands of concurrent players in the Treasure Hunt game mode. Because the game is event-driven, every player movement, item pickup, and treasure collection triggered a burst of events that the server had to process quickly. The catch was that the event bus was prone to congestion, which caused unpredictable delays and stalls.
What We Tried First (And Why It Failed)
Initially, we tried to relieve the congestion by running multiple event bus instances, each with its own set of event handlers, and by adding a load balancer to distribute traffic across multiple servers. However, this setup led to a "server farm effect": the load balancer would route traffic to a server that was already congested, making the stall even worse.
In hindsight, we should have recognized that our approach was "distributing the pain" rather than reducing it. Spreading the congestion across multiple servers only delayed the inevitable stall; it did not address the root cause of the problem.
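The failure mode described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not our production setup: it assumes a round-robin balancer that never inspects queue depth, and the `simulate` function, the service rates, and the arrival numbers are all hypothetical values chosen for illustration.

```python
from itertools import cycle


def simulate(service_rates, ticks, arrivals_per_tick):
    """Toy model: round-robin dispatch that ignores each server's backlog.

    service_rates[i] is how many events server i can process per tick
    (hypothetical numbers, not measurements from the real system).
    Returns the backlog left on each server after the final tick.
    """
    queues = [0] * len(service_rates)
    targets = cycle(range(len(service_rates)))
    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            # Load-blind routing: the next server gets the event
            # no matter how deep its queue already is.
            queues[next(targets)] += 1
        for i, rate in enumerate(service_rates):
            # Each server drains as many events as its rate allows.
            queues[i] = max(0, queues[i] - rate)
    return queues


# Two healthy servers and one slow one; total capacity (9 events/tick)
# exceeds the arrival rate (6 events/tick).
print(simulate(service_rates=[4, 4, 1], ticks=100, arrivals_per_tick=6))
# → [0, 0, 100]
```

In this toy model the cluster has spare capacity overall, yet the slow server's backlog grows by one event every tick because the balancer keeps feeding it an equal share. Adding more servers changes where the backlog sits, not whether it forms.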






