Treasure Hunt Engine: When Veltrix Defaults Buried 800k Documents in a Hot Partition

The Problem We Were Actually Solving

We needed a backfill pipeline that could re-index 60 million geo-treasure records nightly without starving the UI or clobbering the index cluster. The Veltrix quick-start defaults (one shard per index, 50 GB max) are tuned for dev laptops, not for 99th-percentile geospatial queries. Our query pattern is 95 % single-geo-hash reads, that is, single-key lookups on the TreasureKey field. The default allocator spreads 60 million docs evenly across 6 shards only if the primary key is evenly distributed; with geo-hash prefixes, the distribution collapses into one shard. That collapse was the real prod fire: 800 k docs in one 12 GB segment.
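A minimal sketch of that collapse, in Python. Veltrix's routing function is not described here, so the code assumes a stand-in: a SHA-256-based hash modulo the shard count, applied either to the 5-character geo-hash prefix or to the whole key. The key format, the prefixes, and the 80 % city-cluster distribution are all hypothetical. Under these assumptions the collapse only reproduces when routing effectively keys on the prefix; hashing the full key with a well-mixed hash spreads documents evenly.

```python
import hashlib
from collections import Counter
from typing import Callable


def shard_for(routing_input: str, num_shards: int) -> int:
    # Stand-in for Veltrix's routing hash (assumption): any stable,
    # well-mixed hash works for this illustration; SHA-256 is used here.
    digest = hashlib.sha256(routing_input.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def hot_shard_fraction(keys: list[str], num_shards: int,
                       routing_input: Callable[[str], str]) -> float:
    # Share of all documents that land on the busiest shard.
    counts = Counter(shard_for(routing_input(k), num_shards) for k in keys)
    return max(counts.values()) / len(keys)


def make_keys(n: int) -> list[str]:
    # Hypothetical 22-char TreasureKeys: 5-char geo-hash prefix + 17-digit serial.
    # 80 % of docs fall in two dense "city" prefixes (made-up distribution);
    # the remaining 20 % cycle through 40 placeholder sparse prefixes.
    dense = ["9q8yy", "dr5ru"]
    sparse = [f"s{i:02d}xx" for i in range(40)]
    keys = []
    for i in range(n):
        if i % 10 < 8:
            prefix = dense[i % 2]
        else:
            prefix = sparse[(i // 10) % len(sparse)]
        keys.append(f"{prefix}{i:017d}")
    return keys


keys = make_keys(60_000)
by_prefix = hot_shard_fraction(keys, 6, lambda k: k[:5])  # routing keyed on prefix
by_full_key = hot_shard_fraction(keys, 6, lambda k: k)    # routing on the whole key
print(f"hottest shard, prefix routing:   {by_prefix:.0%}")
print(f"hottest shard, full-key routing: {by_full_key:.0%}")
```

In the prefix-routed case, raising the shard count does not help: each dense prefix still lands whole on a single shard, so the hottest shard holds at least 40 % of the docs here.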

What We Tried First (And Why It Failed)

First we let Veltrix do the work. The default allocation strategy is ShardAllocationStrategy.AWARE, but default shard placement is still hash-based on the document ID. Our TreasureKey was a 22-byte base62 string that embedded the geo-hash prefix, and two prefixes dominated because they covered dense city clusters. Changing nothing, we watched the nightly job log: it reported 79 % of docs routed to one shard.

Next we raised replicas to 2, hoping replica reads would mask the hotspot. The CPU graph stayed flat, but GC pauses climbed because the single primary still absorbed every write.

Finally we split the index manually into 12 shards using the Veltrix Index API. At 03:51 the same backfill ran for 11 minutes, GC pauses dropped to 30 k, and heap stayed under 8 GB with a max pause of 180 ms.
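Whether more shards help depends on how finely the keyspace is divided: if a dense geo-hash prefix is one routing unit, no shard count can bring the busiest shard below that prefix's share. The Python sketch below plans a manual split. It is not the Veltrix Index API and not necessarily how the 12-shard split was done; it swaps in greedy longest-processing-time (LPT) assignment, a standard load-balancing heuristic, and compares coarse 5-character prefixes against dense prefixes subdivided into their 32 six-character geo-hash cells. All bucket names and document counts are hypothetical.

```python
import heapq

GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash base32


def plan_split(bucket_counts: dict[str, int], num_shards: int) -> dict[int, list[str]]:
    # Greedy LPT: place each bucket, heaviest first, on the currently lightest shard.
    heap = [(0, shard) for shard in range(num_shards)]  # (doc_count, shard_id)
    heapq.heapify(heap)
    plan: dict[int, list[str]] = {shard: [] for shard in range(num_shards)}
    for bucket, count in sorted(bucket_counts.items(), key=lambda kv: kv[1], reverse=True):
        load, shard = heapq.heappop(heap)
        plan[shard].append(bucket)
        heapq.heappush(heap, (load + count, shard))
    return plan


def max_shard_share(plan: dict[int, list[str]], bucket_counts: dict[str, int]) -> float:
    # Fraction of all documents held by the busiest shard in the plan.
    total = sum(bucket_counts.values())
    return max(sum(bucket_counts[b] for b in buckets) for buckets in plan.values()) / total


# Hypothetical doc counts: two dense 5-char prefixes plus 36 sparse placeholder buckets.
sparse = {f"s{i:02d}xx": 10 for i in range(36)}
coarse = {"9q8yy": 320, "dr5ru": 320, **sparse}
# Same total, but each dense prefix is subdivided into its 32 six-char cells
# (assumes, for illustration, that docs spread evenly across those cells).
fine = {f"{p}{c}": 10 for p in ("9q8yy", "dr5ru") for c in GEOHASH_ALPHABET}
fine.update(sparse)

for name, buckets in (("coarse", coarse), ("fine", fine)):
    share = max_shard_share(plan_split(buckets, 12), buckets)
    print(f"{name}: busiest of 12 shards holds {share:.0%} of docs")
```

With these made-up counts, the coarse plan is stuck at 32 % on one shard (the size of a single dense prefix), while the subdivided plan drops to 9 %, close to the ideal 1/12.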
