Kafka's Real Compression Problem Is Batch Depth

Kafka compression waste is usually a batch depth problem, not a codec problem. Better batching improves producer compression, which reduces consumer CPU and cross-AZ cost downstream.

In one production deployment, changing batch sizing and linger settings cut the consumer fleet in half and moved compression from under 10% to over 50% - with no codec change. The cause wasn't the codec. It was batch depth.

Why batch depth controls what the codec sees

Kafka producers compress batches, not individual messages. The compression codec sees whatever the producer has accumulated by the time it flushes. linger.ms sets how long the producer waits to accumulate records. batch.size caps how large that accumulation can grow.
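The effect of batch depth on what a codec can do is easy to reproduce outside Kafka. The sketch below is an illustration, not a benchmark: it uses Python's zlib as a stand-in for a Kafka codec, and the record shape in make_records is a made-up JSON event. It compresses the same records in batches of different depths and reports compressed bytes as a fraction of raw bytes.

```python
import json
import zlib


def make_records(n):
    """Build n hypothetical JSON events (the field names are assumptions)."""
    return [
        json.dumps(
            {
                "user_id": 1000 + i,
                "event": "page_view",
                "region": "us-east-1",
                "ts": 1700000000 + i,
            }
        ).encode("utf-8")
        for i in range(n)
    ]


def compressed_fraction(records, batch_depth):
    """Compress records in groups of batch_depth; return compressed/raw bytes."""
    raw_bytes = 0
    compressed_bytes = 0
    for start in range(0, len(records), batch_depth):
        # Each group is compressed on its own, the way a producer
        # compresses each batch independently of the others.
        batch = b"".join(records[start:start + batch_depth])
        raw_bytes += len(batch)
        compressed_bytes += len(zlib.compress(batch))
    return compressed_bytes / raw_bytes


records = make_records(1000)
for depth in (1, 10, 100, 1000):
    print(f"depth={depth:>4}  compressed/raw={compressed_fraction(records, depth):.3f}")
```

Exact numbers depend on the payload and the codec, but the fraction should fall as depth grows: a batch holding one record gives the compressor almost no repeated field names or values to work with.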

Both settings are conservative by default. When per-producer throughput is low - because traffic is light, or because it's spread across too many producer instances - the linger window closes before much data has arrived, so the codec is handed a shallow batch with little repetition to exploit.
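A rough way to reason about this is to estimate how many bytes one producer can accumulate for one partition before the linger window closes. The function below is a simplified model under stated assumptions, not how the Kafka client computes anything: it assumes records arrive at a steady rate, are spread evenly across producers and partitions, and that a batch always holds at least one record. All input values in the example are hypothetical.

```python
def estimated_batch_bytes(
    total_records_per_sec,
    producers,
    partitions,
    avg_record_bytes,
    linger_ms,
    batch_size_bytes,
):
    """Estimate uncompressed bytes per batch (simplified, assumption-laden model)."""
    # Producers batch per partition, so the rate that matters is the rate
    # one producer sends to one partition.
    rate = total_records_per_sec / (producers * partitions)
    accumulated = rate * (linger_ms / 1000) * avg_record_bytes
    # A batch holds at least one record and is capped by batch.size.
    return min(max(accumulated, avg_record_bytes), batch_size_bytes)


# Hypothetical workload: 2,000 records/s of 500-byte events over 20 partitions.
for producers, linger_ms in ((50, 5), (5, 5), (5, 100)):
    size = estimated_batch_bytes(2000, producers, 20, 500, linger_ms, 16384)
    print(f"producers={producers:>2} linger.ms={linger_ms:>3} -> ~{size:.0f} bytes per batch")
```

In this model, cutting the producer count or lengthening linger.ms are the two levers that deepen a batch, and raising batch.size helps only once the accumulated bytes actually reach the cap.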
