We Trusted Auto-Ack. The Queue Agreed. Our Costs Didn't.

Most async bugs announce themselves. This one didn't.

No failed jobs. No customer complaints. No error logs. Just infrastructure costs climbing steadily with no obvious cause. Only after correlating message IDs across logs did we finally see it: the same message was being processed two, sometimes three times instead of once.
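As a rough illustration of that correlation step, here is a minimal sketch. It assumes each worker writes a structured log record with a `message_id` and an `event` field; those field names, the `"completed"` event, and the sample records are hypothetical, not taken from our actual logs.

```python
from collections import Counter


def find_duplicate_processing(records):
    """Return {message_id: count} for messages completed more than once.

    `records` is an iterable of dicts with hypothetical keys
    "message_id" and "event".
    """
    completed = Counter(
        r["message_id"] for r in records if r["event"] == "completed"
    )
    return {mid: n for mid, n in completed.items() if n > 1}


# Made-up sample records merged from two workers' logs.
records = [
    {"worker": "w1", "message_id": "m-1", "event": "completed"},
    {"worker": "w2", "message_id": "m-1", "event": "completed"},
    {"worker": "w1", "message_id": "m-2", "event": "completed"},
]

print(find_duplicate_processing(records))  # {'m-1': 2}
```

Nothing in a single worker's log looks wrong here. The duplicate only shows up once records from all workers are grouped by message ID.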

The culprit was a race condition hiding inside an acknowledgment pattern.

What Happened

A consumer picked up a message and started doing work. That work took time. Before it finished, the queue's retry timeout fired; the queue assumed the consumer had failed and redelivered the message to a second consumer. Now two workers were doing identical work concurrently, both completing successfully, and together silently doubling the cost.
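The timing behind this can be sketched with a toy model. It assumes the queue redelivers an unacknowledged message every `redelivery_timeout` seconds until the first ack arrives, and that every worker that receives the message runs it to completion. The function and parameter names are illustrative, and real brokers differ in the details.

```python
def count_executions(processing_time: float, redelivery_timeout: float) -> int:
    """Count how many workers end up processing the same message.

    Toy model (an assumption, not any specific broker's behavior):
    - the message is first delivered at t = 0,
    - the first worker acks when it finishes, at t = processing_time,
    - until that ack arrives, the queue redelivers every
      `redelivery_timeout` seconds,
    - every delivery starts a worker that runs to completion.
    """
    if processing_time <= 0 or redelivery_timeout <= 0:
        raise ValueError("both durations must be positive")

    first_ack_at = processing_time
    executions = 0
    delivered_at = 0.0
    while delivered_at < first_ack_at:
        executions += 1  # this delivery reaches a worker
        delivered_at += redelivery_timeout  # next redelivery, if still unacked
    return executions


print(count_executions(10, 30))  # 1: work finishes before the timeout
print(count_executions(45, 30))  # 2: one redelivery while the first worker runs
print(count_executions(70, 30))  # 3: two redeliveries
```

In this model, duplicates appear as soon as the work takes longer than the timeout, and nothing fails along the way, which is why no error is ever logged.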

