Most async bugs announce themselves. This one didn't.

No failed jobs. No customer complaints. No error logs. Just infrastructure costs climbing steadily with no obvious cause. It took correlating message IDs across logs to finally see it: the same message being processed two, sometimes three times per delivery.

The culprit was a race condition hiding inside an acknowledgment pattern.

What Happened

A consumer picked up a message and started doing work. That work took time. Before it finished, the queue's retry timeout fired, assumed failure, and redelivered the message to a second consumer. Now two workers were doing identical work concurrently, both completing successfully, both silently doubling the cost.