I asked 4 senior Kafka engineers this question on Reddit. Nobody named a tool. So I built one.

You deployed a hotfix at 9:45 PM.

The fix was correct. The status field had been accepting invalid values — "ok", "done", "finished" — from different teams. Converting it to a strict Enum with four valid values was the right call: PENDING, PROCESSING, COMPLETED, FAILED.

Nobody checked the DLQ first.

There were 23,000 payment events sitting in payments.dlq from a consumer failure that afternoon. Your new V2 consumer tries to process them. Each one fails immediately: