I Fixed a 5s Database Bottleneck with CDC Dual-Writes
We recently hit a critical bottleneck. While we were running a schema change on a billion-row order table during peak traffic, our P99 latency spiked to 5 seconds and tripped our circuit breakers.
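The culprit? MySQL's Online DDL. Even with the INPLACE algorithm, it briefly takes an exclusive metadata lock on the table to update the table definition. While the DDL is waiting for that lock or holding it, every new statement on the table queues behind it, which blocks all incoming writes.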
Here is how we solved this using a CDC (Change Data Capture) dual-write strategy and atomic table swapping, bringing P99 latency down to 200ms and achieving zero-downtime schema migrations.
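To make the two moving parts concrete, here is a minimal sketch in Python, not our production code. It assumes a hypothetical `ChangeEvent` shape for what a CDC reader emits, models the shadow table as an in-memory dict keyed by primary key, and uses the illustrative table names `orders`, `orders_new`, and `orders_old`. The one thing taken from MySQL itself is that a single `RENAME TABLE` statement listing several renames applies them atomically.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ChangeEvent:
    """One row-level change as a CDC reader might emit it (hypothetical shape)."""
    op: str              # "insert", "update", or "delete"
    pk: int              # primary key of the affected row
    row: Dict[str, Any]  # full row image after the change (empty for a delete)


def apply_event(shadow: Dict[int, Dict[str, Any]], event: ChangeEvent) -> None:
    """Replay one change onto the shadow table (modeled here as a dict)."""
    if event.op == "delete":
        # Deleting a missing key is a no-op, so replays are harmless.
        shadow.pop(event.pk, None)
    elif event.op in ("insert", "update"):
        # Upsert: applying the same event twice leaves the same row.
        shadow[event.pk] = dict(event.row)
    else:
        raise ValueError(f"unknown op: {event.op}")


def swap_statement(live: str, shadow: str, retired: str) -> str:
    """Build one RENAME TABLE statement that swaps both tables in a single step."""
    return f"RENAME TABLE `{live}` TO `{retired}`, `{shadow}` TO `{live}`"


if __name__ == "__main__":
    shadow_table: Dict[int, Dict[str, Any]] = {}
    events = [
        ChangeEvent("insert", 1, {"id": 1, "status": "new"}),
        ChangeEvent("update", 1, {"id": 1, "status": "paid"}),
        ChangeEvent("insert", 2, {"id": 2, "status": "new"}),
        ChangeEvent("delete", 2, {}),
    ]
    for event in events:
        apply_event(shadow_table, event)
    print(shadow_table)
    print(swap_statement("orders", "orders_new", "orders_old"))
```

The upsert-style apply is a deliberate choice in this sketch: CDC pipelines commonly deliver events at least once, so replaying the same change must not corrupt the shadow table. The swap is built as one statement so that readers and writers never see a moment where `orders` does not exist.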
The Architecture






