How we migrated a live routing system using AI-assisted refactoring

When the storage backend for Stream Router hit hard limits, we needed to redesign its data model and migrate it to a new storage architecture without disrupting live production traffic. We would not have completed the implementation in the time frame we had without AI tools.

We used Claude and Cursor to accelerate a systematic, test-driven refactoring process. They weren’t generating code autonomously: For each method, we provided the old implementation, the new schema, and a failing test. The models would generate a first pass, and the tests told us whether it was correct.
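To make that loop concrete, here is a minimal Python sketch of what one method's inputs could look like. It is not the actual Stream Router code: the `Route` schema, the two in-memory stores, and the function names are all hypothetical stand-ins for the old implementation, the new schema, and the test that starts out failing.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical "old implementation": routes stored as flat string records.
OLD_STORE: Dict[str, Dict[str, str]] = {
    "route-1": {"cluster": "kafka-a", "topic": "metrics", "partitions": "0,1,2"},
}


def get_route_old(route_id: str) -> Dict[str, str]:
    """Old method, kept around as the behavioral reference."""
    return OLD_STORE[route_id]


# Hypothetical "new schema": a typed, immutable record.
@dataclass(frozen=True)
class Route:
    cluster: str
    topic: str
    partitions: Tuple[int, ...]


NEW_STORE: Dict[str, Route] = {
    "route-1": Route(cluster="kafka-a", topic="metrics", partitions=(0, 1, 2)),
}


def get_route_new(route_id: str) -> Route:
    """Candidate rewrite against the new schema (the model's first pass)."""
    return NEW_STORE[route_id]


def test_get_route_matches_old_behavior() -> None:
    """Written before the rewrite exists; fails until get_route_new is correct."""
    old = get_route_old("route-1")
    new = get_route_new("route-1")
    assert new.cluster == old["cluster"]
    assert new.topic == old["topic"]
    assert new.partitions == tuple(int(p) for p in old["partitions"].split(","))


if __name__ == "__main__":
    test_get_route_matches_old_behavior()
    print("ok")
```

In this sketch the test, not the reviewer's intuition, decides whether the generated method is acceptable: it pins the new method to the old method's observable behavior.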

We were curious whether AI could help us safely evolve a critical production system. This post is about what worked, what didn’t, and what we learned along the way. We’ll walk through the migration itself, the workflow we used, what gave us confidence in the migration, and where the models were useful versus where they still required human expertise.

Before we get into the migration, it’s worth understanding the system we were changing.

At Datadog, we ingest massive volumes of metrics data every second as part of a platform that processes over a hundred trillion events per day. Routing that data correctly is just as important as ingesting it. Every datapoint needs to be routed to the right Kafka cluster, topic, and set of partitions so it can be stored and queried correctly, and those routing decisions change constantly as our infrastructure evolves. (For a deeper look at the full metrics pipeline, see our overview of the metrics platform.)
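As a rough illustration of what a routing decision involves, the sketch below resolves a key to a destination made up of a cluster, a topic, and a set of partitions, and shows a route being changed without mutating the table in place. Everything here is an assumption for illustration: the `Destination` type, the routing keys, and the table layout are hypothetical and do not describe Stream Router's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Destination:
    """Hypothetical routing target: where a datapoint should be written."""
    cluster: str
    topic: str
    partitions: Tuple[int, ...]


RoutingTable = Dict[str, Destination]

# Hypothetical table; the keys are opaque routing keys chosen for this example.
TABLE_V1: RoutingTable = {
    "key-a": Destination("kafka-1", "metrics-a", (0, 1, 2, 3)),
    "key-b": Destination("kafka-2", "metrics-b", (0, 1)),
}


def resolve(table: RoutingTable, routing_key: str) -> Destination:
    """Look up the destination for a routing key; raises KeyError if unknown."""
    return table[routing_key]


def with_route(table: RoutingTable, routing_key: str, dest: Destination) -> RoutingTable:
    """Return a new table with one route replaced, leaving the old table intact."""
    updated = dict(table)
    updated[routing_key] = dest
    return updated


# A routing change: move "key-b" to a different cluster with more partitions.
TABLE_V2 = with_route(TABLE_V1, "key-b", Destination("kafka-3", "metrics-b", (0, 1, 2, 3)))
```

Building a new table rather than editing the existing one is a design choice made for this example: readers of the old table keep seeing a consistent view while the change is prepared.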
