I Refactored 100 Functions With Claude. CI Was Green. Production Got Slower in 7 Spots.

I asked Claude Code to refactor 100 functions across a Python service I owned. It did the job in two passes. CI was green on both. The PR description was so neat I almost felt bad shipping it on a Friday.

Two weeks later, on-call paged me because the p95 of one endpoint had drifted from 180 ms to 240 ms. I started bisecting. The bisect landed on the refactor PR. I started reading the refactor PR. Seven of the 100 functions were slower in production. CI never noticed because CI does not measure "slower." It measures "returns the same value."

This post is about what those seven slow functions had in common, why mutation tests and unit tests both missed them, and the four checks I now run before I let Claude, or any AI, refactor anything that ships under load.

The setup, so you can tell whether this generalizes

The codebase: a Python 3.12 service with about 18k lines of business logic, FastAPI on the edge, asyncpg to Postgres, a Redis cache, and a CPU-bound scoring module that runs on every request. The 100 functions were a curated batch: small to medium, pure where possible, all with unit tests. I asked Claude Code to apply a standard set of cleanups: early returns, extracted variables for magic numbers, comprehensions where loops did one thing, dataclass conversions for ad hoc tuples.

The setup, so you can tell whether this generalizes

I Refactored 100 Functions With Claude. CI Was Green. Production Got Slower in 7 Spots.

I Refactored 100 Functions With Claude. CI Was Green. Production Got Slower in 7 Spots.

Related reading

I built a local MCP server that gives Claude Code real PR context — 33s reviews…

Claude Code in Production: The Guardrails Nobody Talks About (Until Something…

Four agents, 77 projects, 90 minutes: the multi-agent Claude Code pattern I run…

What I Got Wrong About Claude Code (And How I Fixed It)

Seven PRs Before Lunch: Parallel Claude Code Tabs Plus Audit-Before-Bump

How a one-person studio writes 35 Claude Code agents that don't fight each other

Related reading

I built a local MCP server that gives Claude Code real PR context — 33s reviews…

Claude Code in Production: The Guardrails Nobody Talks About (Until Something…

Four agents, 77 projects, 90 minutes: the multi-agent Claude Code pattern I run…

What I Got Wrong About Claude Code (And How I Fixed It)

Seven PRs Before Lunch: Parallel Claude Code Tabs Plus Audit-Before-Bump

How a one-person studio writes 35 Claude Code agents that don't fight each other