If you're building AI products on top of closed models, anyone with an API key can get similar capabilities. Lasting differentiation comes from proprietary data, the training recipe, the infrastructure, and the speed of iteration.

Shopify has something most companies don't: a product surface where millions of merchant interactions directly signal whether the model's output is any good. That feedback loop is the foundation, but only if you keep learning from it.

For Sidekick, our AI commerce assistant, we fine-tuned a tool-calling agent that turns natural language into a Shopify Flow. It is 2.2x faster and 68% cheaper, and it outperforms closed models.

Along the way, we learned lessons no paper warned us about: data preprocessing decisions, from representation design to formatting details, that compound to swing accuracy by double digits; silent infrastructure failures that degrade your model with zero warnings and take days to trace; and benchmark parity that masks a 35% gap once real users show up.

This post covers the problems we faced, how we fixed them, and what to look for if you're doing the same.