Phase 2 of the de-swarm project — how I turned a 3B text-to-SQL model into a production API for $0.

The setup

Three weeks ago, I distilled a 120B+ text-to-SQL pipeline into a 3B QLoRA fine-tune of Qwen2.5-Coder-3B-Instruct. The model hit 90% in-domain accuracy and 55.5% on Spider, ran on a laptop CPU via Ollama, and cost $0 to train and $0 to run inference. I wrote about it in Phase 1.

But "I have a model that runs on my laptop" is a different kind of deliverable from "I have an API anyone can call." The first is a research artifact. The second is a product.

Phase 2 was about crossing that gap. This post is the story.