Phase 2 Shipped: 5 Things I Got Wrong About Embedding-Based Routing


Thursday, June 4, 2026


A follow-up to Teaching an AI to Pick Its Own Brain

In the last post, I ended with a plan: replace the Groq LLM categorizer with local multilingual-e5-large embeddings. Find similar past messages, vote on the category, skip the API call. Simple.
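The "find similar, vote, skip the API call" idea can be sketched roughly as below. This is a minimal illustration, not the shipped router: the toy 3-dimensional vectors stand in for real multilingual-e5-large embeddings, and the names `route_by_neighbors`, `PAST`, `k`, and `min_score` are hypothetical, as are the category labels.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route_by_neighbors(query_vec, labeled, k=3, min_score=0.5):
    """Similarity-weighted vote among the k nearest labeled messages.

    Returns the winning category, or None when no neighbor is similar
    enough, so the caller can decide how to fall back.
    k and min_score are illustrative values, not tuned ones.
    """
    nearest = sorted(
        ((cosine(query_vec, vec), cat) for vec, cat in labeled),
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for score, cat in nearest:
        if score >= min_score:
            votes[cat] += score  # closer neighbors count for more
    if not votes:
        return None
    return max(votes, key=votes.get)


# Toy vectors standing in for embeddings of past, already-categorized messages.
PAST = [
    ([0.9, 0.1, 0.0], "code"),
    ([0.8, 0.2, 0.1], "code"),
    ([0.1, 0.9, 0.1], "chat"),
    ([0.0, 0.2, 0.9], "math"),
]

category = route_by_neighbors([1.0, 0.1, 0.0], PAST)
```

Returning `None` instead of a default category is a deliberate choice in this sketch: it keeps the "no confident match" case visible to the caller rather than hiding it.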

It took a Groq outage to actually make me ship it.

On 2026-05-22, Groq went down for two hours. 503 requests fell back to medium tier silently — no errors surfaced to users, but nobody got the model they should have. That's the kind of "resilience" that feels fine until it isn't.

So I shipped Phase 2. Here's what I got wrong.


