I recently took on a side project that needed to tap into multiple AI models – GPT-4 for complex reasoning, Claude for creative writing, and a local Llama 2 for quick drafts. My naive plan was to just call each API directly from my Python backend. Three days later, I had a tangled mess of authentication headers, inconsistent rate limits, and error handling that looked like a love letter to try/except. I almost trashed the whole thing.

If you've ever tried to build anything beyond a single-LLM demo, you know the pain. Let me share what I tried, what failed, and the minimal approach that finally worked.

The problem that nearly broke me

My app was simple: a user sends a prompt, and I route it to the best model based on cost and context. But each provider had its own quirks:

OpenAI: uses an Authorization: Bearer <key> header, returns the text in choices[0].message.content.