If your LLM costs are climbing, the instinct is almost always the same: swap to a cheaper model. GPT-4 to GPT-4-mini. Claude Opus to Claude Haiku. Sometimes that helps a little. It rarely fixes the actual problem.
The actual problem, in most workflows I've looked at, is that every step gets routed through the LLM, even the steps that don't need language reasoning at all.
This post breaks down a simple mental model for deciding what should and shouldn't touch an LLM, with a working example you can adapt.
The four components of any AI workflow
Every automated workflow — whether it's a support ticket router, a fraud check, or a content pipeline — is built from some combination of four building blocks. They get treated the same once a workflow diagram is drawn flat, but they have wildly different cost and latency profiles.







