Sandeep Shivam is an Associate Director at Tavant, building AI-powered lending products that improve efficiency and customer experience.

During a prospective client engagement at a financial institution, I learned that they had spent four months building an AI agent for a consumer-facing workflow. The team was experienced, the budget was real and the model selection seemed obvious: Pick the strongest large language model (LLM) available and build the agent around it.

The pilot looked promising. The production rollout did not. Latency on consumer journeys crept past acceptable thresholds. Inference costs climbed faster than the value created. Accuracy was uneven across the workflow. After four months, the initiative was paused, not because the technology was wrong, but because the architecture was. That is when my team was engaged to redesign it.

That story is becoming common: A widely cited report from MIT's NANDA initiative last year found that 95% of enterprise generative AI pilots are failing to deliver measurable financial impact.

From my experience, while the model is often the first suspect when a pilot stalls, the architecture is the more likely culprit. In most cases, a production AI agent should not be one model doing everything. It should be a coordinated system of specialized models, each doing what it is best suited to perform.

The Kitchen Brigade Analogy

Think about how a serious restaurant kitchen works. There is no single chef preparing every dish. There is a brigade. The saucier handles sauces. The grill station handles proteins. The pastry chef handles desserts. The expediter coordinates the pass. Each role exists because no individual, no matter how talented, can deliver speed, precision and consistency across every cuisine and technique at once.

A production AI agent should be designed the same way. Assigning every task to one large model is the equivalent of asking your head chef to also plate desserts and pour wine.
It can be done, but it produces a slower, more expensive and less reliable kitchen.

Why One Model Breaks Down in Production

Consider a typical financial services workflow. A customer uploads a document. The agent must read it, extract data, classify the intent, validate it against policy, query system records, plan the next action, execute an API call, verify the outcome and respond clearly. That is a chain of very different tasks. Reading a document needs visual understanding. Intent classification needs speed. Policy lookup needs retrieval and ranking. Workflow planning needs reasoning. Tool execution needs structured output. Exception handling sometimes needs deeper verification.

A single large model can technically do all of these. But it does them at the cost of the most expensive resource you have, and at the speed of the slowest path through the model. In consumer-facing financial workflows, where seconds matter and unit economics are scrutinized, that combination is not survivable.

You don't scale AI by picking a bigger model. You scale it by picking the right one for each job.

Specialized Models, Specialized Roles

The next phase of agentic AI will be built on composition, not just model size. This does not mean models operate independently. A central orchestrator is required to route each task, hold shared context and enforce policy at every step.

Oversight then runs through one control plane: every call logged, every decision traceable, fallbacks triggered when a model is unavailable or returns low confidence. The customer sees one agent. Underneath, a coordinated team is at work.

The architecture I now recommend has clear roles for each model class:

• Small language models (SLMs), often under ten billion parameters, are suited to high-volume work such as intent classification, routing and preprocessing.
When the inputs are well-defined and structured (capturing application details, classifying a service request or normalizing user input), an SLM handles them quickly and cheaply. Routing this work through a large model can add latency and cost for no real benefit.

• Vision-language models belong in the perception layer, where documents, statements and forms must be interpreted. In any workflow built on paperwork (statements, identity documents, contracts, invoices, claims forms), a vision-language model can extract structured data directly, far more reliably than a text-only model working off raw OCR output.

• General-purpose large language models (LLMs) retain their place at the heart of the agent, doing the reasoning and orchestration they were designed for. When a customer asks an open-ended question, or when the agent must coordinate steps across multiple downstream systems, a general-purpose LLM is often the right tool. It should be invoked deliberately, not as the default for every keystroke.

• Reasoning-optimized models earn their cost on the smaller set of decisions where being wrong has real consequences, such as anomaly review, exception handling and compliance-sensitive paths. These are the places where careful thought is worth more than a faster, shallower answer.

• Action-oriented models can handle reliable tool use, producing structured calls and workflow actions rather than free-form text that hopes to land on a valid API request. Submitting transactions, triggering downstream services or updating records through enterprise systems are cases where a model tuned for structured tool calls is more dependable than a general LLM trying to compose the same call.

What Leaders Should Do Next

To get started with this architecture, map your most important agent workflow end-to-end and ask which model handles each step.
If one model is doing everything, you've found your optimization opportunity.

Be on the lookout for the common offenders of model mismatch: large models on simple classification, text models reading documents and unstructured generation driving API calls.

Once you've found your opportunities, define clear roles per model, evaluate on workflow outcomes rather than benchmark scores and bake governance into the architecture from day one: routing, logging, fallbacks, audit trails.

The early era of agentic AI was shaped by the belief that a more powerful model could solve every problem. That belief is being reconsidered where economics and reliability matter more than benchmark wins. The next era belongs to composition, with agentic architectures built the way great kitchens are run: clear roles, specialized capabilities, strong coordination, measurable accountability.

Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
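
For readers who want to see the shape of the idea, the orchestrator role described under "Specialized Models, Specialized Roles" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the class name `Orchestrator`, the task names, the stub model functions and the 0.7 confidence threshold are all hypothetical, and real model calls would replace the stubs. It shows only the three behaviors the article names: routing each task to a specialized model, logging every call, and falling back when a model is unavailable or returns low confidence.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A model handler takes a payload and returns (output, confidence in [0, 1]).
ModelFn = Callable[[str], Tuple[str, float]]


@dataclass
class Orchestrator:
    """Hypothetical control plane: route, log every call, fall back on failure."""
    routes: Dict[str, str]       # task type -> primary model name
    models: Dict[str, ModelFn]   # model name -> callable (stubs in this sketch)
    fallback: str                # model used when the primary fails or is unsure
    min_confidence: float = 0.7  # illustrative threshold, not a recommendation
    audit_log: List[dict] = field(default_factory=list)

    def _call(self, name: str, task: str, payload: str) -> Tuple[str, float, bool]:
        """Invoke one model and record the call, whether or not it succeeds."""
        try:
            output, confidence = self.models[name](payload)
            ok = True
        except Exception as exc:  # any error is treated as "model unavailable"
            output, confidence, ok = str(exc), 0.0, False
        self.audit_log.append(
            {"task": task, "model": name, "ok": ok, "confidence": confidence}
        )
        return output, confidence, ok

    def run(self, task: str, payload: str) -> str:
        output, confidence, ok = self._call(self.routes[task], task, payload)
        if ok and confidence >= self.min_confidence:
            return output
        # Primary was unavailable or unsure: escalate to the fallback model.
        output, _, ok = self._call(self.fallback, task, payload)
        if not ok:
            raise RuntimeError(f"no model could handle task {task!r}")
        return output


# Stub models standing in for real SLM, action-oriented and general LLM calls.
def small_model(payload: str) -> Tuple[str, float]:
    return "intent:payment_dispute", 0.95


def tool_model(payload: str) -> Tuple[str, float]:
    raise TimeoutError("tool model unavailable")


def general_llm(payload: str) -> Tuple[str, float]:
    return f"handled by general LLM: {payload}", 0.8


orchestrator = Orchestrator(
    routes={"classify_intent": "slm", "call_api": "tool"},
    models={"slm": small_model, "tool": tool_model, "llm": general_llm},
    fallback="llm",
)
result_intent = orchestrator.run("classify_intent", "I want to dispute a charge")
result_api = orchestrator.run("call_api", "update_record")
print([entry["model"] for entry in orchestrator.audit_log])
```

In this sketch the intent request stays on the small model, while the failed tool call is logged and then escalated to the general-purpose model, so the audit log records every attempt rather than only the successful ones.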