Your schema validation passes and the agent still picks the wrong tool. The bug is semantic.

Pydantic and JSON-schema guarantee the shape of a tool call. They say nothing about whether it was the right call for the user's intent.

TL;DR: We put strict Pydantic validation on every tool call our agent makes, expecting tool-call failures to drop. They barely did. When I categorized 40 logged failures, 31 of them (about 78%) passed schema validation cleanly. They were well-formed calls to the wrong tool, or calls to the right tool with arguments that had valid types but wrong values. Schema validation catches structural errors. Our actual problem was semantic, and the validator is blind to it.

What schema validation actually guarantees

Pydantic checks types, required fields, enums, ranges. A call like cancel_order(order_id="A123") is structurally perfect even when the user asked to cancel a subscription, not an order. The validator passes it. The user is still angry. Shape is not intent.
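To make the gap concrete, here is a minimal sketch in plain Python. It is a dependency-free stand-in for the shape check a Pydantic model would perform, not our production validator. The tool registry, the `ToolCall` type, the `classify_failure` helper, and the subscription ID are all hypothetical names invented for illustration. The expected call is assumed to come from a human annotation of what the user actually wanted.

```python
from dataclasses import dataclass

# Hypothetical tool registry: tool name -> required argument names and types.
TOOL_SCHEMAS = {
    "cancel_order": {"order_id": str},
    "cancel_subscription": {"subscription_id": str},
}


@dataclass
class ToolCall:
    name: str
    args: dict


def is_structurally_valid(call: ToolCall) -> bool:
    """Shape check only: known tool, exact argument names, correct types."""
    schema = TOOL_SCHEMAS.get(call.name)
    if schema is None or set(call.args) != set(schema):
        return False
    return all(isinstance(call.args[key], typ) for key, typ in schema.items())


def classify_failure(call: ToolCall, expected: ToolCall) -> str:
    """Label a logged call against a human-annotated expected call."""
    if not is_structurally_valid(call):
        return "structural"   # the only category a schema validator can see
    if call.name != expected.name:
        return "wrong_tool"   # well-formed call, wrong tool for the intent
    if call.args != expected.args:
        return "wrong_value"  # right tool, valid types, wrong values
    return "ok"


# The user asked to cancel a subscription; the agent cancelled an order.
actual = ToolCall("cancel_order", {"order_id": "A123"})
expected = ToolCall("cancel_subscription", {"subscription_id": "S9"})  # made-up ID

print(is_structurally_valid(actual))       # the shape check passes
print(classify_failure(actual, expected))  # the semantic check does not
```

The point of the sketch is that the last two categories need an expected call to compare against. Nothing in the schema alone can produce them, which is why tightening the validator did not move the failure rate.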

The 40-failure breakdown

Related reading

Tool-Call Accuracy Is Lying to You: A Four-Layer Eval Stack for Agents

Function-calling eval was a 2024 problem. Tool-using agents are the 2026 one.

Why your agentic system doesn't survive its own success — and what a…

Validate your Pydantic schema before the LLM call, not after.

Gemma 4 is the small-model tier agent stacks were waiting for

Diagent: when the static auditor and the sandbox disagree, who's right?
