Here's the number that should worry you more than it does: an agent that calls the right tool with the right arguments 95% of the time completes an eight-step task correctly only about 66% of the time. Reliability doesn't fail in one dramatic crash. It leaks. Every step is a coin that lands heads 19 times out of 20, and you're flipping it eight times in a row.
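
If you want to check that arithmetic yourself, here is a minimal sketch. It assumes every step has to succeed and that steps fail independently of each other, which is a simplification of how real agents behave. The function names are hypothetical, made up for illustration.

```python
def task_success_rate(per_step: float, steps: int) -> float:
    """Chance that every step succeeds, assuming steps fail independently."""
    return per_step ** steps


def required_step_rate(target: float, steps: int) -> float:
    """Per-step rate needed to reach a target end-to-end success rate."""
    return target ** (1 / steps)


if __name__ == "__main__":
    # Compare a few per-step rates on the same eight-step task.
    for rate in (0.95, 0.97, 0.99):
        overall = task_success_rate(rate, 8)
        print(f"{rate:.0%} per step -> {overall:.1%} over 8 steps")

    # Work backwards: what per-step rate does a 90% overall target demand?
    needed = required_step_rate(0.90, 8)
    print(f"needed per step for 90% overall: {needed:.2%}")
```

Running it with your own step count and target is a quick way to see how much per-call accuracy a given workflow actually demands.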

The good news is that most of that leak isn't the model being dumb. It traces to two things you control completely: the JSON schema you hand the model, and whether you let it guess when it shouldn't. Fix those two and the per-call rate climbs, and because the gain compounds across steps, small improvements pay off disproportionately: at 99% per call, the same eight-step task succeeds about 92% of the time.

Your schema is a prompt, not documentation

This is the reframe that fixes everything downstream. When you define a tool, the description fields aren't docs for your teammates. They are the only instructions the model gets about when and how to use that tool. The model never sees your implementation. It sees the schema. That's it.

So a schema like this is not "good enough":