If you give an LLM agent a table of A/B variants and ask "which one should we send next?", it will confidently pick the one with the highest conversion rate.

That feels right. It is often wrong.

Left to itself, the model doesn't weigh sample size, exploration, or regret. It pattern-matches "biggest number = winner" and moves on. For a one-off question, fine. But inside an agent loop that picks a variant on every request (email subject lines, ad copy, model routing, recommendation ranking), that naïve pick quietly accumulates regret and starves the options it never gave a fair chance.
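
To make the sample-size problem concrete, here is a minimal sketch. The variant table is made up for illustration, and the Wilson score interval is just one common way to show how uncertain a conversion rate is; it is not something the model or any particular tool is known to compute.

```python
import math

# Hypothetical A/B table: (variant, sends, conversions). The numbers are invented.
variants = [("A", 1000, 250), ("B", 10, 3)]

def rate(sends, conversions):
    """Observed conversion rate, ignoring how much data backs it."""
    return conversions / sends

def wilson_interval(sends, conversions, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = conversions / sends
    denom = 1 + z * z / sends
    center = (p + z * z / (2 * sends)) / denom
    half = z * math.sqrt(p * (1 - p) / sends + z * z / (4 * sends * sends)) / denom
    return center - half, center + half

# The "biggest number = winner" pick: argmax over the raw rate.
naive_pick = max(variants, key=lambda v: rate(v[1], v[2]))[0]
print("naive pick:", naive_pick)

for name, sends, conversions in variants:
    low, high = wilson_interval(sends, conversions)
    print(f"{name}: rate={rate(sends, conversions):.2f}, interval=({low:.2f}, {high:.2f})")
```

The naive pick is B, on the strength of ten sends. Its interval is far wider than A's, so the data cannot yet say whether B is better or much worse.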

The fix isn't a better prompt. It's to not ask the LLM to do the math at all. Route the decision to a real bandit algorithm and let the model do what it's good at (orchestration, language) while a deterministic solver does what it's good at (the optimization).
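
As a rough sketch of what "a real bandit algorithm" can look like, here is Thompson sampling with a Beta posterior per variant, using only the Python standard library. This is an illustrative stand-in, not OraClaw's implementation; `choose_variant` and the stats layout are hypothetical names. Note that Thompson sampling is randomized, so it is only reproducible when the random generator is seeded.

```python
import random

def choose_variant(stats, rng=random):
    """Thompson sampling: draw once from each variant's Beta posterior
    and return the variant with the highest draw.

    stats maps a variant name to (sends, conversions); assumes conversions <= sends.
    """
    best_name, best_draw = None, -1.0
    for name, (sends, conversions) in stats.items():
        # Beta(1, 1) is a uniform prior; successes and failures update it.
        draw = rng.betavariate(1 + conversions, 1 + sends - conversions)
        if draw > best_draw:
            best_name, best_draw = name, draw
    return best_name

# Usage: the agent calls this instead of asking the model to pick.
rng = random.Random(0)  # seeded so repeated runs give the same picks
stats = {"A": (1000, 250), "B": (10, 3)}  # hypothetical counts
picks = [choose_variant(stats, rng) for _ in range(1000)]
print({name: picks.count(name) for name in stats})
```

A variant with little data has a wide posterior, so it still gets sampled some of the time. As evidence accumulates, the posterior narrows and the picks concentrate on whichever variant actually converts better.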

This post is a copy-paste demo you can run in your terminal right now, no signup, no API key. I'll use OraClaw, a deterministic decision-intelligence MCP server, but the point stands regardless of tool: stop letting the model guess at math that a solver can compute and you can verify.