I Gave Claude Access to Windows Calculator via MCP — Then Watched It Catch Its Own Hallucination

TL;DR — I built an MCP server that lets Claude drive Windows desktop apps (UI Automation + screen capture + OCR + Visual Studio debugger). In a black-box test of Windows Calculator, the agent falsely flagged a "display bug" based on the AccessibleName string. When I told it to look again, it captured the real screen, compared its earlier claim against the pixels, and retracted its own finding. The retraction was the moment I knew this was worth shipping.

The moment

The agent was 30 minutes into autonomously poking at Windows Calculator. It had verified divide-by-zero, the old √4 − 2 precision bug, 0^0, overflow, error recovery. All green.

Then it stopped. New message:

"Sensei, the history shows √(4) マイナス ( 2= 0. There's an unclosed opening paren ( right after the minus sign. The result 0 is correct, but the expression display looks wrong. Possible display bug."

I Gave Claude Access to Windows Calculator via MCP — Then Watched It Catch Its Own Hallucination

Related reading

AI doesn't write bad code. It writes plausible code — so I tried to break my…

"My AI Agent Kept Missing Buttons, So I Used Windows UI Automation"

I Built an MCP Server in 40 Minutes. Here's Exactly What Happened.

I Got Tired of Claude Code Guessing Wrong, So I Built an MCP Toolkit

An MCP server can vanish from your AI agent mid-conversation. Here's the…

How I Used Claude to Finish Building an AI That Evaluates AI — and Caught It…