Here's how the story usually goes. Saturday afternoon, you wire a language model to a mailbox for the first time. You type "summarize my unread mail" and watch it actually happen — the model scans, picks out the thread from your landlord, nails the summary. Magic. Sunday morning, drunk on possibility, you add a send capability. Sunday evening, you're reading a transcript where a newsletter's footer text nearly convinced the model to forward something it shouldn't, and you quietly remove the send tool until you understand what just happened.
The gap between Saturday and Sunday is the actual engineering of an AI email assistant. The model can't touch a mailbox on its own — you give it tools: small server-side functions that wrap email endpoints, run when the model asks, and hand results back. The model decides; your code acts. Getting that boundary right is the whole game.
Three tools is enough
The pattern works identically for ChatGPT, Claude, or any model with function calling — a tool is a JSON schema with a name, description, and typed parameters. Define three: list_messages, get_message, send_email. The descriptions are what the model reasons over, so write them like instructions, and keep parameter counts low — models pick correctly from 3 to 5 fields far more reliably than from 15.






