When I started this challenge, I thought of an API specification as documentation — something you write after the code works, to tell other people how to call it. By the end, I had completely inverted that view. The OpenAPI specification became the single source of truth that my code had to honour, the script my tests ran from, and the definition my mocks had to stay faithful to. This post walks through how I applied Specmatic's spec-first approach to TRIO, my multi-agent AI assistant, and the things I learned — often the hard way — across several rounds of review.
The project: TRIO
TRIO is a local-first, multi-agent AI assistant I built. The backend is a FastAPI service; the frontend is React. It has a Jarvis-style voice overlay, speech-to-text using Faster-Whisper, text-to-speech via Piper, a ChromaDB-backed memory, and a set of agents coordinated by an agent manager. The backend exposes a focused HTTP API: health and system-info endpoints, agent listing, full CRUD for conversations, a chat endpoint that routes a user message through the agent manager to a local LLM (Ollama), and a voice endpoint that converts text to speech and returns WAV audio.
That chat endpoint, and its dependency on a live LLM, turned out to be at the heart of the interesting testing problems, but I'm getting ahead of myself.
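To make that API surface concrete, here is a rough sketch of how it might look as an OpenAPI document, built as a plain Python dictionary. Every path, parameter name, and status code below is an assumption made for illustration (for example `/system/info`, `/voice/speak`, and `conversation_id`); the real TRIO spec may name and shape things differently.

```python
# Illustrative sketch only: paths, parameter names and status codes are
# assumptions, not TRIO's actual specification.

JSON_OBJECT = {"type": "object"}
WAV_BINARY = {"type": "string", "format": "binary"}
HTTP_METHODS = {"get", "put", "post", "delete", "patch"}

# Hypothetical path parameter shared by the single-conversation operations.
CONVERSATION_ID = {
    "name": "conversation_id",
    "in": "path",
    "required": True,
    "schema": {"type": "string"},
}


def operation(summary, status="200", media_type="application/json", schema=None):
    """Build a minimal OpenAPI operation with a single success response."""
    response = {"description": summary}
    if media_type is not None:  # e.g. a 204 response carries no body
        response["content"] = {media_type: {"schema": schema or JSON_OBJECT}}
    return {"summary": summary, "responses": {status: response}}


SPEC = {
    "openapi": "3.0.3",
    "info": {"title": "TRIO API (illustrative sketch)", "version": "0.0.0"},
    "paths": {
        "/health": {"get": operation("Health check")},
        "/system/info": {"get": operation("System information")},
        "/agents": {"get": operation("List agents")},
        "/conversations": {
            "get": operation("List conversations"),
            "post": operation("Create a conversation", status="201"),
        },
        "/conversations/{conversation_id}": {
            "parameters": [CONVERSATION_ID],
            "get": operation("Read a conversation"),
            "put": operation("Update a conversation"),
            "delete": operation("Delete a conversation", status="204", media_type=None),
        },
        "/chat": {"post": operation("Route a message through the agent manager")},
        "/voice/speak": {
            "post": operation("Text to speech", media_type="audio/wav", schema=WAV_BINARY)
        },
    },
}


def list_operations(spec):
    """Return every (METHOD, path) pair declared in the spec, sorted."""
    return sorted(
        (method.upper(), path)
        for path, item in spec["paths"].items()
        for method in item
        if method in HTTP_METHODS  # skip path-level keys such as "parameters"
    )


if __name__ == "__main__":
    for method, path in list_operations(SPEC):
        print(f"{method:6} {path}")
```

In a spec-first workflow, a document like this (normally kept as a YAML file rather than built in code) is what the tests and mocks are derived from, so a helper like `list_operations` is only useful as a quick inventory of what the contract promises.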






