Schema first, prompt second: valid JSON wasn't enough

Over the last month I've been building Codenames AI, a small web game where an LLM plays Codenames with you. The guesser never sees unrevealed card identities. The server sends the board state and a clue; the model returns structured guesses with confidence scores and short explanations.

When I started, I assumed the hard part was prompting. I was half right. Getting something reasonable out of the model was fast. Making the system safe to expose to players was not.

My first milestone felt responsible: response_format: { type: "json_object" } on the chat completion, plus Zod schemas for the response body. If the JSON didn't parse or failed Zod, retry. Ship it.

Then I watched the model comply perfectly with the schema and still propose moves that would ruin a game.

Valid JSON, invalid game

When I started, I assumed the hard part was prompting. I was half right. Getting something reasonable out of the model was fast. Making the system safe to expose to players was not.

My first milestone felt responsible: response_format: { type: "json_object" } on the chat completion, plus Zod schemas for the response body. If the JSON didn't parse or failed Zod, retry. Ship it.

Then I watched the model comply perfectly with the schema and still propose moves that would ruin a game.

Valid JSON, invalid game

Schema first, prompt second: valid JSON wasn't enough

Schema first, prompt second: valid JSON wasn't enough

Related reading

One good example beat every AI writing rule I wrote

Beyond Prompting: Building a 4-Stage LLM Compiler with Surgical Self-Repair

Speculative Decoding: How LLMs Generate Tokens Faster Without Changing the…

Memory Drift: How I Gamified LLM Context Decay in Next.js

I Got Tired of Claude Code Guessing Wrong, So I Built an MCP Toolkit

The Code AI Won't Write

Related reading

One good example beat every AI writing rule I wrote

Beyond Prompting: Building a 4-Stage LLM Compiler with Surgical Self-Repair

Speculative Decoding: How LLMs Generate Tokens Faster Without Changing the…

Memory Drift: How I Gamified LLM Context Decay in Next.js

I Got Tired of Claude Code Guessing Wrong, So I Built an MCP Toolkit

The Code AI Won't Write