Over the last month I've been building Codenames AI, a small web game where an LLM plays Codenames with you. The guesser never sees unrevealed card identities. The server sends the board state and a clue; the model returns structured guesses with confidence scores and short explanations.
When I started, I assumed the hard part was prompting. I was half right. Getting something reasonable out of the model was fast. Making the system safe to expose to players was not.
My first milestone felt responsible: response_format: { type: "json_object" } on the chat completion, plus Zod schemas for the response body. If the JSON didn't parse or failed Zod, retry. Ship it.
Then I watched the model comply perfectly with the schema and still propose moves that would ruin a game.
Valid JSON, invalid game






