I got tired of babysitting LLM prompts, so I built a small open-source tool that lets me stop.
The pattern that wore me down: every LLM call in a real agent needs a wrapper. Parse the JSON, catch the field that didn't come back, re-prompt, hope it works this time. And every time I swapped models, the prompt I'd spent an afternoon tuning would quietly stop working and I'd tune it again by hand. After enough of that, I stopped treating it as the cost of doing business and started treating it as a thing to fix.
So I made dspyer.
The idea is simple. You wrap an LLM step in a Pydantic schema. When the model returns something that doesn't fit (malformed JSON, a missing field, a citation it made up), dspyer tells the model exactly what was wrong and asks again until the output conforms, or gives up after however many retries you allow. It's one decorator on a normal typed function. No try/except, no parsing glue.
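To make the retry loop concrete, here is a minimal sketch of the general idea in plain Python. It is not dspyer's actual code or API: `retry_until_valid`, `parse`, `Answer`, and `ask` are hypothetical names, a dataclass stands in for the Pydantic schema so the snippet has no dependencies, and the "model" is a stub that fails once and then succeeds.

```python
import json
from dataclasses import dataclass, fields
from functools import wraps
from typing import Callable


@dataclass
class Answer:
    # Stand-in for a Pydantic model: two required string fields.
    text: str
    source: str


def parse(schema, raw: str):
    """Turn raw model output into `schema`, or raise ValueError naming the problem."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON ({exc.msg})") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field in fields(schema):
        if field.name not in data:
            raise ValueError(f"missing field '{field.name}'")
        if not isinstance(data[field.name], field.type):
            raise ValueError(f"field '{field.name}' must be {field.type.__name__}")
    return schema(**{f.name: data[f.name] for f in fields(schema)})


def retry_until_valid(schema, max_retries: int = 2):
    """Hypothetical decorator: re-ask the model with the exact error until output fits."""
    def decorator(call_model: Callable[[str], str]):
        @wraps(call_model)
        def wrapper(prompt: str):
            attempt_prompt = prompt
            for _ in range(max_retries + 1):  # first try plus the allowed retries
                raw = call_model(attempt_prompt)
                try:
                    return parse(schema, raw)
                except ValueError as err:
                    # Feed the specific failure back to the model on the next attempt.
                    attempt_prompt = (
                        f"{prompt}\n\nYour previous reply was rejected: {err}. "
                        "Reply again with JSON that fits the schema."
                    )
            raise RuntimeError(f"no valid output after {max_retries} retries")
        return wrapper
    return decorator


# Stub "model": the first reply is missing a field, the second is valid.
_canned = iter(['{"text": "Paris"}', '{"text": "Paris", "source": "atlas"}'])
prompts_seen = []


@retry_until_valid(Answer, max_retries=2)
def ask(prompt: str) -> str:
    prompts_seen.append(prompt)
    return next(_canned)


result = ask("What is the capital of France?")
print(result)
```

The caller only ever sees a validated `Answer` or a single exception once the retry budget is spent, which is what removes the per-call try/except and parsing glue. In this sketch the second prompt contains the text `missing field 'source'`, so the model is told precisely what to fix rather than just being asked again.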
The part I actually care about is what that buys you. The step compiles down to a standard DSPy module, so instead of hand-editing prompts you give a DSPy optimizer a handful of examples and let it tune the prompts for you, then save the result and load it in production. That was the whole reason I went down this road. I wanted my prompts to stop being something I maintain.






