llmclean is a tiny zero-dependency library I maintain for cleaning the noise out of raw LLM output. v0.2.0 was a "what production traffic taught me" release — every fix came from a real break in one of my own pipelines.
0.3.0 is a different kind of release. This time I had a list of features I was fairly sure I needed, sourced from what people keep complaining about and re-implementing by hand: strip the <think> reasoning blocks, kill the em-dashes and smart quotes, remove the zero-width characters, flatten the markdown for text-to-speech.
Before writing any of it, I did something I should have done the first time: I checked whether the models I care about actually produce that mess. I ran eight generative prompts across five local models — Llama 3.1, Gemma 4, Qwen 2.5, DeepSeek-R1, Mistral, all 7–8B instruct — and measured what came out. Forty generations, one diagnostic pass each.
Three of my assumptions were wrong.
1. Local models barely produce the typography mess at all






