TL;DR

Across 40 generations from five local 7–8B instruct models, typography artifacts were reported only for cloud models (ChatGPT, Claude) and were absent from Llama, Qwen, and Mistral. For infrastructure teams, this suggests lower post-processing costs with self-hosted LLMs, and a divergence in output behavior that pipeline design has to account for.

llmclean is a tiny zero-dependency library I maintain for cleaning the noise out of raw LLM output. v0.2.0 was a "what production traffic taught me" release — every fix came from a real break in one of my own pipelines.

0.3.0 is a different kind of release. This time I had a list of features I was fairly sure I needed, sourced from what people keep complaining about and re-implementing by hand: strip the <think> reasoning blocks, kill the em-dashes and smart quotes, remove the zero-width characters, flatten the markdown for text-to-speech.
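To make that feature list concrete, here is a minimal sketch of the four cleanups using only the Python standard library. The function names (`strip_think`, `normalize_typography`, `strip_zero_width`, `flatten_markdown`, `clean`) are hypothetical and chosen for illustration; they are not llmclean's actual API, and the markdown flattening covers only headings, bullets, links, bold, italics and inline code.

```python
import re

# Hypothetical helpers for illustration; NOT llmclean's real API.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")
TYPOGRAPHY = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2014": "-", "\u2013": "-",   # em dash, en dash
})


def strip_think(text: str) -> str:
    """Drop <think>...</think> reasoning blocks and trailing whitespace."""
    return THINK_BLOCK.sub("", text)


def normalize_typography(text: str) -> str:
    """Map smart quotes and long dashes to plain ASCII."""
    return text.translate(TYPOGRAPHY)


def strip_zero_width(text: str) -> str:
    """Remove zero-width spaces, joiners and BOM characters."""
    return ZERO_WIDTH.sub("", text)


def flatten_markdown(text: str) -> str:
    """Very rough markdown flattening for text-to-speech input."""
    text = re.sub(r"^#{1,6}[ \t]+", "", text, flags=re.MULTILINE)      # headings
    text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.MULTILINE)  # bullets
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)               # links
    return re.sub(r"\*\*|\*|`", "", text)                               # bold, italics, code


def clean(text: str) -> str:
    """Apply all four cleanups in order."""
    for step in (strip_think, normalize_typography, strip_zero_width, flatten_markdown):
        text = step(text)
    return text
```

Each step is a separate function so a pipeline can opt into only the cleanups it needs, which matters if some of them turn out to be unnecessary for a given model.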

Before writing any of it, I did something I should have done the first time: I checked whether the models I care about actually produce that mess. I ran eight generative prompts across five local models — Llama 3.1, Gemma 4, Qwen 2.5, DeepSeek-R1, Mistral, all 7–8B instruct — and measured what came out. Forty generations, one diagnostic pass each.
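A diagnostic pass of this kind can be sketched roughly as follows. This is an assumption about the shape of such a check, not the script actually used for the sweep: the pattern set, the `diagnose` and `sweep` names, and the `(model, text)` input format are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical artifact detectors; the real sweep may have measured other things.
PATTERNS = {
    "think_block": re.compile(r"<think>.*?</think>", re.DOTALL),
    "em_dash": re.compile("\u2014"),
    "smart_quote": re.compile("[\u2018\u2019\u201c\u201d]"),
    "zero_width": re.compile("[\u200b\u200c\u200d\u2060\ufeff]"),
    "markdown": re.compile(r"^#{1,6}[ \t]|\*\*|`", re.MULTILINE),
}


def diagnose(text: str) -> dict:
    """Count occurrences of each artifact type in one generation."""
    return {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}


def sweep(generations) -> dict:
    """Tally, per model, how many outputs contain each artifact at least once.

    `generations` is an iterable of (model_name, output_text) pairs.
    """
    report = {}
    for model, text in generations:
        hits = report.setdefault(model, Counter())
        for name, count in diagnose(text).items():
            if count:
                hits[name] += 1
    return report
```

Counting outputs that contain an artifact, rather than raw occurrences, keeps one very long generation from dominating a model's tally.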

Three of my assumptions were wrong.

1. Local models barely produce the typography mess at all

dev.to

I almost added an em-dash remover to my LLM library. Then I tested whether local models even produce em-dashes.

How a five-model local sweep reshaped llmclean 0.3.0 — and three things about LLM output that turned out to be the opposite of what I assumed.

Sunday, 21 June 2026

