A friend's product summarizes customer support tickets using a fine-tuned LLM

prompt. It worked perfectly on GPT-4o for six months. Then OpenAI deprecated

4o, the team migrated to GPT-4.1, ran a smoke test in the playground, said

"looks fine," and shipped.

Two weeks later a customer escalated: "Your urgency tagging is wrong on