A friend's product summarizes customer support tickets using a fine-tuned LLM
prompt. It worked perfectly on GPT-4o for six months. Then OpenAI deprecated
4o, the team migrated to GPT-4.1, ran a smoke test in the playground, said
"looks fine," and shipped.
Two weeks later a customer escalated: "Your urgency tagging is wrong on








