I've been using LLMs to classify GitHub pull requests into changelog categories. The goal: automatically decide whether a PR is a feature, a bugfix, a breaking change, or internal noise.
It took several iterations to get consistent output. Here's what actually worked.
The problem with direct classification
The naive approach:
Classify this PR: feature / bugfix / breaking / internal.
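
To make the naive approach concrete, here is a minimal sketch of how that one-line prompt might be wired up. Everything beyond the prompt text itself is an assumption for illustration: the names `build_naive_prompt` and `classify_naive`, the title/description fields, and the `complete` callable, which stands in for whatever LLM client you use (no real API is called here).

```python
from typing import Callable, Optional

# The four changelog categories named in the prompt.
CATEGORIES = ("feature", "bugfix", "breaking", "internal")


def build_naive_prompt(title: str, description: str) -> str:
    """Prepend the one-line instruction to the PR text (hypothetical layout)."""
    return (
        "Classify this PR: feature / bugfix / breaking / internal.\n\n"
        f"Title: {title}\n"
        f"Description: {description}\n"
    )


def classify_naive(
    title: str,
    description: str,
    complete: Callable[[str], str],  # hypothetical: takes a prompt, returns the model's text
) -> Optional[str]:
    """Send the prompt and accept the reply only if it is exactly one of the labels."""
    reply = complete(build_naive_prompt(title, description))
    answer = reply.strip().lower()
    # A free-form reply is not guaranteed to be a bare label, so anything
    # else is reported as None instead of being guessed at.
    return answer if answer in CATEGORIES else None


if __name__ == "__main__":
    # Stub model for demonstration; swap in a real client call here.
    def stub(prompt: str) -> str:
        return "Bugfix\n"

    print(classify_naive("Fix crash on empty config", "Adds a null check.", stub))
```

The strict membership check is a deliberate choice in this sketch: it surfaces every reply that is not a clean label rather than hiding it behind fuzzy matching.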
