Semantic caching our flaky-test summariser: 58% fewer LLM calls

TL;DR: Our internal flaky-test summariser at Buildkite was firing ~40k LLM calls a day, and most were near-duplicates of failures we'd already explained. Switching on semantic caching in Bifrost cut live provider calls by 58% and dropped p50 latency on cache hits from ~900ms to about 40ms. It also kept the feature alive when our primary provider browned out for 11 minutes.

The feature that wouldn't shut up

On our platform team (eight of us) we shipped a small thing last quarter: when a test goes flaky in a Buildkite pipeline, we pass the failure output to an LLM and stick a plain-English summary on the build page. Devs liked it. The provider bill less so.

By March it was making roughly 40,000 calls a day against anthropic/claude-haiku, with openai/gpt-4o-mini as the fallback. p50 latency sat around 900ms. The monthly bill crept past $310. Not catastrophic. But the calls were doing the same work over and over.
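To make "the same work over and over" concrete, here is a minimal sketch of what a semantic cache does in front of the summariser. It is not Bifrost's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and `SemanticCache`, `summarise`, `call_llm`, and the 0.9 threshold are all hypothetical names and values chosen for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): bag of lowercase word tokens, digits dropped.

    A real semantic cache would use an embedding model here instead.
    """
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical in-memory cache keyed by similarity, not exact text."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # illustrative value, not a tuned one
        self.entries = []  # list of (vector, summary) pairs

    def get(self, prompt):
        """Return the best cached summary at or above the threshold, else None."""
        vec = embed(prompt)
        best_score, best_summary = 0.0, None
        for cached_vec, summary in self.entries:
            score = cosine(vec, cached_vec)
            if score > best_score:
                best_score, best_summary = score, summary
        return best_summary if best_score >= self.threshold else None

    def put(self, prompt, summary):
        self.entries.append((embed(prompt), summary))


def summarise(failure_output, cache, call_llm):
    """Serve from the cache when a near-duplicate exists, else call the provider."""
    cached = cache.get(failure_output)
    if cached is not None:
        return cached
    summary = call_llm(failure_output)  # the only path that costs a live call
    cache.put(failure_output, summary)
    return summary
```

With this shape, two failures that differ only in a worker number or a timestamp resolve to the same cached summary, and only a genuinely new failure reaches the provider. A linear scan is fine for a sketch; a production cache would use a vector index and an expiry policy.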

Why the calls were so repetitive

Related reading

Fault-injecting our LLM provider to trust Bifrost fallbacks

Semantic caching the VLM step in our product-photo pipeline

We Cut Our LLM API Bill 30% With Four Lines of YAML

Async LLM inference in CI: stop build workers blocking on slow jobs

How I Cut My AI Bill by Caching LLM Responses in Node.js

Prefix caching at scale: when it saves you 80% of prefill cost, and the…
