Making LLM Calls Reliable: Retry, Semaphore, Cache, and Batch

Saturday, May 23, 2026

When TestSmith generates tests with --llm, it calls an LLM for every public member of every source file being processed. A project with 20 files and 5 public functions each means up to 100 API calls in a single run. That's a lot of surface area for things to go wrong.

Here's the reliability stack we built, layer by layer.

Layer 1: Retry with Exponential Backoff

LLM APIs fail transiently. Rate limits, timeouts, and occasional 5xx responses are all recoverable if you wait and retry. We built a retry middleware that wraps any Provider:

type RetryProvider struct {
