When TestSmith generates tests with --llm, it calls an LLM for every public member of every source file being processed. A project with 20 files, each exposing 5 public functions, means up to 100 API calls in a single run. That's a lot of surface area for things to go wrong.

Here's the reliability stack we built, layer by layer.

Layer 1: Retry with Exponential Backoff

LLM APIs fail transiently. Rate limits, timeouts, and occasional 5xx responses are all recoverable if you wait and retry. We built a retry middleware that wraps any Provider:

type RetryProvider struct {