A team's "automatic retry on failure" pattern caused a customer-facing incident. The LLM's first call had succeeded, but the team's downstream connection timed out. The retry produced a duplicate output, and the user received the same email twice, compounding the original failure.
Naive retries make failures worse. Retries engineered with idempotency, backoff, and budgets make systems more reliable. The difference is engineering discipline.
Exponential backoff with sense
The pattern that survives:
- First retry: short delay (~100ms).
- Subsequent retries: exponential backoff with jitter.
- Maximum retries: defined.
- Maximum total time: defined.
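That schedule can be sketched as a small delay function. This is a minimal sketch, not a specific library's API; `backoff_delay`, `base`, and `cap` are illustrative names, and it uses "full jitter" (a uniformly random delay up to the exponential bound):

```python
import random

def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Delay in seconds before retry `attempt` (1-based).

    First retry: up to ~100ms. Later retries: exponential growth,
    capped at `cap`, with full jitter to spread out retrying clients.
    """
    bound = min(cap, base * (2 ** (attempt - 1)))
    return random.uniform(0, bound)
```

The jitter matters as much as the exponent: without it, every client that failed at the same moment retries at the same moment, recreating the spike.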
Without backoff, retries hammer a provider that is already struggling, making its outage worse for everyone. With sensible backoff, retries stay friendly to the provider.
Idempotency
For LLM calls with side effects (any tool call that changes state), idempotency keys:
- Each logical operation has a unique key.
- Provider deduplicates on the key.
- Retries return the original result.
LLM calls themselves are usually idempotent. Tool calls are usually not. The retry policy must distinguish.
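The deduplication side of this can be sketched in a few lines. The in-memory dict and the `execute_once` name are hypothetical stand-ins; a real system would use a shared, persistent store with expiry:

```python
# Hypothetical in-memory dedup store; a real one would be shared and persistent.
_results: dict[str, object] = {}

def execute_once(key: str, operation):
    """Run `operation` at most once per idempotency key.

    A retry with the same key returns the original result instead of
    performing the side effect again.
    """
    if key in _results:
        return _results[key]   # retry of a completed call: no duplicate side effect
    result = operation()       # first attempt: actually perform the operation
    _results[key] = result
    return result
```

The key is attached to the logical operation ("send welcome email to user 42"), not to the attempt, so every retry of the same operation carries the same key.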
Retry budgets
Every operation has a retry budget:
- Per-operation: max 3 retries, max 10 seconds total.
- Per-task: max 5 total retry events.
- Per-user: rate-limited.
Without budgets, an unstable provider can cascade into runaway retry storms. With them, the cost is bounded.
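A per-task budget can be as simple as a counter with a hard ceiling. This is an illustrative sketch (the `RetryBudget` class is hypothetical, not from the team's codebase):

```python
class RetryBudget:
    """Caps total retry events for one task; past the cap, fail fast."""

    def __init__(self, max_retries: int = 5):
        self.max_retries = max_retries
        self.used = 0

    def allow(self) -> bool:
        """Consume one retry from the budget; False means stop retrying."""
        if self.used >= self.max_retries:
            return False       # budget exhausted: surface the failure instead
        self.used += 1
        return True
```

The per-operation and per-user limits compose the same way: each is just another counter checked before any retry fires.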
Reviewer signal
When retries spike, the team sees:
- Which operation is retrying.
- What error type is causing retry.
- Whether retries are eventually succeeding.
If retries spike and fail, the upstream issue surfaces. If they spike and succeed, the upstream is unstable but recoverable. Both are useful signals.
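The minimum instrumentation for that signal is two counters: retries tagged by operation and error type, and eventual outcomes tagged by operation. A sketch, assuming a metrics system this simple (real systems would emit to Prometheus, StatsD, or similar):

```python
from collections import Counter

# Hypothetical metric stores: (operation, error_type) -> retry count,
# and (operation, eventually_succeeded) -> outcome count.
retry_counts: Counter = Counter()
retry_outcomes: Counter = Counter()

def record_retry(operation: str, error_type: str) -> None:
    """Tag every retry event with what is retrying and why."""
    retry_counts[(operation, error_type)] += 1

def record_outcome(operation: str, succeeded: bool) -> None:
    """Record whether the retried operation eventually succeeded."""
    retry_outcomes[(operation, succeeded)] += 1
```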
A real retry library
A team's retry pattern, abstracted:
@retry(
    max_attempts=3,
    backoff_seconds=lambda n: min(2 ** n, 10) + random.uniform(0, 1),
    retry_on=(ProviderError, TimeoutError),
    idempotency_key=lambda req: req.idempotency_key,
)
def llm_call(req):
    ...
Used consistently across the codebase. Easy to read, easy to verify. Sensible defaults.
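One way a decorator with that shape could be implemented, as a sketch rather than the team's actual library (the parameter names match the usage above; the idempotency cache here is in-memory only):

```python
import functools
import random
import time

def retry(max_attempts=3, backoff_seconds=None, retry_on=(Exception,),
          idempotency_key=None):
    """Hypothetical retry decorator: bounded attempts, jittered backoff,
    and per-key deduplication for side-effecting calls."""
    if backoff_seconds is None:
        backoff_seconds = lambda n: min(2 ** n, 10) + random.uniform(0, 1)
    cache: dict = {}  # idempotency key -> first successful result

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(req):
            key = idempotency_key(req) if idempotency_key else None
            if key is not None and key in cache:
                return cache[key]          # retry of a completed call: no duplicate
            for attempt in range(1, max_attempts + 1):
                try:
                    result = fn(req)
                    if key is not None:
                        cache[key] = result
                    return result
                except retry_on:
                    if attempt == max_attempts:
                        raise              # attempts exhausted: surface the error
                    time.sleep(backoff_seconds(attempt))
        return wrapper
    return decorator
```

Keeping the policy in one decorator means the budget, the backoff curve, and the dedup behavior are reviewed once, not re-derived at every call site.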
What we won't ship
Open-ended retries without budgets.
Retries on operations with side effects without idempotency keys.
Retries that don't change anything. Same prompt, same model, same temperature → same result. If the failure was the LLM's output, change something or surface the failure instead of retrying.
Hidden retries that aren't visible in observability.
Close
Retry strategies are engineering tools. Backoff prevents cascades. Idempotency prevents duplicates. Budgets prevent runaway cost. Without these, retries make the problem worse. With them, retries are part of a reliable system.
Related reading
- Tool failure modes — companion engineering.
- Cost guardrails — budget discipline.
- Probabilistic with deterministic contracts — surrounding pattern.
We build AI-enabled software and help businesses put AI to work. If you're tightening retry discipline, we'd love to hear about it. Get in touch.