Agent Retries Non-Retryable Errors — Wastes Tokens on Guaranteed Failures
Symptom
- Agent retries a request that fails with 400 Bad Request 5 times before giving up
- Each retry is identical to the previous — same request, same error, same failure
- 422 Unprocessable Entity response has clear validation error; agent ignores it and retries
- "error": "model_not_found" retried 3 times; model name is wrong and will never exist
- Agent logs show: attempt 1/5 failed, attempt 2/5 failed… (all with same error code)
- Token cost of 5 failed attempts = 5× the cost of failing once and escalating
Root Cause
Retry logic treats all errors as transient. A 503 Service Unavailable is transient — the server might recover. A 400 Bad Request means the request itself is malformed — retrying the identical request will always get the same result. Mixing retryable and non-retryable errors in a single retry block wastes the retry budget and delays the agent from recognizing that it needs to change its approach, ask for help, or escalate.
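To make the cost concrete, here is a minimal sketch that counts how many calls each strategy makes against an endpoint that always answers 400. Nothing here touches the network: `FakeHTTPError`, `AlwaysBadRequest`, `naive_retry`, and `fail_fast_retry` are hypothetical names used only for illustration, and the set of retryable statuses is an assumption.

```python
class FakeHTTPError(Exception):
    """Hypothetical stand-in for an HTTP client error that carries a status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


class AlwaysBadRequest:
    """Simulated endpoint: the payload is malformed, so every call gets a 400."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        raise FakeHTTPError(400)


def naive_retry(fn, max_attempts: int = 5):
    """Anti-pattern: every failure is treated as transient."""
    last = None
    for _ in range(max_attempts):
        try:
            return fn()
        except FakeHTTPError as exc:
            last = exc
    raise last


def fail_fast_retry(fn, max_attempts: int = 5,
                    retryable=frozenset({500, 502, 503, 504})):
    """Retries only statuses in `retryable`; anything else is raised at once."""
    last = None
    for _ in range(max_attempts):
        try:
            return fn()
        except FakeHTTPError as exc:
            if exc.status not in retryable:
                raise
            last = exc
    raise last


def count_calls(retry_fn) -> int:
    """Run a retry strategy against the always-400 endpoint and count the calls."""
    endpoint = AlwaysBadRequest()
    try:
        retry_fn(endpoint)
    except FakeHTTPError:
        pass
    return endpoint.calls


if __name__ == "__main__":
    print("naive:", count_calls(naive_retry))
    print("fail fast:", count_calls(fail_fast_retry))
```

The naive loop spends its whole attempt budget on a request that cannot succeed, while the fail-fast loop stops after the first response.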
Fix
Option 1: Classify errors before retrying
import httpx
from enum import Enum


class ErrorClass(Enum):
    RETRYABLE = "retryable"          # Transient: retry with backoff
    NON_RETRYABLE = "non_retryable"  # Permanent: don't retry, fix the request
    RATE_LIMITED = "rate_limited"    # Quota: wait then retry
    AUTH_FAILURE = "auth_failure"    # Token issue: refresh then retry


def classify_http_error(response: httpx.Response) -> ErrorClass:
    """
    Classify HTTP error to determine retry strategy.
    """
    status = response.status_code

    # Non-retryable: request is wrong, retrying won't help
    if status in (400, 404, 405, 406, 409, 410, 413, 422, 451):
        return ErrorClass.NON_RETRYABLE

    # Auth: token may be expired, so refresh and retry
    if status == 401:
        return ErrorClass.AUTH_FAILURE

    # Permissions: not a token issue, so don't retry
    if status == 403:
        return ErrorClass.NON_RETRYABLE

    # Rate limit: wait for Retry-After then retry
    if status == 429:
        return ErrorClass.RATE_LIMITED

    # Retryable: server-side transient errors
    if status in (500, 502, 503, 504, 507, 529):
        return ErrorClass.RETRYABLE

    # Unknown: assume retryable with low confidence
    return ErrorClass.RETRYABLE


def classify_api_error(error_body: dict) -> ErrorClass:
    """
    Classify by API error code, which is more precise than HTTP status.
    """
    error = error_body.get("error", {})
    if isinstance(error, dict):
        # Some APIs put the identifier in "type" rather than "code"
        code = error.get("code") or error.get("type", "")
    else:
        # Bare-string form, e.g. {"error": "model_not_found"}
        code = str(error)

    # Non-retryable error codes (fix the request):
    NON_RETRYABLE_CODES = {
        "invalid_request_error", "validation_error", "invalid_parameter",
        "model_not_found", "invalid_model", "context_length_exceeded",
        "content_policy_violation", "unsupported_media_type",
        "missing_required_parameter", "invalid_api_key"
    }
    if code in NON_RETRYABLE_CODES:
        return ErrorClass.NON_RETRYABLE

    # Retryable error codes:
    RETRYABLE_CODES = {
        "overloaded_error", "api_error", "server_error",
        "rate_limit_error", "timeout_error"
    }
    if code in RETRYABLE_CODES:
        return ErrorClass.RETRYABLE if "rate" not in code else ErrorClass.RATE_LIMITED

    return ErrorClass.RETRYABLE  # Default: assume retryable
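The two classifiers above are independent, so a caller still has to decide which one wins. One possible way to combine them is sketched below, with a recognised body code taking precedence over the HTTP status. This is an illustration, not part of the original fix: the `classify` and `extract_code` helpers are hypothetical, the code sets are shortened subsets, and treating unknown 4xx statuses as non-retryable (except 408 and 425) is an assumption that differs from the "assume retryable" default used above.

```python
from enum import Enum


class ErrorClass(Enum):
    RETRYABLE = "retryable"
    NON_RETRYABLE = "non_retryable"
    RATE_LIMITED = "rate_limited"
    AUTH_FAILURE = "auth_failure"


# Illustrative subsets of the code sets used in Option 1
NON_RETRYABLE_CODES = {"invalid_request_error", "model_not_found", "validation_error"}
RETRYABLE_CODES = {"overloaded_error", "server_error", "timeout_error"}


def extract_code(body) -> str:
    """Return the code from {"error": {"code": ...}} or {"error": "..."}, else ""."""
    if not isinstance(body, dict):
        return ""
    error = body.get("error")
    if isinstance(error, dict):
        return str(error.get("code", ""))
    return str(error) if error else ""


def classify(status: int, body) -> ErrorClass:
    """Combine body-level and status-level classification (hypothetical helper)."""
    code = extract_code(body)
    # A recognised body code is more specific than the status, so it wins.
    if code in NON_RETRYABLE_CODES:
        return ErrorClass.NON_RETRYABLE
    if code == "rate_limit_error" or status == 429:
        return ErrorClass.RATE_LIMITED
    if code in RETRYABLE_CODES:
        return ErrorClass.RETRYABLE
    if status == 401:
        return ErrorClass.AUTH_FAILURE
    # Assumption: an unknown 4xx is a client error, apart from 408 and 425.
    if 400 <= status < 500 and status not in (408, 425):
        return ErrorClass.NON_RETRYABLE
    return ErrorClass.RETRYABLE
```

With this ordering, a 500 response whose body says `model_not_found` is still treated as a request problem rather than a server hiccup.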
Option 2: Smart retry wrapper with error classification
import asyncio
import random
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional

import httpx

# ErrorClass and classify_http_error come from Option 1


@dataclass
class RetryConfig:
    max_retries: int = 3
    base_delay: float = 1.0
    max_delay: float = 60.0
    jitter: bool = True


class NonRetryableError(Exception):
    """Raised when the error cannot be resolved by retrying"""

    def __init__(self, status: int, error_code: str, message: str):
        self.status = status
        self.error_code = error_code
        super().__init__(
            f"Non-retryable error {status} ({error_code}): {message}. "
            f"Fix the request before retrying."
        )


async def smart_retry(
    fn: Callable[[], Awaitable[Any]],
    config: Optional[RetryConfig] = None,
    token_refresh_fn: Optional[Callable[[], Awaitable[Any]]] = None,
) -> Any:
    """
    Retry with error classification.
    - Non-retryable errors: raise immediately (don't waste retries)
    - Rate limit: wait for Retry-After then retry
    - Auth failure: refresh token then retry
    - Transient: exponential backoff with jitter
    """
    config = config or RetryConfig()
    for attempt in range(config.max_retries + 1):
        try:
            return await fn()
        except httpx.HTTPStatusError as e:
            status = e.response.status_code
            error_class = classify_http_error(e.response)

            try:
                body = e.response.json()
            except Exception:
                body = {}
            # The "error" field may be an object, a bare string, or missing
            error = body.get("error") if isinstance(body, dict) else None
            if isinstance(error, dict):
                error_code = error.get("code", str(status))
                message = error.get("message", e.response.text[:200])
            else:
                error_code = str(error) if error else str(status)
                message = e.response.text[:200]

            if error_class == ErrorClass.NON_RETRYABLE:
                # Don't retry: raise immediately with a clear message
                raise NonRetryableError(status, error_code, message) from e

            if error_class == ErrorClass.AUTH_FAILURE and token_refresh_fn is None:
                # Without a way to refresh the token, a 401 will simply repeat
                raise NonRetryableError(status, error_code, message) from e

            if attempt >= config.max_retries:
                # Out of attempts: don't wait or refresh for a retry that won't happen
                raise

            if error_class == ErrorClass.AUTH_FAILURE:
                print(f"Auth failure: refreshing token (attempt {attempt + 1})")
                await token_refresh_fn()
                # Retry with refreshed token (no delay needed)
                continue

            if error_class == ErrorClass.RATE_LIMITED:
                try:
                    retry_after = float(e.response.headers.get("retry-after", "60"))
                except ValueError:
                    # Retry-After may also be an HTTP-date; fall back to a fixed wait
                    retry_after = 60.0
                print(f"Rate limited: waiting {retry_after}s")
                await asyncio.sleep(retry_after)
                continue

            # Transient: exponential backoff
            delay = min(config.base_delay * (2 ** attempt), config.max_delay)
            if config.jitter:
                delay *= (0.5 + random.random() * 0.5)
            print(f"Transient error {status}: retrying in {delay:.1f}s (attempt {attempt + 1}/{config.max_retries})")
            await asyncio.sleep(delay)


# Usage (call_api and bad_payload are placeholders for your own client code):
async def main():
    try:
        result = await smart_retry(
            fn=lambda: call_api(bad_payload),
            config=RetryConfig(max_retries=3),
        )
    except NonRetryableError as e:
        # Request is wrong: no point retrying
        print(f"Request error: {e}")
        # Fix the request or escalate to user
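One detail worth handling in the rate-limit branch: the Retry-After header can carry either a number of seconds or an HTTP-date, so a plain `float()` conversion is not always enough. The helper below is a minimal sketch of a more tolerant parser using only the standard library. The name `parse_retry_after` and its `default` and `max_wait` parameters are assumptions for illustration, not part of any library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, default=60.0, max_wait=300.0, now=None):
    """Convert a Retry-After header value into seconds to wait.

    Accepts delay-seconds ("120") or an HTTP-date
    ("Wed, 21 Oct 2015 07:28:00 GMT"). Falls back to `default` when the
    header is missing or unparseable. The result is clamped to [0, max_wait].
    """
    if not value:
        return min(default, max_wait)
    value = value.strip()
    try:
        seconds = float(value)
    except ValueError:
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            # Neither a number nor a date: use the fallback
            return min(default, max_wait)
        if when.tzinfo is None:
            # Treat a date without an offset as UTC
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        seconds = (when - now).total_seconds()
    return max(0.0, min(seconds, max_wait))
```

Clamping to `max_wait` keeps a misbehaving server from parking the agent for hours; when the requested wait exceeds what the task can tolerate, escalating is usually a better choice than sleeping.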
Option 3: Per-error-type retry policies
import asyncio
from dataclasses import dataclass
from typing import Any, Callable


async def refresh_token() -> None:
    """Placeholder: replace with your own token refresh logic."""


@dataclass
class RetryPolicy:
    should_retry: bool
    max_attempts: int                            # Total attempts, including the first
    delay_fn: Callable[[int], float]             # attempt number → delay seconds
    action_before_retry: Callable | None = None  # e.g., token refresh
    message: str = ""


RETRY_POLICIES: dict[str, RetryPolicy] = {
    # Non-retryable: fail immediately
    "400": RetryPolicy(
        should_retry=False, max_attempts=0,
        delay_fn=lambda _: 0,
        message="Bad request: fix parameters before retrying"
    ),
    "404": RetryPolicy(
        should_retry=False, max_attempts=0,
        delay_fn=lambda _: 0,
        message="Resource not found: verify the ID or URL"
    ),
    "422": RetryPolicy(
        should_retry=False, max_attempts=0,
        delay_fn=lambda _: 0,
        message="Validation failed: check request body against API schema"
    ),
    # Auth: refresh and retry once
    "401": RetryPolicy(
        should_retry=True, max_attempts=2,
        delay_fn=lambda _: 0,
        action_before_retry=refresh_token,
        message="Token expired: refreshing"
    ),
    # Rate limit: wait then retry
    "429": RetryPolicy(
        should_retry=True, max_attempts=5,
        delay_fn=lambda attempt: 60.0,  # Wait for quota reset
        message="Rate limited: waiting for quota reset"
    ),
    # Transient: exponential backoff
    "500": RetryPolicy(
        should_retry=True, max_attempts=3,
        delay_fn=lambda attempt: min(2 ** attempt, 30),
        message="Server error: retrying with backoff"
    ),
    "503": RetryPolicy(
        should_retry=True, max_attempts=4,
        delay_fn=lambda attempt: min(2 ** attempt * 2, 60),
        message="Service unavailable: retrying with backoff"
    ),
}


async def execute_with_policy(fn, status_code_fn) -> Any:
    """Execute fn, applying the correct retry policy based on error code"""
    attempt = 0
    while True:
        try:
            return await fn()
        except Exception as e:
            code = str(status_code_fn(e))
            # Codes without their own policy fall back to the 500 policy
            policy = RETRY_POLICIES.get(code, RETRY_POLICIES["500"])
            if not policy.should_retry or attempt >= policy.max_attempts - 1:
                print(f"Not retrying ({code}): {policy.message}")
                raise
            print(f"Retry policy for {code}: {policy.message} (attempt {attempt + 1}/{policy.max_attempts})")
            if policy.action_before_retry:
                await policy.action_before_retry()
            delay = policy.delay_fn(attempt)
            if delay > 0:
                await asyncio.sleep(delay)
            attempt += 1
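It can help to see what the two backoff formulas above actually produce. The short sketch below evaluates them for the first few attempts and totals the waiting time each policy would add before giving up. The function names are illustrative, and the totals assume that `max_attempts` counts the first call, as it does in `execute_with_policy`.

```python
def server_error_delay(attempt: int) -> float:
    """Delay formula used by the 500 policy."""
    return min(2 ** attempt, 30)


def unavailable_delay(attempt: int) -> float:
    """Delay formula used by the 503 policy."""
    return min(2 ** attempt * 2, 60)


def schedule(delay_fn, attempts: int) -> list:
    """Delays for attempt numbers 0 .. attempts-1."""
    return [delay_fn(a) for a in range(attempts)]


def total_wait(delay_fn, max_attempts: int) -> float:
    """Total sleep time: one delay after each failed attempt except the last."""
    return sum(schedule(delay_fn, max_attempts - 1))


if __name__ == "__main__":
    print(schedule(server_error_delay, 6))  # [1, 2, 4, 8, 16, 30]
    print(schedule(unavailable_delay, 6))   # [2, 4, 8, 16, 32, 60]
    print(total_wait(server_error_delay, 3))  # 3
    print(total_wait(unavailable_delay, 4))   # 14
```

With the attempt limits in the table, the caps of 30 and 60 seconds are never reached; they only matter if `max_attempts` is raised.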
Option 4: Log non-retryable errors for diagnosis
import json
import logging

import httpx

# NonRetryableError comes from Option 2

logger = logging.getLogger(__name__)

# 403 is included to match the classification in Option 1
NON_RETRYABLE_STATUSES = {400, 403, 404, 405, 406, 409, 410, 413, 422, 451}


async def logged_api_call(endpoint: str, payload: dict, client: httpx.AsyncClient) -> dict:
    """
    API call with structured logging for non-retryable errors.
    Makes it easy to identify requests that need fixing (not retrying).
    """
    try:
        response = await client.post(endpoint, json=payload, timeout=30.0)
        response.raise_for_status()
        return response.json()
    except httpx.HTTPStatusError as e:
        status = e.response.status_code
        try:
            error_body = e.response.json()
        except Exception:
            error_body = {"raw": e.response.text[:500]}

        if status in NON_RETRYABLE_STATUSES:
            logger.error(
                "Non-retryable API error: fix request, do not retry",
                extra={
                    "event": "non_retryable_error",
                    "endpoint": endpoint,
                    "status": status,
                    "error": error_body,
                    # Only top-level secret keys are filtered here
                    "request_payload": {k: v for k, v in payload.items()
                                        if k not in ("api_key", "secret")},
                    "recommendation": "Inspect error.message and fix the request payload"
                }
            )
            # The "error" field may be an object, a bare string, or missing
            error = error_body.get("error") if isinstance(error_body, dict) else None
            if isinstance(error, dict):
                error_code = error.get("code", str(status))
            else:
                error_code = str(error) if error else str(status)
            raise NonRetryableError(
                status=status,
                error_code=error_code,
                message=json.dumps(error_body)[:200]
            ) from e
        else:
            logger.warning(
                "Transient API error: will retry",
                extra={
                    "event": "retryable_error",
                    "endpoint": endpoint,
                    "status": status,
                    "error": error_body
                }
            )
            raise
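Fields passed through `extra` are attached to the log record, but the default `logging.Formatter` only prints them if the format string names them, so they can silently disappear from the output. The sketch below shows one way to surface them as JSON lines. `ExtraJsonFormatter` and its `FIELDS` tuple are hypothetical names chosen for this example, with the field list copied from the `extra` dictionaries above.

```python
import json
import logging


class ExtraJsonFormatter(logging.Formatter):
    """Render the message plus selected `extra` fields as one JSON object per line."""

    FIELDS = ("event", "endpoint", "status", "error",
              "request_payload", "recommendation")

    def format(self, record: logging.LogRecord) -> str:
        entry = {"level": record.levelname, "message": record.getMessage()}
        for name in self.FIELDS:
            # Extra fields become plain attributes on the record
            if hasattr(record, name):
                entry[name] = getattr(record, name)
        # default=str keeps non-JSON-serialisable values from raising
        return json.dumps(entry, default=str)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(ExtraJsonFormatter())
    demo_logger = logging.getLogger("api_errors_demo")
    demo_logger.addHandler(handler)
    demo_logger.setLevel(logging.INFO)
    demo_logger.propagate = False

    demo_logger.error(
        "Non-retryable API error",
        extra={"event": "non_retryable_error", "status": 422, "endpoint": "/v1/items"},
    )
```

Filtering on `event == "non_retryable_error"` in the resulting log stream then gives a direct list of requests that need fixing.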
Option 5: Agent system prompt — teach error triage
System prompt:
"Error handling rules — read before retrying any failed operation:
NEVER retry these errors — they require fixing the request, not repeating it:
- 400 Bad Request: invalid parameters — inspect error.message, fix the payload
- 404 Not Found: wrong ID or URL — verify the resource exists
- 422 Unprocessable Entity: validation failed — check field types and required fields
- 403 Forbidden: insufficient permissions — cannot be resolved by retrying
- 409 Conflict: resource already exists or state conflict — handle the conflict
DO retry these errors — they are transient:
- 429 Too Many Requests: wait for Retry-After header duration, then retry once
- 500 Internal Server Error: wait 2s, 4s, 8s between attempts (max 3 retries)
- 503 Service Unavailable: wait 5s, 10s, 20s between attempts (max 3 retries)
When you receive a non-retryable error:
1. Read the full error message
2. Identify what is wrong with the request
3. Either fix the request OR report to the user that the operation cannot complete
4. Do NOT retry the same failing request — it will always fail the same way"
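A prompt like this works best when the tool result the agent sees already states the triage decision, so the model does not have to infer it from a raw status code. The sketch below is one possible shape for such a result. `tool_error_message` and the JSON field names are hypothetical, the guidance strings paraphrase the rules in the prompt, and escalating statuses the prompt does not list is an assumption.

```python
import json

# Guidance strings paraphrase the rules in the system prompt.
NEVER_RETRY = {
    400: "Invalid parameters: inspect error.message and fix the payload.",
    403: "Insufficient permissions: retrying cannot resolve this.",
    404: "Wrong ID or URL: verify the resource exists.",
    409: "State conflict: handle the conflict before trying again.",
    422: "Validation failed: check field types and required fields.",
}
MAY_RETRY = {
    429: "Wait for the Retry-After duration, then retry.",
    500: "Transient server error: retry with backoff.",
    503: "Service unavailable: retry with backoff.",
}


def tool_error_message(status: int, error_message: str) -> str:
    """Build the tool result the agent sees, with the triage decision spelled out."""
    if status in NEVER_RETRY:
        retryable, guidance = False, NEVER_RETRY[status]
    elif status in MAY_RETRY:
        retryable, guidance = True, MAY_RETRY[status]
    else:
        # Assumption: statuses the prompt does not cover are escalated, not retried
        retryable, guidance = False, "Unlisted status: report it to the user."
    return json.dumps({
        "status": status,
        "error": error_message,
        "retryable": retryable,
        "guidance": guidance,
    })


if __name__ == "__main__":
    print(tool_error_message(422, "field 'model' is required"))
```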
Option 6: Retry budget tracker — stop when budget is exhausted
from dataclasses import dataclass, field
import time

# ErrorClass comes from Option 1


@dataclass
class RetryBudget:
    """
    Global retry budget across all operations in a task.
    Prevents runaway retry loops from consuming all tokens.
    """
    max_total_retries: int = 10
    max_per_endpoint: int = 3
    window_seconds: float = 300.0
    _total_retries: int = field(default=0, init=False)
    _per_endpoint: dict[str, int] = field(default_factory=dict, init=False)
    _window_start: float = field(default_factory=time.monotonic, init=False)

    def can_retry(self, endpoint: str, error_class: ErrorClass) -> bool:
        """Check if retry is allowed under current budget"""
        # Reset window if expired
        if time.monotonic() - self._window_start > self.window_seconds:
            self._total_retries = 0
            self._per_endpoint.clear()
            self._window_start = time.monotonic()

        # Non-retryable: never allow
        if error_class == ErrorClass.NON_RETRYABLE:
            return False

        # Budget checks
        if self._total_retries >= self.max_total_retries:
            print(
                f"Retry budget exhausted ({self._total_retries}/{self.max_total_retries} total). "
                f"Task may need human intervention."
            )
            return False

        endpoint_retries = self._per_endpoint.get(endpoint, 0)
        if endpoint_retries >= self.max_per_endpoint:
            print(
                f"Per-endpoint retry limit reached for {endpoint} "
                f"({endpoint_retries}/{self.max_per_endpoint}). "
                f"Endpoint may be persistently failing."
            )
            return False

        return True

    def record_retry(self, endpoint: str):
        self._total_retries += 1
        self._per_endpoint[endpoint] = self._per_endpoint.get(endpoint, 0) + 1

    @property
    def status(self) -> dict:
        return {
            "total_retries": self._total_retries,
            "max_total": self.max_total_retries,
            "remaining": self.max_total_retries - self._total_retries,
            "per_endpoint": dict(self._per_endpoint)
        }


budget = RetryBudget(max_total_retries=10, max_per_endpoint=3)
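The budget only helps if every retry loop consults it before retrying and records each retry it makes. The sketch below shows that wiring in a synchronous loop. To stay self-contained it uses `TinyBudget`, a deliberately simplified stand-in with the same `can_retry`/`record_retry` interface as `RetryBudget`, and it passes string labels instead of the `ErrorClass` enum. `ApiError`, `classify`, `call_with_budget`, and `make_flaky` are hypothetical names for illustration.

```python
class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


class TinyBudget:
    """Simplified stand-in exposing the same interface as RetryBudget."""

    def __init__(self, max_total: int = 3):
        self.max_total = max_total
        self.used = 0

    def can_retry(self, endpoint: str, error_class: str) -> bool:
        return error_class != "non_retryable" and self.used < self.max_total

    def record_retry(self, endpoint: str) -> None:
        self.used += 1


def classify(exc: ApiError) -> str:
    """Assumption for the demo: 5xx is transient, everything else is not."""
    return "retryable" if exc.status >= 500 else "non_retryable"


def call_with_budget(fn, endpoint: str, budget):
    """Call fn, retrying only while the shared budget allows it."""
    while True:
        try:
            return fn()
        except ApiError as exc:
            if not budget.can_retry(endpoint, classify(exc)):
                raise
            budget.record_retry(endpoint)
            # A real loop would sleep with backoff here before retrying


def make_flaky(failures: int, status: int = 503):
    """Return a callable that fails `failures` times with `status`, then succeeds."""
    state = {"left": failures}

    def fn():
        if state["left"] > 0:
            state["left"] -= 1
            raise ApiError(status)
        return "ok"

    return fn


if __name__ == "__main__":
    shared = TinyBudget(max_total=3)
    print(call_with_budget(make_flaky(2), "/v1/items", shared), shared.used)
```

Because the budget object is shared, retries spent on one endpoint reduce what is left for every other call in the same task, which is the point of a global cap.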
Retryable vs Non-Retryable Error Reference
| HTTP Status | Error Class | Action |
|---|---|---|
| 400 Bad Request | Non-retryable | Fix request payload |
| 401 Unauthorized | Auth | Refresh token, retry once |
| 403 Forbidden | Non-retryable | Check permissions — not a token issue |
| 404 Not Found | Non-retryable | Verify resource ID/URL |
| 409 Conflict | Non-retryable | Resolve conflict first |
| 413 Payload Too Large | Non-retryable | Reduce request size |
| 422 Unprocessable | Non-retryable | Fix validation errors |
| 429 Too Many Requests | Rate-limited | Wait Retry-After, then retry |
| 500 Internal Error | Retryable | Exponential backoff, max 3 |
| 502 Bad Gateway | Retryable | Retry after 5s |
| 503 Unavailable | Retryable | Retry after 10s, max 4 |
| 504 Gateway Timeout | Retryable | Retry after 10s |
Expected Token Savings
- 5 retries × non-retryable error × 1,000 tokens each: 5,000 wasted tokens per failed operation
- Fail fast on non-retryable: 0 wasted retries, immediate escalation
Environment
- Any agent with retry logic calling external APIs; critical for agents calling LLM APIs, REST services, or databases where request errors must be distinguished from service errors
- Source: direct experience; retrying 400/422 errors is one of the most common sources of wasted token budget in production agents