Agent Ignores Tool Response Errors — Continues as If Tool Succeeded
Symptom
- Tool returns {"error": "permission denied"}, but the agent says “file written successfully”
- Tool returns HTTP 404 wrapped in JSON, and the agent continues to the next step assuming success
- Tool returns {"status": "failed", "reason": "quota exceeded"}, but the agent ignores the status field
- Agent writes a downstream artifact based on a tool result that was actually an error
- Tool raises an exception, the framework catches it and returns {"is_error": true}, and the agent doesn’t notice
- Agent produces a final answer citing data from a tool call that silently failed
Root Cause
The model reads tool results as text and tries to make sense of them. When a result looks mostly like a success response but contains an error field, the model may focus on the non-error parts and miss the failure signal. This is especially common when: the error field is deeply nested, the result is large and the error is at the end, the field name is ambiguous (status: "failed" vs ok: false), or the model is primed to expect success. The fix is to validate tool results before injection and surface errors explicitly.
Fix
Option 1: Tool result validator — detect errors before model sees them
import json
from typing import Any

ERROR_FIELD_PATTERNS = [
    # Common error field names and the values that signal failure
    (["error"], lambda v: v is not None and v is not False and v != "" and v != 0),
    (["err"], lambda v: v is not None and v is not False),
    (["success"], lambda v: v is False),
    (["ok"], lambda v: v is False),
    (["status"], lambda v: isinstance(v, str) and v.lower() in (
        "error", "failed", "failure", "fail", "rejected", "denied"
    )),
    (["code"], lambda v: isinstance(v, int) and v >= 400),
    (["statusCode"], lambda v: isinstance(v, int) and v >= 400),
    (["status_code"], lambda v: isinstance(v, int) and v >= 400),
    (["is_error"], lambda v: v is True),
    (["hasError"], lambda v: v is True),
]


def detect_tool_error(result: Any) -> tuple[bool, str | None]:
    """
    Check a tool result for error signals.
    Returns (is_error, error_message | None).
    Handles both dict/JSON results and plain error strings.
    """
    if isinstance(result, str):
        # Plain error string patterns (short strings only, to limit false positives)
        lower = result.lower().strip()
        error_phrases = [
            "error:", "permission denied", "access denied", "not found",
            "internal server error", "connection refused", "timeout",
            "quota exceeded", "rate limited", "unauthorized", "forbidden"
        ]
        for phrase in error_phrases:
            if phrase in lower and len(result) < 500:
                return True, result
        return False, None

    if isinstance(result, dict):
        for fields, is_error_fn in ERROR_FIELD_PATTERNS:
            # Walk the field path; each entry here is a single top-level key
            value = result
            found = True
            for field in fields:
                if not isinstance(value, dict) or field not in value:
                    found = False
                    break
                value = value[field]
            if found and is_error_fn(value):
                # Extract the most descriptive error message available
                msg = (
                    result.get("message") or
                    result.get("error_message") or
                    result.get("detail") or
                    result.get("details") or
                    str(value)
                )
                return True, msg

    return False, None


def wrap_tool_result_for_agent(
    tool_name: str,
    raw_result: Any,
    tool_use_id: str
) -> dict:
    """
    Wrap a tool result for injection into the agent conversation.
    Surfaces errors explicitly so the model cannot miss them.
    """
    is_error, error_msg = detect_tool_error(raw_result)

    if is_error:
        # Make the error impossible to miss
        error_content = (
            f"[TOOL ERROR] Tool '{tool_name}' failed:\n"
            f"{error_msg}\n\n"
            f"Do not proceed as if this tool call succeeded. "
            f"Handle this error before continuing."
        )
        print(f"Tool error detected in '{tool_name}': {error_msg}")
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "is_error": True,
            "content": error_content
        }

    # Success: return normally
    if isinstance(raw_result, (dict, list)):
        content = json.dumps(raw_result)
    else:
        content = str(raw_result)
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": content
    }
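One gap worth noting: the validator above applies only phrase matching to strings, so a tool that returns serialized JSON such as '{"ok": false}' would pass as a success. A minimal sketch of a pre-parsing step follows. The helper name `coerce_json` is hypothetical and not part of the original code.

```python
import json
from typing import Any


def coerce_json(raw: Any) -> Any:
    """Hypothetical helper: parse JSON-looking strings, return anything else unchanged."""
    if not isinstance(raw, str):
        return raw
    text = raw.strip()
    # Only attempt a parse when the text looks like a JSON object or array.
    if not text or text[0] not in "{[":
        return raw
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Not valid JSON after all; leave the string for phrase matching.
        return raw


print(coerce_json('{"ok": false, "error": "quota exceeded"}'))
print(coerce_json("Error: connection refused"))
```

Under this assumption, the call site would become `detect_tool_error(coerce_json(raw_result))`, so the field checks see a dict whenever the tool returned one in serialized form.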
Option 2: Typed tool results — enforce success/failure schema
from pydantic import BaseModel
from typing import Optional, Generic, TypeVar, Any
import json

T = TypeVar("T")


class ToolResult(BaseModel, Generic[T]):
    """
    Strongly-typed tool result: every tool must return this.
    Makes success/failure impossible to confuse.
    """
    success: bool
    data: Optional[T] = None
    error: Optional[str] = None
    error_code: Optional[str] = None

    def to_agent_content(self, tool_name: str) -> str:
        """Format for injection into agent context"""
        if not self.success:
            return (
                f"TOOL FAILED: {tool_name}\n"
                f"Error: {self.error}\n"
                f"Code: {self.error_code or 'unknown'}\n"
                f"Action required: Do not proceed; address this error."
            )
        return json.dumps(self.data) if self.data is not None else "Success (no data returned)"


def make_tool_success(data: Any) -> ToolResult:
    return ToolResult(success=True, data=data)


def make_tool_error(error: str, code: Optional[str] = None) -> ToolResult:
    return ToolResult(success=False, error=error, error_code=code)


# Tool implementations always return ToolResult:
async def write_file_tool(path: str, content: str) -> ToolResult:
    try:
        import aiofiles  # third-party dependency
        async with aiofiles.open(path, "w") as f:
            await f.write(content)
        return make_tool_success({"path": path, "bytes_written": len(content.encode())})
    except PermissionError as e:
        return make_tool_error(f"Permission denied writing to {path}: {e}", code="PERMISSION_DENIED")
    except OSError as e:
        return make_tool_error(f"OS error writing {path}: {e}", code="OS_ERROR")


async def call_api_tool(endpoint: str, payload: dict) -> ToolResult:
    import httpx  # third-party dependency
    try:
        async with httpx.AsyncClient() as client:
            response = await client.post(endpoint, json=payload, timeout=30.0)
            if response.status_code >= 400:
                return make_tool_error(
                    f"API returned {response.status_code}: {response.text[:200]}",
                    code=f"HTTP_{response.status_code}"
                )
            return make_tool_success(response.json())
    except httpx.TimeoutException:
        return make_tool_error(f"Request to {endpoint} timed out", code="TIMEOUT")
    except httpx.RequestError as e:
        return make_tool_error(f"Request failed: {e}", code="NETWORK_ERROR")
    except ValueError as e:
        # response.json() raises a ValueError subclass when the body is not valid JSON
        return make_tool_error(f"Response was not valid JSON: {e}", code="INVALID_JSON")


# Agent loop using typed results:
def process_tool_call(tool_name: str, tool_input: dict, tool_use_id: str) -> dict:
    """Execute tool and format result. Errors are always surfaced."""
    import asyncio
    tools = {"write_file": write_file_tool, "call_api": call_api_tool}
    if tool_name not in tools:
        result = make_tool_error(f"Unknown tool: {tool_name}", code="UNKNOWN_TOOL")
    else:
        # asyncio.run() raises RuntimeError if an event loop is already running,
        # so call this function only from synchronous code.
        result = asyncio.run(tools[tool_name](**tool_input))
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "is_error": not result.success,
        "content": result.to_agent_content(tool_name)
    }
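Because `process_tool_call` relies on `asyncio.run`, it cannot be called from code that is already running inside an event loop. The sketch below shows one possible async variant, assuming plain dicts in place of `ToolResult` so that it runs without pydantic. `echo_tool`, `TOOLS`, and `process_tool_call_async` are hypothetical names introduced only for illustration.

```python
import asyncio
from typing import Awaitable, Callable


async def echo_tool(text: str) -> dict:
    """Stand-in async tool for illustration only."""
    return {"success": True, "data": text}


# Hypothetical registry mapping tool names to async callables.
TOOLS: dict[str, Callable[..., Awaitable[dict]]] = {"echo": echo_tool}


async def process_tool_call_async(tool_name: str, tool_input: dict, tool_use_id: str) -> dict:
    """Await the tool directly instead of starting a new event loop."""
    if tool_name not in TOOLS:
        result = {"success": False, "error": f"Unknown tool: {tool_name}"}
    else:
        try:
            result = await TOOLS[tool_name](**tool_input)
        except TypeError as exc:
            # Typically wrong or missing arguments supplied by the model.
            result = {"success": False, "error": f"Invalid arguments: {exc}"}
    if result["success"]:
        content = str(result.get("data"))
    else:
        content = f"TOOL FAILED: {tool_name}\nError: {result['error']}"
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "is_error": not result["success"],
        "content": content,
    }


print(asyncio.run(process_tool_call_async("echo", {"text": "hi"}, "t1")))
```

Catching `TypeError` here also turns malformed tool arguments into an explicit failure rather than an unhandled exception, which fits the goal of never letting a failed call look like a success.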
Option 3: Error assertion after tool call — verify success before proceeding
import anthropic
from typing import Callable

# detect_tool_error and wrap_tool_result_for_agent are the functions from Option 1.

client = anthropic.Anthropic()

ERROR_ASSERTION_PROMPT = """After each tool call, before proceeding:
1. Read the tool result carefully
2. Check for any error, failure, or warning signals
3. If the tool failed, say "TOOL FAILED: [reason]" and stop. Do not continue as if it succeeded
4. Only proceed to the next step after confirming the tool succeeded

Signs of failure to watch for:
- Any field named "error", "err", "failure", "exception" with a non-null value
- status/ok/success field set to false, "failed", "error", or similar
- HTTP status codes 4xx or 5xx in the response
- Exception messages or stack traces
- Empty results when content was expected
- "permission denied", "not found", "unauthorized" anywhere in the response

If you're unsure whether a tool succeeded, treat it as failed and report the uncertainty."""


def run_agent_with_error_assertion(
    messages: list[dict],
    tools: list[dict],
    system: str,
    execute_tool: Callable,
    model: str = "claude-sonnet-4-6",
    max_steps: int = 20
) -> str:
    """Agent loop with error assertion system prompt"""
    full_system = f"{system}\n\n{ERROR_ASSERTION_PROMPT}"

    for step in range(max_steps):
        response = client.messages.create(
            model=model,
            max_tokens=4096,
            system=full_system,
            tools=tools,
            messages=messages
        )

        if response.stop_reason != "tool_use":
            # The first block is not guaranteed to be text, so join all text blocks
            return "".join(b.text for b in response.content if b.type == "text")

        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            raw_result = execute_tool(block.name, block.input)
            # Validate before injection
            is_error, error_msg = detect_tool_error(raw_result)
            tool_result = wrap_tool_result_for_agent(block.name, raw_result, block.id)
            tool_results.append(tool_result)
            if is_error:
                print(f"Step {step}: Tool '{block.name}' failed; agent will see explicit error")

        # Append the assistant turn and all tool results once per step
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

    return "Max steps reached"
Option 4: Post-tool verification — have agent confirm success explicitly
VERIFICATION_TOOLS = [
    {
        "name": "confirm_tool_result",
        "description": (
            "Call this IMMEDIATELY after any tool call that modifies state "
            "(write file, send message, update database, etc.). "
            "This confirms whether the previous tool actually succeeded. "
            "Do not proceed to the next step without calling this first."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "tool_name": {
                    "type": "string",
                    "description": "Name of the tool that was just called"
                },
                "expected_outcome": {
                    "type": "string",
                    "description": "What you expected to happen if the tool succeeded"
                },
                "tool_result_summary": {
                    "type": "string",
                    "description": "Brief summary of what the tool result showed"
                }
            },
            "required": ["tool_name", "expected_outcome", "tool_result_summary"]
        }
    }
]


def handle_confirm_tool_result(tool_input: dict, recent_tool_results: list) -> str:
    """
    Evaluate whether the most recent tool call succeeded.
    Returns a clear success/failure verdict.
    """
    if not recent_tool_results:
        return "ERROR: No recent tool results to confirm"

    last_result = recent_tool_results[-1]
    is_error = last_result.get("is_error", False)
    content = last_result.get("content", "")

    if is_error or "[TOOL ERROR]" in str(content):
        return (
            f"VERIFICATION FAILED: Tool '{tool_input['tool_name']}' did not succeed.\n"
            f"The tool returned an error. Do not proceed; handle the failure."
        )

    # Additional content-based check (content may not be a string, so use its string form)
    content_lower = str(content).lower()
    failure_signals = ["error", "failed", "failure", "denied", "unauthorized", "not found"]
    found_signals = [s for s in failure_signals if s in content_lower]
    if found_signals and len(content_lower) < 1000:
        return (
            f"VERIFICATION WARNING: Tool result may indicate failure.\n"
            f"Detected signals: {found_signals}\n"
            f"Review the result carefully before proceeding."
        )

    return (
        f"VERIFIED: Tool '{tool_input['tool_name']}' appears to have succeeded.\n"
        f"Expected: {tool_input['expected_outcome']}\n"
        f"Safe to proceed to next step."
    )
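The handler above expects a `recent_tool_results` list, but the original does not show how that list is maintained. One possible approach, sketched below, is a small tracker that records results from real tools and skips the confirmation tool itself, so a confirmation never ends up verifying its own output. The `RecentResults` class is a hypothetical addition.

```python
class RecentResults:
    """Hypothetical tracker: keeps results of real tools, skipping the confirmation tool."""

    def __init__(self, confirm_tool_name: str = "confirm_tool_result", keep: int = 5):
        self.confirm_tool_name = confirm_tool_name
        self.keep = keep
        self.results: list[dict] = []

    def record(self, tool_name: str, tool_result: dict) -> None:
        # Ignore the verification tool so the last entry is always a real tool result.
        if tool_name == self.confirm_tool_name:
            return
        self.results.append(tool_result)
        # Keep only the most recent entries to bound memory use.
        self.results = self.results[-self.keep:]


tracker = RecentResults()
tracker.record("write_file", {"is_error": True, "content": "[TOOL ERROR] permission denied"})
tracker.record("confirm_tool_result", {"content": "VERIFICATION FAILED"})
print(tracker.results[-1]["content"])
```

In an agent loop, `tracker.results` would be passed as the second argument to `handle_confirm_tool_result` whenever the model calls `confirm_tool_result`.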
Option 5: Tool result normalization — convert all formats to standard structure
import json
from typing import Any


class ToolResultNormalizer:
    """
    Normalize tool results from any format into a standard structure.
    Makes error detection consistent regardless of the tool's output format.
    """

    def normalize(self, tool_name: str, raw: Any) -> dict:
        """
        Returns: {"success": bool, "data": Any, "error": str | None, "raw": Any}
        """
        if raw is None:
            return {"success": False, "data": None, "error": "Tool returned null", "raw": raw}
        if isinstance(raw, dict):
            return self._normalize_dict(raw)
        if isinstance(raw, str):
            return self._normalize_string(raw)
        if isinstance(raw, list):
            return {"success": True, "data": raw, "error": None, "raw": raw}
        return {"success": True, "data": raw, "error": None, "raw": raw}

    def _normalize_dict(self, result: dict) -> dict:
        # Explicit success fields
        if "success" in result:
            success = bool(result["success"])
            return {
                "success": success,
                "data": result.get("data") or result.get("result") or (result if success else None),
                "error": (result.get("error") or result.get("message")) if not success else None,
                "raw": result
            }
        if "ok" in result:
            ok = bool(result["ok"])
            return {
                "success": ok,
                "data": result if ok else None,
                "error": (result.get("error") or result.get("message")) if not ok else None,
                "raw": result
            }

        # HTTP-style status codes
        code = result.get("status_code") or result.get("statusCode") or result.get("code")
        if isinstance(code, int):
            success = code < 400
            return {
                "success": success,
                "data": result if success else None,
                "error": f"HTTP {code}: {result.get('message', result.get('error', ''))}" if not success else None,
                "raw": result
            }

        # Error field present
        if "error" in result and result["error"]:
            return {
                "success": False,
                "data": None,
                "error": str(result["error"]),
                "raw": result
            }

        # No error signals: assume success
        return {"success": True, "data": result, "error": None, "raw": result}

    def _normalize_string(self, result: str) -> dict:
        lower = result.lower().strip()
        failure_phrases = ["error:", "failed:", "exception:", "traceback", "permission denied", "not found"]
        if any(p in lower for p in failure_phrases):
            return {"success": False, "data": None, "error": result, "raw": result}
        return {"success": True, "data": result, "error": None, "raw": result}

    def format_for_context(self, normalized: dict, tool_name: str) -> str:
        """Format normalized result for injection into agent context"""
        if not normalized["success"]:
            return (
                f"❌ TOOL FAILED: {tool_name}\n"
                f"Error: {normalized['error']}\n"
                f"This tool call did not succeed. Do not use its output."
            )
        data = normalized["data"]
        if isinstance(data, (dict, list)):
            return f"✓ Tool succeeded:\n{json.dumps(data, indent=2)}"
        return f"✓ Tool succeeded: {data}"


normalizer = ToolResultNormalizer()
Option 6: Failure cascade prevention — stop agent when critical tool fails
from typing import Optional

# detect_tool_error is the function from Option 1.


class CriticalToolGuard:
    """
    Prevents the agent from continuing after a critical tool fails.
    Some tools are "critical path": their failure should halt the agent.
    """
    CRITICAL_TOOLS = {
        "write_to_database", "send_payment", "deploy_code",
        "delete_record", "send_email", "publish_document"
    }

    def __init__(self, critical_tools: Optional[set[str]] = None):
        # Fall back to the defaults only when no set is supplied
        self.critical_tools = critical_tools if critical_tools is not None else self.CRITICAL_TOOLS
        self._failed_critical: list[str] = []

    def check_and_gate(
        self,
        tool_name: str,
        tool_result: dict,
        tool_use_id: str
    ) -> dict:
        """
        Check tool result. If critical tool failed, inject halt instruction.
        Returns the tool result dict to inject into the conversation.
        """
        is_error = tool_result.get("is_error", False)
        content = tool_result.get("content", "")
        error_detected, error_msg = detect_tool_error(content)
        actually_failed = is_error or error_detected

        if actually_failed and tool_name in self.critical_tools:
            self._failed_critical.append(tool_name)
            halt_message = (
                f"[CRITICAL FAILURE] Tool '{tool_name}' failed and is required for this task.\n"
                f"Error: {error_msg or str(content)[:300]}\n\n"
                f"STOP IMMEDIATELY. Do not proceed to subsequent steps.\n"
                f"Report this failure to the user and explain what happened.\n"
                f"Do not attempt workarounds or alternative approaches without explicit instruction."
            )
            return {
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "is_error": True,
                "content": halt_message
            }

        return tool_result

    @property
    def has_critical_failure(self) -> bool:
        return len(self._failed_critical) > 0


guard = CriticalToolGuard()
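The guard's halt message still depends on the model obeying it. A stricter variant enforces the stop in code, in the same spirit as checking `has_critical_failure` after each step. The sketch below is a simplified stand-in that works on already-executed `(tool_name, result)` pairs. The function name `run_until_critical_failure` and the shape of its return value are assumptions for illustration.

```python
from typing import Iterable


def run_until_critical_failure(
    calls: Iterable[tuple[str, dict]],
    critical_tools: set[str],
) -> dict:
    """Process tool results in order and stop at the first failed critical tool."""
    processed: list[str] = []
    for tool_name, result in calls:
        if result.get("is_error") and tool_name in critical_tools:
            # Hard stop: later calls are never processed.
            return {"halted": True, "failed_tool": tool_name, "processed": processed}
        processed.append(tool_name)
    return {"halted": False, "failed_tool": None, "processed": processed}


calls = [
    ("read_file", {"is_error": False}),
    ("send_payment", {"is_error": True}),
    ("send_email", {"is_error": False}),
]
print(run_until_critical_failure(calls, {"send_payment", "send_email"}))
```

In this sketch, a failure in a non-critical tool does not halt the run. It is still recorded, and its error result would reach the model through the normal tool-result path.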
Error Detection Coverage by Result Format
| Result Format | Example | Detection Method |
|---|---|---|
| Explicit error field | {"error": "permission denied"} | Field name check |
| ok/success boolean | {"ok": false, "error": "..."} | Boolean field check |
| HTTP status in response | {"status_code": 403} | Numeric code check |
| Error string | "Error: connection refused" | Pattern matching |
| Nested error | {"data": null, "meta": {"error": true}} | Recursive field scan |
| Anthropic is_error | {"type": "tool_result", "is_error": true} | is_error flag check |
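The field patterns in Option 1 inspect only top-level keys, so the "Nested error" row needs a separate recursive scan. A minimal sketch follows, assuming a hypothetical helper `find_nested_error` and a small hypothetical key set `ERROR_KEYS`. It treats any truthy value under an error-like key as a failure signal.

```python
from typing import Any, Optional

# Hypothetical set of key names treated as error signals when their value is truthy.
ERROR_KEYS = {"error", "err", "is_error", "hasError"}


def find_nested_error(value: Any, path: str = "", max_depth: int = 6) -> Optional[str]:
    """Return the path of the first truthy error-like field, or None if there is none."""
    if max_depth < 0:
        return None  # Depth limit guards against very deep payloads.
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else str(key)
            if key in ERROR_KEYS and child:
                return child_path
            found = find_nested_error(child, child_path, max_depth - 1)
            if found:
                return found
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found = find_nested_error(child, f"{path}[{index}]", max_depth - 1)
            if found:
                return found
    return None


print(find_nested_error({"data": None, "meta": {"error": True}}))
```

Returning the path rather than a bare boolean makes it possible to tell the model where in the payload the failure was reported.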
Expected Token Savings
- Agent proceeds on silent tool failure → produces wrong output → user corrects → debug: ~18,000 tokens
- Error surfaced explicitly → agent handles error immediately: 0 downstream corruption
Environment
- Any agent that uses tool calls. Error detection matters most for tools that write data, send messages, or call external APIs, where a failure often arrives as a valid-looking JSON response with an error field. The model must be made to notice the failure before it continues.
- Source: direct experience. Silent tool-failure propagation causes some of the most confusing agent bugs, because the model reasons correctly from wrong premises.