Agent Truncates Long Tool Results — Important Data Cut Off
Symptom
- Agent calls a search tool — results contain 40,000 tokens — context window fills immediately
- Tool returns a full file read — agent uses the first 8,000 tokens, silently ignores the rest
- Database query returns 10,000 rows — agent summarizes only the first page
- Agent reads a large document — misses a critical section that appears near the end
- Framework truncates tool result at a token limit — agent proceeds as if the data was complete
- Multi-tool agent hits context limit after 2-3 tool calls — remaining tools never execute
Root Cause
Tool results are injected into the conversation verbatim. A single large result can consume most of the context budget before the agent has called its remaining tools or formulated a response. When a framework cuts a result at a token limit, the model cannot tell the difference between "the tool result ended here" and "the tool result was cut off here", so it treats partial data as complete. The fix is to process large tool results outside the context window: extract the relevant portion, summarize, paginate, or use RAG to retrieve only what is needed.
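Whichever option is used, it helps to make any cut explicit so the model can distinguish a complete result from a partial one. The following is a minimal sketch of that idea; the function name `mark_truncation` and the marker wording are illustrative assumptions, not part of any framework.

```python
def mark_truncation(text: str, max_chars: int = 8000) -> str:
    """Cut text to max_chars and state explicitly whether anything was omitted."""
    if len(text) <= max_chars:
        # Complete result: say so, so the model does not have to guess
        return text + "\n[END OF RESULT: complete]"
    omitted = len(text) - max_chars
    return (
        text[:max_chars]
        + f"\n[TRUNCATED: {omitted} of {len(text)} chars omitted; "
        "request a narrower query or the next page]"
    )


if __name__ == "__main__":
    print(mark_truncation("row 1\nrow 2", max_chars=100))
    print(mark_truncation("x" * 100, max_chars=40))
```

With a marker like this in place, the agent prompt can instruct the model to request more data whenever it sees a `[TRUNCATED ...]` line.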
Fix
Option 1: Extract-before-inject — pull only the relevant section
import anthropic
import json

client = anthropic.Anthropic()


def extract_relevant_section(
    tool_result: str,
    query: str,
    max_tokens: int = 2000,
    model: str = "claude-haiku-4-5-20251001"
) -> str:
    """
    Use a fast, cheap model to extract the relevant portion of a large tool result.
    Call this before injecting the result into the main agent context.
    """
    if len(tool_result) < max_tokens * 4:  # Rough char estimate: small enough, no need to extract
        return tool_result
    response = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        messages=[{
            "role": "user",
            "content": (
                f"Extract the information relevant to this query from the document below.\n"
                f"Query: {query}\n\n"
                f"Return only the relevant sections verbatim; do not summarize or paraphrase.\n"
                f"If the relevant data is a list or table, include it completely.\n\n"
                # Hard cap on extraction input: text past 50,000 chars is never seen by the extractor
                f"Document:\n{tool_result[:50000]}"
            )
        }]
    )
    return response.content[0].text


def summarize_tool_result(
    tool_result: str,
    context: str,
    max_tokens: int = 1000,
    model: str = "claude-haiku-4-5-20251001"
) -> str:
    """
    Summarize a large tool result preserving key facts.
    Use when the full result is too large but a summary is sufficient.
    """
    response = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        messages=[{
            "role": "user",
            "content": (
                f"Summarize the following tool result for this context: {context}\n\n"
                f"Preserve: numbers, dates, IDs, status values, error messages, key names.\n"
                f"Compress: prose descriptions, examples, metadata.\n\n"
                # Same caveat as above: only the first 60,000 chars are summarized
                f"Tool result:\n{tool_result[:60000]}"
            )
        }]
    )
    return f"[Summarized from {len(tool_result)} chars]\n{response.content[0].text}"


class TokenBudgetedToolRunner:
    """
    Runs tools and manages result size to fit within a token budget.
    Large results are automatically extracted or summarized before injection.
    """
    CHARS_PER_TOKEN = 4  # Rough estimate

    def __init__(
        self,
        result_token_budget: int = 4000,  # Max tokens per tool result
        extraction_model: str = "claude-haiku-4-5-20251001"
    ):
        self.result_token_budget = result_token_budget
        self.extraction_model = extraction_model

    def _estimate_tokens(self, text: str) -> int:
        return len(text) // self.CHARS_PER_TOKEN

    async def run_tool(
        self,
        tool_name: str,
        tool_input: dict,
        tool_fn,
        query_context: str = ""
    ) -> str:
        """Run a tool and return a result that fits within the token budget"""
        raw_result = await tool_fn(tool_name, tool_input)
        # Serialize any non-string result (dict, list, number) so it can be measured
        if not isinstance(raw_result, str):
            raw_result = json.dumps(raw_result, indent=2, default=str)
        estimated_tokens = self._estimate_tokens(raw_result)
        if estimated_tokens <= self.result_token_budget:
            return raw_result  # Fits: return as-is
        print(
            f"Tool '{tool_name}' returned ~{estimated_tokens} tokens; "
            f"budget is {self.result_token_budget}. Reducing before injection."
        )
        # With a query, extract verbatim sections; without one, fall back to a summary
        if query_context:
            return extract_relevant_section(
                raw_result, query_context,
                max_tokens=self.result_token_budget,
                model=self.extraction_model
            )
        return summarize_tool_result(
            raw_result, tool_name,
            max_tokens=self.result_token_budget,
            model=self.extraction_model
        )
Option 2: Paginated tool results — process in chunks
import math
import anthropic

client = anthropic.Anthropic()


class PaginatedToolResult:
    """
    Break a large tool result into pages.
    Agent processes one page at a time, accumulating findings.
    """

    def __init__(self, content: str, page_size_chars: int = 8000):
        self.content = content
        self.page_size = page_size_chars
        self.total_pages = math.ceil(len(content) / page_size_chars)

    def get_page(self, page: int) -> str:
        start = page * self.page_size
        end = min(start + self.page_size, len(self.content))
        return (
            f"[Page {page + 1} of {self.total_pages}]\n"
            f"{self.content[start:end]}\n"
            f"[{'End of document' if end >= len(self.content) else 'More pages follow'}]"
        )

    def __iter__(self):
        for i in range(self.total_pages):
            yield self.get_page(i)


async def process_large_result_in_pages(
    large_content: str,
    task: str,
    model: str = "claude-haiku-4-5-20251001"  # Cheap model for page processing
) -> str:
    """
    Process a large tool result by feeding it to the model page by page.
    Accumulates findings across pages into a final answer.
    """
    pages = PaginatedToolResult(large_content, page_size_chars=8000)
    if pages.total_pages <= 1:
        return large_content  # Empty or single page: no pagination needed
    print(f"Processing {pages.total_pages} pages of tool result")
    accumulated_findings = []
    for page_num, page_content in enumerate(pages):
        response = client.messages.create(
            model=model,
            max_tokens=800,
            messages=[{
                "role": "user",
                "content": (
                    f"Task: {task}\n\n"
                    f"Extract any information relevant to the task from this page.\n"
                    f"If nothing relevant, respond with 'No relevant content on this page.'\n\n"
                    f"{page_content}"
                )
            }]
        )
        finding = response.content[0].text
        if "no relevant content" not in finding.lower():
            accumulated_findings.append(f"[From page {page_num + 1}]\n{finding}")
            print(f"Page {page_num + 1}/{pages.total_pages}: found relevant content")
        else:
            print(f"Page {page_num + 1}/{pages.total_pages}: no relevant content")
    if not accumulated_findings:
        return f"[Searched {pages.total_pages} pages; no relevant content found for: {task}]"
    # Findings are concatenated in page order, not re-summarized
    return f"[Collected from {pages.total_pages} pages]\n\n" + "\n\n".join(accumulated_findings)
Option 3: Structured result limiting — trim at semantic boundaries
import json
from typing import Any


def trim_tool_result(
    result: Any,
    max_items: int = 50,
    max_chars: int = 8000
) -> tuple[Any, dict]:
    """
    Trim large tool results at semantic boundaries (not character mid-point).
    Returns (trimmed_result, metadata_about_trim).
    """
    metadata = {"trimmed": False, "original_size": 0, "returned_size": 0}
    if isinstance(result, list):
        metadata["original_size"] = len(result)
        if len(result) > max_items:
            trimmed = result[:max_items]
            metadata["trimmed"] = True
            metadata["returned_size"] = max_items
            metadata["note"] = f"Showing {max_items} of {len(result)} items. Request with offset={max_items} for more."
            return trimmed, metadata
        metadata["returned_size"] = len(result)
        return result, metadata
    elif isinstance(result, dict):
        serialized = json.dumps(result, indent=2)
        metadata["original_size"] = len(serialized)
        if len(serialized) > max_chars:
            # Trim nested lists within the dict; the result may still exceed max_chars
            trimmed = _trim_dict_lists(result, max_items=max_items)
            trimmed_serialized = json.dumps(trimmed, indent=2)
            metadata["trimmed"] = True
            metadata["returned_size"] = len(trimmed_serialized)
            metadata["note"] = f"Large response trimmed from {len(serialized)} to {len(trimmed_serialized)} chars"
            return trimmed, metadata
        metadata["returned_size"] = len(serialized)
        return result, metadata
    elif isinstance(result, str):
        metadata["original_size"] = len(result)
        if len(result) > max_chars:
            # Trim at a newline boundary where one is close enough
            trimmed = _trim_at_boundary(result, max_chars)
            metadata["trimmed"] = True
            metadata["returned_size"] = len(trimmed)
            metadata["note"] = f"Result trimmed at {len(trimmed)} chars of {len(result)} total"
            return trimmed, metadata
        metadata["returned_size"] = len(result)
        return result, metadata
    # Other types (numbers, None, ...) pass through untouched
    return result, metadata


def _trim_dict_lists(obj: Any, max_items: int) -> Any:
    """Recursively trim lists in dicts to max_items"""
    if isinstance(obj, list):
        return obj[:max_items]
    elif isinstance(obj, dict):
        return {k: _trim_dict_lists(v, max_items) for k, v in obj.items()}
    return obj


def _trim_at_boundary(text: str, max_chars: int) -> str:
    """Trim text at a newline boundary near max_chars, else at max_chars exactly"""
    if len(text) <= max_chars:
        return text
    truncated = text[:max_chars]
    # Find the last newline in the final 500 chars (clamped so the start is never negative)
    last_newline = truncated.rfind("\n", max(0, max_chars - 500))
    if last_newline > max_chars * 0.8:
        truncated = truncated[:last_newline]
    # Append trim indicator so the cut is visible to the model
    return truncated + f"\n\n[... {len(text) - len(truncated)} chars omitted ...]"


def format_tool_result_with_trim(result: Any, tool_name: str) -> str:
    """Format a tool result with automatic trimming and metadata"""
    trimmed, meta = trim_tool_result(result, max_items=100, max_chars=10000)
    result_str = json.dumps(trimmed, indent=2) if isinstance(trimmed, (dict, list)) else str(trimmed)
    if meta["trimmed"]:
        result_str += f"\n\n[Tool '{tool_name}' note: {meta['note']}]"
    return result_str
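The trim note above tells the agent to "Request with offset=... for more", which only works if the tool itself accepts an offset. A minimal sketch of such a tool follows; `paged_query` and its field names (`next_offset`, `returned`, `total`) are hypothetical and would need to match the real tool schema.

```python
from typing import Any


def paged_query(rows: list[Any], offset: int = 0, limit: int = 50) -> dict:
    """Return one page of rows plus the metadata the agent needs to continue."""
    page = rows[offset:offset + limit]
    next_offset = offset + len(page)
    has_more = next_offset < len(rows)
    return {
        "items": page,
        "offset": offset,
        "returned": len(page),
        "total": len(rows),
        # None signals that the agent has seen everything
        "next_offset": next_offset if has_more else None,
    }


if __name__ == "__main__":
    rows = list(range(120))
    first = paged_query(rows)
    print(first["returned"], first["total"], first["next_offset"])
```

Returning `total` and `next_offset` with every page lets the model decide whether to keep paging instead of assuming the first page is the whole result.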
Option 4: RAG over tool results — embed and retrieve relevant chunks
import hashlib
import json
import anthropic

client = anthropic.Anthropic()


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[dict]:
    """Split text into overlapping chunks of chunk_size words"""
    words = text.split()
    chunks = []
    step = max(1, chunk_size - overlap)  # Guard against overlap >= chunk_size
    for i in range(0, len(words), step):
        chunk_words = words[i:i + chunk_size]
        chunks.append({
            "id": hashlib.sha256(" ".join(chunk_words).encode()).hexdigest()[:12],
            "text": " ".join(chunk_words),
            "start_word": i,
            "end_word": i + len(chunk_words)
        })
    return chunks


def retrieve_relevant_chunks(
    query: str,
    chunks: list[dict],
    top_k: int = 5,
    model: str = "claude-haiku-4-5-20251001"
) -> list[dict]:
    """
    Use Claude to select the most relevant chunks from a large tool result.
    Lightweight alternative to vector search: no embedding infrastructure needed.
    """
    if len(chunks) <= top_k:
        return chunks
    # Only the first 50 chunks are offered as candidates; later chunks are never selected
    candidates = chunks[:50]
    # Build a short preview of each candidate chunk
    chunk_summaries = "\n".join(
        f"{i}. [{c['id']}] {c['text'][:150]}..."
        for i, c in enumerate(candidates)
    )
    response = client.messages.create(
        model=model,
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": (
                f"Query: {query}\n\n"
                f"Select the {top_k} most relevant chunks by number. "
                f"Return as JSON array of numbers only, e.g. [0, 3, 7, 12, 18]\n\n"
                f"Chunks:\n{chunk_summaries}"
            )
        }]
    )
    try:
        indices = json.loads(response.content[0].text)
        # Keep only valid, in-range integer indices
        return [
            candidates[i] for i in indices
            if isinstance(i, int) and 0 <= i < len(candidates)
        ]
    except Exception:
        return chunks[:top_k]  # Fallback to first N if the reply is not valid JSON


class RAGToolResultProcessor:
    """
    Process large tool results using RAG:
    1. Chunk the result
    2. Retrieve chunks relevant to the current query
    3. Inject only the relevant chunks into context
    """

    def __init__(self, chunk_size: int = 500, top_k: int = 5):
        self.chunk_size = chunk_size
        self.top_k = top_k

    def process(self, tool_result: str, query: str) -> str:
        """Return only the relevant portion of a large tool result"""
        # Rough size check: if small enough, return as-is
        if len(tool_result) < self.chunk_size * 4 * 2:
            return tool_result
        chunks = chunk_text(tool_result, chunk_size=self.chunk_size)
        relevant = retrieve_relevant_chunks(query, chunks, top_k=self.top_k)
        if not relevant:
            return tool_result[:4000]  # Fallback
        result_parts = [f"[Retrieved {len(relevant)} of {len(chunks)} chunks relevant to: {query}]\n"]
        for chunk in relevant:
            result_parts.append(chunk["text"])
        return "\n\n---\n\n".join(result_parts)


rag_processor = RAGToolResultProcessor(chunk_size=400, top_k=5)
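The chunk selector above depends on the model replying with a bare JSON array. In practice a reply may wrap the array in prose or a code fence, in which case `json.loads` fails and the code silently falls back to the first chunks. A more tolerant parser could look like the sketch below; `parse_chunk_indices` is a hypothetical helper, not part of the Anthropic SDK.

```python
import json
import re


def parse_chunk_indices(reply: str, num_chunks: int, top_k: int) -> list[int]:
    """Pull a JSON array of chunk numbers out of a model reply, tolerating extra prose."""
    match = re.search(r"\[[^\[\]]*\]", reply)  # first flat [...] in the reply
    if not match:
        return []
    try:
        raw = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    picked: list[int] = []
    for value in raw:
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            continue
        if 0 <= value < num_chunks and value not in picked:
            picked.append(value)
    return picked[:top_k]


if __name__ == "__main__":
    print(parse_chunk_indices("Most relevant: [0, 3, 7]", num_chunks=10, top_k=5))
```

An empty return value means the reply was unusable, so the caller can log it and choose a fallback explicitly.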
Option 5: Tool result budget enforcer — stop before context overflows
import anthropic
from dataclasses import dataclass, field

client = anthropic.Anthropic()
CHARS_PER_TOKEN = 4


@dataclass
class ContextBudget:
    """Track token usage across tool calls in an agent loop"""
    total_budget: int = 180_000  # Leave room for model response
    system_tokens: int = 0
    message_tokens: int = 0
    tool_result_tokens: int = 0
    _tool_result_counts: dict = field(default_factory=dict)

    @property
    def remaining(self) -> int:
        return self.total_budget - self.system_tokens - self.message_tokens - self.tool_result_tokens

    def estimate_and_consume(self, content: str, tool_name: str = "") -> tuple[bool, int]:
        """
        Check if content fits in remaining budget.
        If yes, consume the budget. If no, return False.
        """
        estimated = len(content) // CHARS_PER_TOKEN
        if estimated > self.remaining:
            return False, estimated
        self.tool_result_tokens += estimated
        self._tool_result_counts[tool_name] = self._tool_result_counts.get(tool_name, 0) + estimated
        return True, estimated

    def report(self) -> dict:
        return {
            "total_budget": self.total_budget,
            "used": self.total_budget - self.remaining,
            "remaining": self.remaining,
            "tool_results": self._tool_result_counts
        }


def run_agent_with_budget(
    messages: list[dict],
    tools: list[dict],
    system: str,
    model: str = "claude-sonnet-4-6"
) -> str:
    """Agent loop that enforces a token budget on tool results"""
    budget = ContextBudget(total_budget=160_000)
    budget.system_tokens = len(system) // CHARS_PER_TOKEN
    budget.message_tokens = sum(
        len(str(m.get("content", ""))) // CHARS_PER_TOKEN
        for m in messages
    )
    while True:
        response = client.messages.create(
            model=model,
            max_tokens=4096,
            system=system,
            tools=tools,
            messages=messages
        )
        if response.stop_reason != "tool_use":
            # Join all text blocks; the first block is not guaranteed to be text
            return "".join(b.text for b in response.content if b.type == "text")
        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            raw_result = execute_tool(block.name, block.input)
            result_str = str(raw_result) if not isinstance(raw_result, str) else raw_result
            fits, estimated = budget.estimate_and_consume(result_str, block.name)
            if not fits:
                # Result too large: summarize it (summarize_tool_result is defined in Option 1)
                print(
                    f"Tool '{block.name}' result (~{estimated} tokens) exceeds budget "
                    f"({budget.remaining} remaining); summarizing"
                )
                result_str = summarize_tool_result(
                    result_str,
                    context=f"query: {block.input}",
                    # Floor of 256 keeps max_tokens valid when the budget is nearly exhausted
                    max_tokens=max(256, min(budget.remaining // 2, 1500))
                )
                budget.estimate_and_consume(result_str, f"{block.name}_summarized")
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result_str
            })
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
        print(f"Budget: {budget.report()}")


def execute_tool(tool_name: str, tool_input: dict) -> str:
    # Placeholder: replace with real tool dispatch
    return f"Tool {tool_name} result"
Option 6: Streaming tool result processing — handle results too large to hold in memory
import asyncio
import anthropic

# Async client so the per-window calls issued by asyncio.gather run concurrently
async_client = anthropic.AsyncAnthropic()


async def stream_process_large_result(
    content: str,
    extraction_query: str,
    window_size: int = 6000,
    stride: int = 5000
) -> str:
    """
    Process a result that's too large for a single context window.
    Slides a window across the content, extracting relevant data at each position.
    Merges findings at the end.
    """
    if len(content) <= window_size:
        return content
    windows = []
    for start in range(0, len(content), stride):
        end = min(start + window_size, len(content))
        windows.append(content[start:end])
        if end == len(content):
            break  # Later windows would only repeat text already covered
    print(f"Processing {len(windows)} windows over {len(content)} char result")
    tasks = [
        _extract_from_window(window, extraction_query, i, len(windows))
        for i, window in enumerate(windows)
    ]
    results = await asyncio.gather(*tasks)
    findings = [r for r in results if r and "no relevant" not in r.lower()]
    if not findings:
        return f"[No relevant content found in {len(content)} char result for: {extraction_query}]"
    # Merge findings in window order; overlapping windows may yield duplicates
    merged = "\n\n".join(findings)
    return f"[Extracted from {len(windows)} windows of {len(content)} char result]\n\n{merged}"


async def _extract_from_window(window: str, query: str, window_num: int, total: int) -> str:
    """Extract relevant content from a single window"""
    response = await async_client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=600,
        messages=[{
            "role": "user",
            "content": (
                f"Extract content relevant to: {query}\n"
                f"From window {window_num + 1} of {total}:\n\n{window}\n\n"
                f"If nothing relevant, respond only: 'no relevant content'"
            )
        }]
    )
    return response.content[0].text
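Because adjacent windows overlap (by `window_size - stride` characters, 1,000 with the defaults above), the same fact can be extracted twice. One simple way to deduplicate is line-level matching after normalizing whitespace and case, sketched below. `merge_findings` is a hypothetical helper, and exact-line matching is an assumption that will miss paraphrased duplicates.

```python
def merge_findings(findings: list[str]) -> str:
    """Join per-window findings, dropping lines already emitted by an earlier window."""
    seen: set[str] = set()
    merged: list[str] = []
    for finding in findings:
        for line in finding.splitlines():
            key = " ".join(line.split()).lower()  # normalize whitespace and case
            if not key or key in seen:
                continue  # skip blank lines and repeats
            seen.add(key)
            merged.append(line.strip())
    return "\n".join(merged)


if __name__ == "__main__":
    print(merge_findings([
        "Error 42 at 10:00\nDisk full",
        "disk  full\nRetry failed",
    ]))
```

The first occurrence of each line is kept, so findings stay in document order.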
Tool Result Size Management Strategies
| Strategy | Best For | Latency Added | Tokens Saved |
|---|---|---|---|
| Extract-before-inject | Structured queries with known target | Low (1 LLM call) | 80-95% |
| Pagination | Sequential processing needed | Medium (N pages) | Per-page budget |
| Semantic trim at boundary | Lists, JSON arrays | None | 50-90% |
| RAG over chunks | Large unstructured text | Low (1 LLM call) | 85-95% |
| Budget enforcer | Multi-tool agent loops | None | Prevents overflow |
| Sliding window | Giant files, logs | High (N windows) | Full coverage |
Expected Token Savings
- 50,000-token tool result injected raw → context overflow after 1 tool call: entire session unusable
- Extract 2,000 relevant tokens → agent completes 10+ tool calls in same budget: 96% savings
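The 96% figure follows directly from the two token counts: 1 - 2,000 / 50,000 = 0.96. A small helper makes the arithmetic reusable for other result sizes; the function name `token_savings` is illustrative.

```python
def token_savings(raw_tokens: int, injected_tokens: int) -> float:
    """Fraction of tokens saved by injecting a reduced result instead of the raw one."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return 1 - injected_tokens / raw_tokens


if __name__ == "__main__":
    savings = token_savings(50_000, 2_000)
    print(f"{savings:.0%}")  # prints 96%
```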
Environment
- Any agent that makes tool calls against external APIs, file reads, database queries, or search tools. Especially critical for agents that process search results, read large files, or query paginated APIs.
- Source: direct experience. In the author's experience, unmanaged tool result size is the most common reason multi-tool agents overflow their context and fail mid-task in production.