Agent Produces Verbose Step-by-Step Explanations Nobody Asked For — Token Waste
Symptom
- Every response starts with “Certainly! Let me help you with…”
- Agent restates the user’s question before answering it
- A simple function comes with a 200-word explanation of what it does
- Agent adds “In summary:” sections at the end recapping what it just said
- “Let me think through this step by step…” before every answer
- Response is 90% explanation, 10% actual output
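The last symptom can be turned into a number to track in logs. The following is a minimal sketch, assuming responses use fenced code blocks and that "actual output" means whatever is inside those fences; `explanation_ratio` is a hypothetical helper, not part of any library.

```python
import re

FENCE = "`" * 3  # built at runtime so this example can sit inside a fenced block

# Matches a fenced code block, including its opening and closing fences.
CODE_BLOCK = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)


def explanation_ratio(response: str) -> float:
    """Return the fraction of non-whitespace characters outside code blocks."""
    total = len(re.sub(r"\s", "", response))
    if total == 0:
        return 0.0
    prose = CODE_BLOCK.sub("", response)
    return len(re.sub(r"\s", "", prose)) / total


sample = (
    "Certainly! Let me help you with that.\n\n"
    + FENCE + "python\nprint('hi')\n" + FENCE + "\n\n"
    + "In summary: this prints a greeting."
)
ratio = explanation_ratio(sample)
print(f"explanation share: {ratio:.0%}")
```

A ratio close to 1.0 on code requests suggests the agent is explaining more than it is producing. The threshold worth alerting on depends on the task mix.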
Root Cause
LLMs are trained on examples where thorough explanation is rewarded as helpful. Without an explicit instruction to be concise, the model defaults to thoroughness, both for safety (showing its reasoning) and for helpfulness (making sure you understand). In a production agent, that extra output costs tokens without adding value.
Fix
Option 1: Direct brevity instruction
System prompt:
"Communication style:
- Be concise. Skip preambles, summaries, and unnecessary explanation.
- Never start with 'Certainly!', 'Of course!', 'Sure!', or 'Great question!'
- Never restate the user's question
- Never add 'In summary:' or 'To recap:' sections
- For code requests: write the code, add ONE sentence explanation if needed
- If the answer is a code block, start with the code block
The user can read the code — don't explain what it does unless asked."
Option 2: Output format constraints
System prompt:
"Output format rules:
- Code requests: code first, brief comment only if non-obvious logic
- Questions: direct answer first, supporting details only if necessary
- Errors/debug: exact fix + one-line reason
- Maximum explanation: 3 sentences before a code block
Count your sentences before responding. If you have more than 3 sentences
of explanation without any code/output, you are being too verbose."
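The three-sentence rule can also be checked outside the model, for example in a test suite or a logging hook. This is a rough sketch, assuming fenced code blocks and a naive sentence splitter; `sentences_before_code` and `is_too_verbose` are hypothetical names.

```python
import re

FENCE = "`" * 3  # built at runtime so this example can sit inside a fenced block
MAX_SENTENCES = 3  # limit taken from the prompt above


def sentences_before_code(response: str) -> int:
    """Count sentences in the text that precedes the first fenced code block."""
    preamble = response.split(FENCE, 1)[0].strip()
    # Naive split: ., ! or ? followed by whitespace or the end of the text.
    parts = re.split(r"[.!?]+(?:\s+|$)", preamble)
    return sum(1 for part in parts if part.strip())


def is_too_verbose(response: str) -> bool:
    """True when the preamble exceeds the sentence limit."""
    return sentences_before_code(response) > MAX_SENTENCES


reply = "Use a set. It is faster.\n\n" + FENCE + "python\nseen = set()\n" + FENCE
print(sentences_before_code(reply), is_too_verbose(reply))
```

The splitter miscounts abbreviations such as "e.g." and version numbers, so treat the count as a signal for review, not an exact measure.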
Option 3: Measure and cap output tokens
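```python
from anthropic import Anthropic

client = Anthropic()
user_message = "Write a function that reverses a string."  # example input

# max_tokens is a hard cap: it truncates output rather than making the model
# concise, so pair it with a brevity instruction and treat it as a backstop.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=500,  # upper bound for simple tasks
    system="Be extremely concise. Code first, minimal explanation.",
    messages=[{"role": "user", "content": user_message}],
)

# Measure actual usage and detect responses that were cut off at the cap
print(response.usage.output_tokens)
if response.stop_reason == "max_tokens":
    print("Response hit the cap and was truncated")
```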
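To cover the "measure" half of this option, it can help to track how many output tokens responses use and how often they hit the cap. The sketch below makes no API calls; the task names and budgets in `TOKEN_BUDGETS` are hypothetical values for illustration, and `OutputStats` is an illustrative helper.

```python
from dataclasses import dataclass

# Hypothetical per-task caps; tune these against your own production logs.
TOKEN_BUDGETS = {"quick_answer": 300, "code": 800, "analysis": 2000}
DEFAULT_BUDGET = 500  # matches the cap used in the example above


def budget_for(task_type: str) -> int:
    """Return the max_tokens value to request for a task type."""
    return TOKEN_BUDGETS.get(task_type, DEFAULT_BUDGET)


@dataclass
class OutputStats:
    """Running totals of output tokens and truncated responses."""
    responses: int = 0
    total_output_tokens: int = 0
    truncated: int = 0

    def record(self, output_tokens: int, stop_reason: str) -> None:
        self.responses += 1
        self.total_output_tokens += output_tokens
        # The Messages API reports "max_tokens" when the cap cut the reply off.
        if stop_reason == "max_tokens":
            self.truncated += 1

    def average(self) -> float:
        if self.responses == 0:
            return 0.0
        return self.total_output_tokens / self.responses


stats = OutputStats()
stats.record(120, "end_turn")
stats.record(500, "max_tokens")
print(stats.average(), stats.truncated)
```

In a live agent, the call would look like `stats.record(response.usage.output_tokens, response.stop_reason)`. A rising truncation count suggests the cap is too low for that task type, or the brevity instruction is not being followed.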
Option 4: Strip verbose patterns programmatically
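```python
import re

# (pattern, flags) pairs. DOTALL is applied only where a match must span lines;
# applying it to every pattern lets single-line patterns swallow later content.
VERBOSE_PATTERNS = [
    (r'^(Certainly|Of course|Sure|Absolutely|Great question)\b[!,]?\s*', re.MULTILINE),
    (r'^Let me (help you|think|walk you through|explain)\b[^.\n]*\.\s*', re.MULTILINE),
    (r'In summary[,:].*?(?=\n\n|\Z)', re.DOTALL),
    (r'To recap[,:].*?(?=\n\n|\Z)', re.DOTALL),
    (r'I hope this helps.*?$', re.MULTILINE),
    (r'Let me know if you (have|need).*?$', re.MULTILINE),
]


def strip_verbose(text):
    """Remove common filler phrases from a model response."""
    for pattern, flags in VERBOSE_PATTERNS:
        text = re.sub(pattern, '', text, flags=flags | re.IGNORECASE)
    return text.strip()
```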
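One risk with regex stripping is that it also rewrites text inside code blocks, for example a comment that begins with "Sure". A possible guard is to run the cleaner on prose segments only. This is a sketch, assuming fenced code blocks; `clean_outside_code` is a hypothetical wrapper that accepts any cleaning function, such as `strip_verbose` above.

```python
import re
from typing import Callable

FENCE = "`" * 3  # built at runtime so this example can sit inside a fenced block

# The capturing group makes split() keep the code blocks in its result.
CODE_BLOCK = re.compile(
    "(" + re.escape(FENCE) + r".*?" + re.escape(FENCE) + ")", re.DOTALL
)


def clean_outside_code(text: str, cleaner: Callable[[str], str]) -> str:
    """Apply `cleaner` to prose segments only, leaving fenced code untouched."""
    pieces = []
    for index, part in enumerate(CODE_BLOCK.split(text)):
        # With one capturing group, odd indexes hold the matched code blocks.
        piece = part if index % 2 else cleaner(part).strip()
        if piece:
            pieces.append(piece)
    return "\n\n".join(pieces)


def drop_sure(segment: str) -> str:
    """Toy cleaner for the demo: removes a leading 'Sure'."""
    return re.sub(r"^Sure[,!]?\s*", "", segment.strip())


raw = "Sure, here:\n" + FENCE + "\nSure = 1\n" + FENCE + "\nSure thing"
print(clean_outside_code(raw, drop_sure))
```

Segments are rejoined with blank lines, so spacing around code blocks is normalized. An unclosed fence is treated as prose, which means a truncated response may still have its code altered.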
Option 5: Different models for different verbosity needs
```yaml
# openclaw.config.yaml
providers:
  anthropic:
    model_routing:
      quick_answer: claude-haiku-4-5-20251001  # Naturally more concise
      standard_task: claude-sonnet-4-6
      complex_analysis: claude-opus-4-6  # Allow more tokens for complex work
```
Haiku tends to be less verbose than Opus. For simple Q&A and short code generation, route to Haiku.
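If routing is done in application code instead of a config file, the same mapping might look like the sketch below. The model IDs are copied from the config above; the keyword hints and the word limit in `classify` are hypothetical heuristics chosen for illustration, not a recommended classifier.

```python
# Model IDs copied from the routing config above.
MODEL_ROUTING = {
    "quick_answer": "claude-haiku-4-5-20251001",
    "standard_task": "claude-sonnet-4-6",
    "complex_analysis": "claude-opus-4-6",
}

# Hypothetical heuristics, for illustration only.
COMPLEX_HINTS = ("architecture", "trade-off", "analyze", "design review")
QUICK_WORD_LIMIT = 25


def classify(user_message: str) -> str:
    """Guess a task type from simple surface features of the message."""
    text = user_message.lower()
    if any(hint in text for hint in COMPLEX_HINTS):
        return "complex_analysis"
    if len(text.split()) <= QUICK_WORD_LIMIT:
        return "quick_answer"
    return "standard_task"


def pick_model(user_message: str) -> str:
    """Return the model ID for the guessed task type."""
    return MODEL_ROUTING[classify(user_message)]


print(pick_model("What does zip() return?"))
```

Keyword matching is crude. In practice the task type often comes from the calling code, which already knows whether it is asking for a quick lookup or a long analysis.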
Token Cost Comparison
| Response style | Tokens | Contains |
|---|---|---|
| Verbose | 800 | 200 useful + 600 explanation |
| Concise | 250 | 200 useful + 50 context |
| Ultra-concise | 210 | 200 useful + 10 |
For 100 agent interactions/day: verbose = 80,000 tokens, concise = 25,000 tokens — 69% reduction.
Expected Token Savings
- Per 100 interactions with brevity instructions: ~55,000 output tokens saved
- System prompt overhead: ~100 input tokens per request
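The arithmetic behind these figures can be reproduced directly from the table. The sketch below uses only the numbers given above and assumes the ~100-token system prompt is resent as input on every request.

```python
INTERACTIONS_PER_DAY = 100
TOKENS_PER_RESPONSE = {"verbose": 800, "concise": 250, "ultra_concise": 210}
SYSTEM_PROMPT_TOKENS = 100  # assumption: sent as input with every request

# Daily output tokens for each response style.
daily = {style: n * INTERACTIONS_PER_DAY for style, n in TOKENS_PER_RESPONSE.items()}

saved = daily["verbose"] - daily["concise"]
reduction_pct = 100 * saved / daily["verbose"]
prompt_overhead = SYSTEM_PROMPT_TOKENS * INTERACTIONS_PER_DAY
net_saved = saved - prompt_overhead

print(f"output tokens saved: {saved}")              # 55000
print(f"reduction: {reduction_pct:.2f}%")           # 68.75%, i.e. about 69%
print(f"net of prompt overhead: {net_saved}")       # 45000
```

The net figure is a raw token count. Input and output tokens are typically priced differently, so the cost saving is not simply proportional to it.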
Environment
- Any user-facing agent or chat interface
- Source: direct experience, measurable in production logs