AI Agent Hallucination Prevention Guide
Hallucinations in production AI agents aren’t just wrong answers — they’re wrong answers that get acted on. An agent that fabricates an API endpoint, invents a library function, or confidently misremembers a previous decision can corrupt data, break systems, or produce outputs that cost more to fix than to redo from scratch.
Types of Hallucination in Agent Systems
| Type | Example | Risk Level |
|---|---|---|
| Factual fabrication | Invents a function that doesn’t exist | High |
| Decision confabulation | “You asked me to do X” when X was never asked | High |
| Specification drift | Subtly changes task requirements mid-execution | Medium |
| Confident uncertainty | States unknown things with certainty | Medium |
| Source invention | Cites non-existent documentation or issues | Low |
Fix 1: Temperature Calibration by Task Type
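Temperature is one of the simplest levers for hallucination control: lower values reduce sampling randomness, although even a temperature of 0 does not guarantee factually correct output.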
# openclaw.config.yaml
providers:
  anthropic:
    temperature_by_task:
      factual_lookup: 0.0    # Deterministic for facts
      code_generation: 0.1   # Low for code
      analysis: 0.3          # Some creativity for analysis
      creative_writing: 0.8  # High for creative tasks
For agent tasks that involve facts, code, or precise instructions: temperature ≤ 0.2.
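The same mapping can be applied at call time. The following is a minimal sketch, not part of any real client library: `temperature_for` is a hypothetical helper, the task-type names are copied from the config above, and the fallback to 0.0 for unknown task types is an assumption chosen to err on the conservative side.

```python
# Hypothetical helper: task-type names mirror the config above.
# Falling back to 0.0 for unknown task types is an assumption.
TEMPERATURE_BY_TASK = {
    "factual_lookup": 0.0,
    "code_generation": 0.1,
    "analysis": 0.3,
    "creative_writing": 0.8,
}


def temperature_for(task_type: str) -> float:
    """Return the sampling temperature for a task type.

    Unknown task types get the most conservative setting (0.0).
    """
    return TEMPERATURE_BY_TASK.get(task_type, 0.0)
```

Pass the returned value as the temperature parameter of whatever model call your agent makes for that task.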
Fix 2: Verification Pipeline
The most reliable way to prevent hallucination damage is to verify outputs before acting on them:
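async def verified_agent_action(task, max_retries=2):
    # `agent`, `sandbox`, and the helper functions used here are assumed
    # to be provided by your agent framework.

    # Step 1: Generate
    result = await agent.execute(task)

    # Step 2: Verify code (if output contains code), re-checking each retry
    for attempt in range(max_retries + 1):
        if not contains_code(result):
            break
        test_result = await sandbox.run(result.code)
        if not test_result.error:
            break
        if attempt == max_retries:
            raise RuntimeError(
                f"Code still failing after {max_retries} retries: {test_result.error}"
            )
        # Re-generate with error context
        result = await agent.execute(
            task,
            context=f"Previous attempt failed: {test_result.error}",
        )

    # Step 3: Verify facts (if output makes factual claims)
    if contains_factual_claims(result):
        for claim in extract_claims(result):
            if not verify_against_sources(claim):
                result.flag_uncertain(claim)

    return result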
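The pipeline above leaves `verify_against_sources` undefined. One deliberately naive way to sketch it is shown below. This is an illustration under stated assumptions, not the guide's implementation: it takes the source texts as an explicit second argument (unlike the one-argument call above), and it only checks that every backtick-quoted term in the claim appears verbatim in at least one source.

```python
import re


def verify_against_sources(claim: str, sources: list[str]) -> bool:
    """Naive check: every `backticked` term in the claim must appear
    verbatim in at least one source text.

    Claims with no backticked terms are treated as unverified, because
    this sketch has nothing concrete to check them against.
    """
    terms = re.findall(r"`([^`]+)`", claim)
    if not terms:
        return False
    return all(any(term in source for source in sources) for term in terms)
```

A production check would more likely query a symbol index, a documentation search, or a second model call. The useful property to keep is the default: anything that cannot be checked counts as unverified.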
Fix 3: Grounding — Force Citation
Agents hallucinate less when required to cite sources for every claim:
System prompt addition:
"For every factual claim you make, cite the specific file, line number,
or documentation source. If you cannot cite a source, say
'I believe X but cannot verify — please confirm before acting.'"
This forces the agent to distinguish between what it knows from context vs. what it’s generating.
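Compliance with the citation rule can be spot-checked mechanically. The sketch below is a coarse heuristic under stated assumptions: the two citation formats (`path/file.ext:line` and `[docs: Title]`) are illustrative choices rather than a standard, and every sentence is treated as a claim.

```python
import re

# Assumed citation formats (illustrative only):
#   path/to/file.py:42   or   [docs: Some Title]
CITATION = re.compile(r"\b[\w./-]+\.\w+:\d+\b|\[docs:[^\]]+\]")
# The hedge wording required by the system prompt above.
HEDGE = re.compile(r"\bI believe\b.*\bcannot verify\b", re.IGNORECASE)


def uncited_sentences(response: str) -> list[str]:
    """Return sentences that carry neither a citation nor the required hedge."""
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()
    ]
    return [
        s for s in sentences
        if not CITATION.search(s) and not HEDGE.search(s)
    ]
```

Any sentence this returns is a candidate for re-prompting or manual review. A non-empty result does not prove a hallucination, only that the agent skipped the citation step.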
Fix 4: Explicit Uncertainty Expression
Instruct the agent to express uncertainty rather than guess confidently:
System prompt:
"Use these phrases when uncertain:
- 'I believe [X] — please verify before acting'
- 'I'm not certain about [X] — I recommend checking [source]'
- 'I cannot confirm [X] with the context available'
Never state something as fact unless it appears in the provided context,
codebase, or tool output."
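Once the agent uses fixed uncertainty phrases, downstream code can pick them out and route those statements to a person. A minimal sketch follows, assuming line-oriented responses; the function name and the marker list are hypothetical and derived from the three phrases in the prompt above.

```python
# Markers derived from the three phrases in the prompt above (lowercased).
UNCERTAINTY_MARKERS = (
    "please verify before acting",
    "i'm not certain about",
    "i cannot confirm",
)


def statements_needing_confirmation(response: str) -> list[str]:
    """Return the lines of a response that contain an uncertainty marker."""
    flagged = []
    for line in response.splitlines():
        # Normalize case and curly apostrophes before matching.
        normalized = line.lower().replace("\u2019", "'")
        if any(marker in normalized for marker in UNCERTAINTY_MARKERS):
            flagged.append(line.strip())
    return flagged
```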
Fix 5: Decision Provenance Logging
Hallucinations in multi-step tasks often compound — an early fabrication becomes a “fact” in later steps. Prevent this with provenance:
agent:
  decision_logging:
    enabled: true
    log_path: ./decisions.md
    # {source} is one of: file:line, tool output, user instruction, or "model inference"
    format: |
      ## Decision: {description}
      **Source:** {source}
      **Confidence:** {high|medium|low}
      **Verified:** {yes|no|pending}
When source is “model inference” and confidence is low, flag for human review before acting.
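The review rule above is easy to encode next to the log entry itself. The following is a minimal sketch: the `Decision` class and its method names are hypothetical, and only the four fields and the review rule come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    description: str
    source: str                # file:line, tool output, user instruction, or "model inference"
    confidence: str            # "high", "medium", or "low"
    verified: str = "pending"  # "yes", "no", or "pending"

    def needs_human_review(self) -> bool:
        """Low-confidence model inference must be reviewed before acting."""
        return self.source == "model inference" and self.confidence == "low"

    def to_markdown(self) -> str:
        """Render the entry in the log format shown above."""
        return (
            f"## Decision: {self.description}\n"
            f"**Source:** {self.source}\n"
            f"**Confidence:** {self.confidence}\n"
            f"**Verified:** {self.verified}\n"
        )
```

Appending `to_markdown()` output to `decisions.md` and checking `needs_human_review()` before each action keeps an early fabrication from silently becoming an input to later steps.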
Fix 6: Hallucination Detection Patterns
Common patterns that suggest a possible hallucination and warrant verification:
import re

HALLUCINATION_SIGNALS = [
    # Function/method doesn't exist in codebase
    r'\.(\w+)\(',  # Method call (with or without arguments) → verify it exists
    # Specific numbers stated without source
    r'\b\d{4,}\b',  # Large specific numbers → ask for source
    # "I remember" claims
    r'(you said|you asked|earlier you|previously you)',
    # Invented citations
    r'(according to|as stated in|per the docs)',
]


def flag_for_review(response):
    for pattern in HALLUCINATION_SIGNALS:
        if re.search(pattern, response, re.IGNORECASE):
            return True  # Needs verification
    return False
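Regex signals only say that something should be checked. For generated Python, the "verify it exists" step can be done more precisely by parsing the code. The sketch below uses the standard `ast` module; the `unknown_calls` name and the idea of passing a pre-built set of known names are assumptions for illustration.

```python
import ast


def unknown_calls(code: str, known_names: set[str]) -> set[str]:
    """Return names of called functions or methods not found in known_names.

    Raises SyntaxError if the generated code does not parse.
    """
    called = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):
                called.add(func.attr)  # obj.method(...)
            elif isinstance(func, ast.Name):
                called.add(func.id)    # function(...)
    return called - known_names
```

In practice `known_names` would be built from the project's own symbols plus builtins. Matching on bare names is intentionally loose: it catches invented names but cannot tell whether a real method exists on the specific object being called.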
Fix 7: Sandboxed Code Verification
For agents that generate and execute code, always run the code in a sandbox first:
sandbox:
  enabled: true
  pre_execution_check: true
  timeout_ms: 10000
  on_error:
    action: report_and_retry
    max_retries: 2
    provide_error_to_agent: true
Never execute agent-generated code directly in production. Test in sandbox, check output, then apply if correct.
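If no sandbox service is available, the test-then-apply flow can be approximated locally. The following is a rough sketch with an important caveat: a child process with a timeout stops runaway code, but it is not a security boundary, so real isolation still needs a container, VM, or similar. The function name is hypothetical; the 10-second default mirrors `timeout_ms: 10000` above.

```python
import os
import subprocess
import sys
import tempfile


def run_in_subprocess(code: str, timeout_s: float = 10.0) -> tuple[bool, str]:
    """Run generated Python in a separate interpreter.

    Returns (ok, output_or_error). NOT a security sandbox: the child
    process can still touch the filesystem and network.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "generated.py")
        with open(script, "w", encoding="utf-8") as f:
            f.write(code)
        try:
            proc = subprocess.run(
                [sys.executable, "-I", script],  # -I: isolated mode
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False, f"timed out after {timeout_s}s"
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout
```

On failure, the returned error text is what you would hand back to the agent for a retry, as `provide_error_to_agent: true` describes.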
Recovering from a Hallucination Incident
When you discover an agent acted on hallucinated information:
- Stop the agent — don’t let it continue building on the false foundation
- Identify the scope — what decisions were made after the hallucination?
- Check for side effects — were any external systems, files, or APIs modified?
- Restart with correction — provide the correct information explicitly in the new session prompt
- Add it to your verification checklist — what check would have caught this?
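The scoping step is easier if each logged decision records which earlier decisions it relied on. The sketch below assumes such a `depends_on` mapping exists. That field is hypothetical and not part of the log format shown earlier. Given the ID of the hallucinated decision, it collects everything built on top of it.

```python
def affected_decisions(depends_on: dict[str, list[str]], bad_id: str) -> set[str]:
    """Return every decision that directly or transitively depends on bad_id.

    depends_on maps a decision ID to the IDs of the decisions it relied on.
    """
    affected: set[str] = set()
    frontier = [bad_id]
    while frontier:
        current = frontier.pop()
        for decision, parents in depends_on.items():
            if current in parents and decision not in affected:
                affected.add(decision)
                frontier.append(decision)
    return affected
```

The returned set is the list of decisions to re-check for side effects and to redo after restarting with the corrected information.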
Common Questions
How do I tell if an agent output is hallucinated?
Check if the claim is verifiable from the context the agent had access to. If the agent says “function X exists in file Y” — verify it. If the agent says “you previously asked for Z” — check the conversation history. Anything specific and unverifiable deserves a second look.
Should I use RAG to prevent hallucinations?
Retrieval-Augmented Generation helps with factual grounding but doesn’t prevent all hallucinations. The agent can still misinterpret retrieved documents or generate incorrect code. RAG + verification pipeline is more robust than RAG alone.
How much does hallucination prevention cost in tokens?
A verification pipeline adds roughly 20–40% overhead. But a single hallucination caught early is worth more than that — fixing a corrupted database or wrong deployment costs 10–100x more tokens than the verification would have.
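The trade-off can be made concrete with a back-of-the-envelope calculation. The sketch below takes the answer's figures at face value and adds two simplifying assumptions that are not from the source: verification catches every hallucination, and cost is measured in tokens only. If a fix costs m times what verification would have cost, always-on verification is cheaper on average once more than 1/m of tasks would otherwise hallucinate.

```python
def break_even_rate(fix_cost_multiplier: float) -> float:
    """Hallucination rate above which always-on verification is cheaper.

    Let V be the verification cost per task and m * V the cost of fixing
    one uncaught hallucination. Expected cost without verification is
    p * m * V; with verification it is V. Verification wins when p > 1 / m.
    """
    if fix_cost_multiplier <= 0:
        raise ValueError("fix_cost_multiplier must be positive")
    return 1.0 / fix_cost_multiplier
```

With the 10x and 100x multipliers quoted above, the break-even hallucination rates are 10% and 1% of tasks respectively.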
Catch hallucinations before they cause damage
SynapseAI includes verification patterns and hallucination detection for common agent output types.
clawhub install synapse-ai