Agent Fabricates Intermediate Reasoning Steps — Chain of Thought Is Wrong
Symptom
- Agent’s explanation of “how I got this answer” contains made-up steps
- Math reasoning looks correct but intermediate calculations are wrong
- Logical chain is internally consistent but based on a false premise stated as fact
- Agent says “Since X is 42, therefore Y” but X is not 42 — it’s fabricated
- Code reasoning: agent explains what a function does incorrectly
- Users trust wrong answers because they came with confident step-by-step reasoning
- Asking “how did you get that?” produces a plausible-sounding but fabricated explanation
Root Cause
Chain-of-thought prompting improves reasoning on genuine logical tasks but doesn’t prevent fabrication of the facts that reasoning operates on. The model generates a plausible chain of steps that leads to a coherent conclusion — but “plausible” doesn’t mean “correct”. The model interpolates facts it doesn’t know rather than saying “I don’t have this data.” The fix is to verify individual reasoning steps using tools (calculators, database lookups, code execution) rather than trusting the model’s self-reported steps.
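As a minimal illustration of that last point, the sketch below recomputes one claimed intermediate step instead of trusting it. The claimed figure and the helper name `check_claimed_product` are hypothetical, chosen only to show how a plausible-looking number can be wrong.

```python
import math

def check_claimed_product(a: float, b: float, claimed: float, rel_tol: float = 1e-6) -> dict:
    """Recompute a * b and compare it with the value the model claimed."""
    actual = a * b
    return {
        "actual": actual,
        "claimed": claimed,
        "matches": math.isclose(actual, claimed, rel_tol=rel_tol),
    }

# Hypothetical model step: "1,200,000 x 1.23 = 1,467,000" (two digits transposed).
report = check_claimed_product(1_200_000, 1.23, 1_467_000)
print(report["matches"], report["actual"])
```

The claimed value is off by less than 1%, which is exactly the kind of error a reader skims past; the recomputation flags it regardless of how convincing the surrounding prose is.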
Fix
Option 1: Tool-grounded reasoning — force each step to use a tool
import anthropic
import json
import logging
from typing import Any
logger = logging.getLogger(__name__)
client = anthropic.Anthropic()
# Instead of letting the model reason in its head, give it tools for each step:
GROUNDED_REASONING_TOOLS = [
{
"name": "calculate",
"description": (
"Evaluate a mathematical expression. Use this for ALL numeric calculations. "
"Never calculate in your head — always use this tool for math."
),
"input_schema": {
"type": "object",
"properties": {
"expression": {"type": "string", "description": "Python math expression to evaluate, e.g. '(42 * 3.14) / 100'"},
"description": {"type": "string", "description": "What this calculation is for"}
},
"required": ["expression"]
}
},
{
"name": "lookup_fact",
"description": (
"Look up a specific fact from the knowledge base. "
"Use this when you need a number, date, name, or other fact from the data. "
"Never guess or assume a fact — always look it up."
),
"input_schema": {
"type": "object",
"properties": {
"fact_query": {"type": "string", "description": "What fact to look up"}
},
"required": ["fact_query"]
}
},
{
"name": "verify_logical_step",
"description": (
"Verify whether a logical inference is valid. "
"State the premise and the conclusion. Returns whether the step is logically valid."
),
"input_schema": {
"type": "object",
"properties": {
"premise": {"type": "string"},
"conclusion": {"type": "string"},
"reasoning": {"type": "string"}
},
"required": ["premise", "conclusion", "reasoning"]
}
}
]
def execute_calculate(expression: str) -> str:
    """Evaluate a math expression with a restricted set of names."""
    import math
    # Expose only math functions and a few numeric builtins.
    # Note: eval() with empty __builtins__ narrows what an expression can reach,
    # but it is not a real sandbox. Do not feed it untrusted input.
    allowed_names = {k: getattr(math, k) for k in dir(math) if not k.startswith("_")}
    allowed_names.update({"abs": abs, "round": round, "min": min, "max": max, "sum": sum})
    try:
        result = eval(expression, {"__builtins__": {}}, allowed_names)
        return f"{result}"
    except Exception as e:
        return f"Error evaluating '{expression}': {e}"
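def execute_lookup(query: str, knowledge_base: dict) -> str:
    """Look up a fact from the knowledge base by keyword overlap."""
    query_lower = query.lower()
    best_key, best_score = None, 0
    for key in knowledge_base:
        # Score each key by how many of its words appear in the query, so a single
        # shared word (e.g. "Q1") does not make the first key win by default.
        score = sum(1 for word in key.lower().split() if word in query_lower)
        if score > best_score:
            best_key, best_score = key, score
    if best_key is None:
        return f"Fact not found in knowledge base for query: '{query}'"
    return f"{best_key}: {knowledge_base[best_key]}"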
def grounded_reasoning_call(
    question: str,
    knowledge_base: dict | None = None,
    model: str = "claude-sonnet-4-6",
    max_turns: int = 10
) -> dict:
    """
    Answer a question using tool-grounded reasoning.
    The prompt asks the model to route every calculation through the calculate tool
    and every fact through lookup_fact; reasoning_steps records the calls it actually made.
    """
    kb = knowledge_base or {}
    messages = [{
        "role": "user",
        "content": (
            f"{question}\n\n"
            "Think through this step by step. For every calculation, use the calculate tool. "
            "For every fact you need from the data, use lookup_fact. "
            "Do not calculate or recall facts from memory; use the tools."
        )
    }]
    reasoning_steps = []
    # Cap the loop so a model that keeps requesting tools cannot run indefinitely.
    for _ in range(max_turns):
        response = client.messages.create(
            model=model,
            max_tokens=2048,
            tools=GROUNDED_REASONING_TOOLS,
            messages=messages
        )
        tool_calls = [b for b in response.content if b.type == "tool_use"]
        if not tool_calls:
            # Model is done: join all text blocks into the final answer
            final_text = "".join(b.text for b in response.content if b.type == "text")
            return {"answer": final_text, "reasoning_steps": reasoning_steps}
        # Execute each tool call:
        tool_results = []
        for tool_call in tool_calls:
            name = tool_call.name
            inp = tool_call.input
            if name == "calculate":
                result = execute_calculate(inp["expression"])
                step = f"CALCULATE: {inp['expression']} = {result} ({inp.get('description', '')})"
            elif name == "lookup_fact":
                result = execute_lookup(inp["fact_query"], kb)
                step = f"LOOKUP: {inp['fact_query']} → {result}"
            elif name == "verify_logical_step":
                # Stub: this records the step but does not check it.
                # Plug a real validator in here before relying on it.
                result = "Step recorded. No automated validity check was run."
                step = f"VERIFY: '{inp['premise']}' → '{inp['conclusion']}'"
            else:
                result = "Unknown tool"
                step = f"UNKNOWN TOOL: {name}"
            reasoning_steps.append(step)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool_call.id,
                "content": result
            })
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
    logger.warning(f"No final answer after {max_turns} tool turns")
    return {"answer": "", "reasoning_steps": reasoning_steps, "incomplete": True}
# Usage:
result = grounded_reasoning_call(
question="If revenue grew 23% from Q1 to Q2, and Q1 revenue was $1.2M, what is Q2 revenue?",
knowledge_base={"Q1 revenue": "$1,200,000", "growth rate Q1-Q2": "23%"}
)
print(result["answer"])
# Each step below is a tool call the model actually made. Arithmetic routed through
# calculate was computed by Python, but the prompt cannot force the model to use the tool.
for step in result["reasoning_steps"]:
    print(f"  {step}")
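The calculate tool above relies on `eval()` with a restricted namespace. A stricter alternative, sketched here as an assumption rather than the author's method, walks the parsed syntax tree and accepts only numeric literals and whitelisted operators. The name `safe_arithmetic` is hypothetical, and this version deliberately drops the `math` functions. It also places no limit on operand size, so a very large exponent can still be slow.

```python
import ast
import operator

# Whitelisted operators; any other node type is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def safe_arithmetic(expression: str) -> float:
    """Evaluate +, -, *, /, %, ** over numeric literals by walking the AST."""
    def _eval(node: ast.AST) -> float:
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access and everything else end up here.
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)

print(safe_arithmetic("(42 * 3.14) / 100"))
```

Because function calls and names are never evaluated, an expression such as `__import__('os')` raises `ValueError` instead of running.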
Option 2: Step validation — verify each reasoning step before the next
import anthropic
import json
import logging
from typing import Any
logger = logging.getLogger(__name__)
client = anthropic.Anthropic()
def generate_reasoning_steps(question: str, model: str = "claude-sonnet-4-6") -> list[str]:
"""Generate step-by-step reasoning as a list of individual steps."""
response = client.messages.create(
model=model,
max_tokens=2048,
messages=[{
"role": "user",
"content": (
f"Answer this question step by step:\n{question}\n\n"
"Format your response as a numbered list where each step is a single claim or calculation. "
"One step per line. Be explicit about what each step claims as fact."
)
}]
)
text = response.content[0].text
steps = [line.strip() for line in text.split("\n") if line.strip() and line.strip()[0].isdigit()]
return steps
def verify_step(
    step: str,
    context: str,
    facts: dict | None = None
) -> dict:
    """
    Verify whether a single reasoning step is valid.
    Returns {valid: bool, issue: str | None, confidence: float}.
    """
    facts_text = json.dumps(facts, indent=2) if facts else "No facts provided"
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": (
                f"Available facts:\n{facts_text}\n\n"
                f"Context (previous steps):\n{context}\n\n"
                f"Step to verify: {step!r}\n\n"
                "Is this step factually correct and logically valid given the available facts? "
                "Return JSON: {\"valid\": true/false, \"issue\": \"description of issue or null\", "
                "\"confidence\": 0.0-1.0}"
            )
        }]
    )
    raw = response.content[0].text
    try:
        # Parse from the first "{" to the last "}" so code fences or stray prose are ignored
        return json.loads(raw[raw.find("{"):raw.rfind("}") + 1])
    except json.JSONDecodeError:
        # Fail closed: an unparseable verdict must not count as a verified step
        return {"valid": False, "issue": "Verifier output could not be parsed", "confidence": 0.0}
def verified_chain_of_thought(
question: str,
facts: dict | None = None,
halt_on_invalid: bool = True
) -> dict:
"""
Generate reasoning steps and verify each one before accepting it.
If a step is invalid, halt or flag the issue.
"""
steps = generate_reasoning_steps(question)
validated_steps = []
issues = []
context = ""
for step in steps:
verification = verify_step(step, context, facts)
if verification.get("valid", True):
validated_steps.append({"step": step, "verified": True})
context += f"\n{step}"
else:
issue = verification.get("issue", "Unknown issue")
logger.warning(f"Invalid step detected: {step!r} — {issue}")
issues.append({"step": step, "issue": issue})
if halt_on_invalid:
return {
"completed": False,
"reason": f"Reasoning chain contains an unverified step: {issue}",
"validated_steps": validated_steps,
"failed_step": step,
"issues": issues
}
else:
validated_steps.append({"step": step, "verified": False, "issue": issue})
context += f"\n{step}"
return {
"completed": True,
"validated_steps": validated_steps,
"issues": issues,
"all_valid": len(issues) == 0
}
# Usage:
result = verified_chain_of_thought(
question="Calculate the compound annual growth rate for revenue from $1M to $1.5M over 3 years.",
facts={"initial_revenue": 1_000_000, "final_revenue": 1_500_000, "years": 3},
halt_on_invalid=False
)
for step in result["validated_steps"]:
    status = "✓" if step["verified"] else "✗"
    print(f"{status} {step['step']}")
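For the usage example above, the expected result can be computed deterministically, which gives a ground truth to compare the verified steps against. The sketch below applies the standard CAGR formula, (final / initial)^(1 / years) - 1. The function name `cagr` is hypothetical.

```python
def cagr(initial: float, final: float, years: float) -> float:
    """Compound annual growth rate: (final / initial) ** (1 / years) - 1."""
    if initial <= 0 or years <= 0:
        raise ValueError("initial and years must be positive")
    return (final / initial) ** (1 / years) - 1

facts = {"initial_revenue": 1_000_000, "final_revenue": 1_500_000, "years": 3}
rate = cagr(facts["initial_revenue"], facts["final_revenue"], facts["years"])
print(f"CAGR: {rate:.2%}")  # CAGR: 14.47%
```

If the model's final step reports a materially different rate, the chain contains an error even when every individual step was marked valid by the verifier model.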
Option 3: Code execution for math — verify arithmetic by running it
import anthropic
import ast
import logging
from typing import Any
logger = logging.getLogger(__name__)
client = anthropic.Anthropic()
def extract_calculations_from_text(text: str) -> list[dict]:
"""
Extract mathematical expressions from reasoning text.
Returns list of {expression, claimed_result, location}.
"""
import re
patterns = [
# "42 × 3 = 126" or "42 * 3 = 126"
r"([\d.,]+)\s*[×x*]\s*([\d.,]+)\s*=\s*([\d.,]+)",
# "100 / 4 = 25"
r"([\d.,]+)\s*/\s*([\d.,]+)\s*=\s*([\d.,]+)",
# "1200 + 350 = 1550"
r"([\d.,]+)\s*[+]\s*([\d.,]+)\s*=\s*([\d.,]+)",
# "1550 - 200 = 1350"
r"([\d.,]+)\s*[-]\s*([\d.,]+)\s*=\s*([\d.,]+)",
]
found = []
for pattern in patterns:
for match in re.finditer(pattern, text):
groups = match.groups()
if len(groups) == 3:
try:
a = float(groups[0].replace(",", ""))
b = float(groups[1].replace(",", ""))
claimed = float(groups[2].replace(",", ""))
found.append({
"text": match.group(0),
"operands": (a, b),
"claimed_result": claimed,
"position": match.start()
})
except ValueError:
pass
return found
def verify_arithmetic_in_text(reasoning_text: str) -> dict:
"""
Find and verify all arithmetic in the reasoning text.
Returns {all_correct: bool, errors: list}.
"""
calculations = extract_calculations_from_text(reasoning_text)
errors = []
for calc in calculations:
a, b = calc["operands"]
claimed = calc["claimed_result"]
original = calc["text"]
# Determine operator and compute actual result:
if "×" in original or "x" in original or "*" in original:
actual = a * b
elif "/" in original:
actual = a / b if b != 0 else float("inf")
elif "+" in original:
actual = a + b
elif "-" in original:
actual = a - b
else:
continue
tolerance = abs(actual) * 0.001 # 0.1% tolerance for floating point
if abs(actual - claimed) > max(tolerance, 0.01):
errors.append({
"expression": original,
"claimed": claimed,
"actual": actual,
"error": abs(actual - claimed)
})
logger.warning(f"Arithmetic error: {original} — actual result is {actual}")
return {
"all_correct": len(errors) == 0,
"checked": len(calculations),
"errors": errors
}
def answer_with_arithmetic_verification(question: str) -> dict:
"""Generate an answer and verify all arithmetic in the reasoning."""
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
messages=[{
"role": "user",
"content": (
f"{question}\n\n"
"Show your calculations explicitly in the format: A × B = C, A + B = C, etc."
)
}]
)
reasoning = response.content[0].text
verification = verify_arithmetic_in_text(reasoning)
if not verification["all_correct"]:
errors = verification["errors"]
correction_prompt = (
f"Your previous answer contained arithmetic errors:\n"
+ "\n".join(f"- {e['expression']}: actual result is {e['actual']}" for e in errors)
+ "\n\nPlease correct the calculation and provide the right answer."
)
corrected_response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
messages=[
{"role": "user", "content": question},
{"role": "assistant", "content": reasoning},
{"role": "user", "content": correction_prompt}
]
)
return {
"answer": corrected_response.content[0].text,
"corrected": True,
"original_errors": errors
}
return {"answer": reasoning, "corrected": False, "original_errors": []}
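The extractor above runs one pattern per operator and then infers the operator again from substrings of the match. One possible simplification, sketched below, captures the operator in a single pattern so the checker never has to guess it. The names `CALC_PATTERN` and `check_arithmetic` are hypothetical. Like the original, this only handles two-operand expressions; a chained expression such as "2 * 3 + 4 = 10" still needs a real parser.

```python
import math
import operator
import re

# One pattern: operand, operator, operand, "=", claimed result.
_NUM = r"\d[\d,]*(?:\.\d+)?"
CALC_PATTERN = re.compile(rf"({_NUM})\s*([×x*/+\-])\s*({_NUM})\s*=\s*({_NUM})")

OPERATORS = {
    "×": operator.mul, "x": operator.mul, "*": operator.mul,
    "/": operator.truediv, "+": operator.add, "-": operator.sub,
}

def _to_float(token: str) -> float:
    """Parse a number that may contain thousands separators."""
    return float(token.replace(",", ""))

def check_arithmetic(text: str, rel_tol: float = 1e-3) -> list:
    """Return one entry per 'A op B = C' expression whose claimed C is wrong."""
    errors = []
    for match in CALC_PATTERN.finditer(text):
        a, op, b, claimed = match.groups()
        if op == "/" and _to_float(b) == 0:
            continue  # Division by zero has no result to compare against.
        actual = OPERATORS[op](_to_float(a), _to_float(b))
        if not math.isclose(actual, _to_float(claimed), rel_tol=rel_tol, abs_tol=0.01):
            errors.append({
                "expression": match.group(0),
                "claimed": _to_float(claimed),
                "actual": actual,
            })
    return errors

errors = check_arithmetic("First, 42 × 3 = 126. Then 1,200 + 350 = 1,560.")
print(errors)
```

Here the first expression passes and the second is reported, with the recomputed value available for the correction prompt.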
Option 4: Reasoning trace audit — ask the model to critique its own chain
import anthropic
import json
import logging
logger = logging.getLogger(__name__)
client = anthropic.Anthropic()
def generate_then_audit(
question: str,
facts: dict | None = None,
model: str = "claude-sonnet-4-6"
) -> dict:
"""
Generate an answer, then have the model audit its own reasoning.
A second pass can catch fabricated steps that a single pass misses, but the auditor is the same model and may share the same blind spots.
"""
# Step 1: Generate initial answer with reasoning
context = f"Facts: {json.dumps(facts)}\n\n" if facts else ""
initial_response = client.messages.create(
model=model,
max_tokens=2048,
messages=[{
"role": "user",
"content": f"{context}Question: {question}\n\nAnswer step by step, showing your reasoning."
}]
)
initial_answer = initial_response.content[0].text
# Step 2: Self-audit — ask a fresh call to critique the reasoning
audit_response = client.messages.create(
model=model,
max_tokens=1024,
messages=[{
"role": "user",
"content": (
f"Question: {question}\n\n"
f"Available facts: {json.dumps(facts) if facts else 'none'}\n\n"
f"Proposed answer:\n{initial_answer}\n\n"
"Audit this answer:\n"
"1. Are all intermediate facts stated as true actually verifiable from the provided facts?\n"
"2. Is each logical step valid given what came before?\n"
"3. Are all arithmetic calculations correct?\n"
"4. Does the final answer follow from the reasoning?\n\n"
"Return JSON: {\"passes_audit\": true/false, \"issues\": [\"issue 1\", ...], "
"\"corrected_answer\": \"corrected answer or null if no correction needed\"}"
)
}]
)
try:
audit = json.loads(audit_response.content[0].text.strip().strip("```json").strip("```"))
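except json.JSONDecodeError:
# Fail closed: an unparseable audit must not be reported as a pass
audit = {"passes_audit": False, "issues": ["Audit output could not be parsed"], "corrected_answer": None}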
if audit.get("passes_audit"):
return {"answer": initial_answer, "audited": True, "issues": []}
else:
corrected = audit.get("corrected_answer") or initial_answer
logger.warning(f"Reasoning audit found issues: {audit.get('issues', [])}")
return {
"answer": corrected,
"audited": True,
"issues": audit.get("issues", []),
"was_corrected": bool(audit.get("corrected_answer"))
}
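Several snippets here clean model output with chained `strip` calls before parsing it. `str.strip` removes a set of characters from both ends rather than a prefix or suffix, so that approach only works when the reply is nothing but a fenced JSON object. A more tolerant approach, sketched below under the assumption that the reply contains exactly one JSON object, slices from the first `{` to the last `}`. The helper name `extract_json_object` and the sample reply are hypothetical.

```python
import json
from typing import Optional

def extract_json_object(text: str) -> Optional[dict]:
    """Return the JSON object embedded in text, or None if none can be parsed."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None

reply = (
    "Here is the audit you asked for: "
    '{"passes_audit": false, "issues": ["step 2 uses an unstated figure"]} '
    "Let me know if you need more detail."
)
audit = extract_json_object(reply)
# Treat a missing or unparseable audit as a failure rather than a pass.
passed = bool(audit and audit.get("passes_audit"))
print(passed)
```

Returning `None` on failure leaves the decision to the caller, which for a verification layer should usually mean "not verified".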
Option 5: Separate fact-retrieval from reasoning — don’t mix them
import anthropic
import json
import logging
from typing import Any
logger = logging.getLogger(__name__)
client = anthropic.Anthropic()
def two_phase_reasoning(
question: str,
data_retrieval_fn, # Callable that retrieves actual data
model: str = "claude-sonnet-4-6"
) -> dict:
"""
Phase 1: Identify what facts are needed (model)
Phase 2: Retrieve those facts from the actual data source (tool/DB)
Phase 3: Reason using only the retrieved facts (model, no invention)
This reduces the room for invented facts in phase 3 because all relevant facts
are explicitly provided. The system prompt is an instruction, not a guarantee.
"""
# Phase 1: Identify required facts
facts_needed_response = client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=512,
messages=[{
"role": "user",
"content": (
f"To answer this question: {question!r}\n\n"
"List every specific fact or data point you would need to look up. "
"Return JSON: {\"facts_needed\": [\"fact 1\", \"fact 2\", ...]}"
)
}]
)
try:
needed = json.loads(
facts_needed_response.content[0].text.strip().strip("```json").strip("```")
).get("facts_needed", [])
except (json.JSONDecodeError, AttributeError):
needed = []
# Phase 2: Retrieve actual facts
retrieved_facts = {}
for fact_query in needed:
try:
result = data_retrieval_fn(fact_query)
retrieved_facts[fact_query] = result
except Exception as exc:
retrieved_facts[fact_query] = f"Not available: {exc}"
# Phase 3: Reason using only retrieved facts — no additional lookups allowed
facts_text = json.dumps(retrieved_facts, indent=2)
answer_response = client.messages.create(
model=model,
max_tokens=2048,
system=(
"You must answer using ONLY the facts provided below. "
"Do not use any additional knowledge or assumptions. "
"If a needed fact is marked 'Not available', state that you cannot complete that step."
),
messages=[{
"role": "user",
"content": (
f"Facts retrieved from the data source:\n{facts_text}\n\n"
f"Question: {question}\n\n"
"Answer step by step using ONLY the facts listed above."
)
}]
)
return {
"answer": answer_response.content[0].text,
"facts_used": retrieved_facts,
"facts_needed": needed
}
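`two_phase_reasoning` expects a `data_retrieval_fn`, but the snippet does not show one. A minimal sketch, assuming the facts live in an in-memory dictionary as a stand-in for a real database or API, could look like this. `make_dict_retriever` and the sample records are hypothetical. It raises when nothing matches (or when the match is ambiguous) so that phase 2 records the fact as not available instead of returning a guess.

```python
from typing import Callable, Dict

def make_dict_retriever(records: Dict[str, str]) -> Callable[[str], str]:
    """Build a retrieval function over an in-memory dict (stand-in for a real data source)."""
    normalized = {key.lower(): value for key, value in records.items()}

    def retrieve(fact_query: str) -> str:
        query = fact_query.lower()
        # Accept a record only if its full key appears in the query text.
        hits = [value for key, value in normalized.items() if key in query]
        if len(hits) != 1:
            raise LookupError(f"{len(hits)} records matched {fact_query!r}")
        return hits[0]

    return retrieve

retriever = make_dict_retriever({"Q1 revenue": "$1,200,000", "Q2 headcount": "48"})
print(retriever("What was Q1 revenue?"))
```

Passing `retriever` as `data_retrieval_fn` means a query such as "Q3 churn rate" surfaces in `facts_used` as "Not available: ..." and the phase 3 prompt can say so explicitly.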
Option 6: Reasoning mode selection — only use chain-of-thought when it helps
import anthropic
import json
import logging
logger = logging.getLogger(__name__)
client = anthropic.Anthropic()
QUESTION_TYPES = {
"factual_lookup": {
"description": "Looking up a specific fact (date, name, number)",
"approach": "direct_answer",
"risk": "high_fabrication_risk_in_cot"
},
"math_calculation": {
"description": "Numerical calculation",
"approach": "tool_use_calculate",
"risk": "arithmetic_can_be_fabricated"
},
"logical_deduction": {
"description": "Pure logic from given premises",
"approach": "chain_of_thought",
"risk": "low_if_premises_are_provided"
},
"summarization": {
"description": "Summarizing provided text",
"approach": "direct_answer",
"risk": "low"
}
}
def classify_question_type(question: str) -> str:
"""Classify what type of reasoning the question requires."""
response = client.messages.create(
model="claude-haiku-4-5-20251001",
max_tokens=64,
messages=[{
"role": "user",
"content": (
f"Classify this question: {question!r}\n\n"
f"Types: {', '.join(QUESTION_TYPES.keys())}\n\n"
"Return the type name only."
)
}]
)
answer = response.content[0].text.strip().lower()
return answer if answer in QUESTION_TYPES else "logical_deduction"
def answer_with_appropriate_reasoning(
question: str,
facts: dict | None = None
) -> str:
q_type = classify_question_type(question)
config = QUESTION_TYPES.get(q_type, QUESTION_TYPES["logical_deduction"])
logger.info(f"Question type: {q_type} (approach: {config['approach']})")
if config["approach"] == "tool_use_calculate":
result = grounded_reasoning_call(question, facts)  # Defined in Option 1; import it or paste it into this module
return result["answer"]
elif config["approach"] == "direct_answer":
# No chain-of-thought — reduces fabrication of intermediate steps
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=512,
system="Answer directly and concisely. Do not show intermediate steps.",
messages=[{"role": "user", "content": question}]
)
return response.content[0].text
else: # chain_of_thought for logical deduction
facts_text = f"\nFacts:\n{json.dumps(facts, indent=2)}" if facts else ""
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
messages=[{
"role": "user",
"content": f"{facts_text}\n\nQuestion: {question}\n\nReason step by step from the given facts only."
}]
)
return response.content[0].text
Reasoning Reliability by Approach
| Approach | Fabrication Risk | Best For | Verification Cost |
|---|---|---|---|
| Tool-grounded (calculator, DB) | Very low | Math, data lookup | Medium |
| Step-by-step verification | Low | Complex logic | High |
| Arithmetic extraction + check | Low | Number-heavy reasoning | Low |
| Self-audit loop | Medium | General reasoning | Medium |
| Two-phase fact retrieval | Low | Data-dependent reasoning | Medium |
| Direct answer (no CoT) | Medium | Simple factual questions | None |
When Chain-of-Thought Makes Fabrication Worse
- Factual lookups: CoT gives the model more space to invent supporting “facts”
- Arithmetic: Model generates plausible-looking but wrong intermediate steps
- Historical data: Model fills in unknown intermediate dates/numbers confidently
When Chain-of-Thought Genuinely Helps (and Is Lower Risk)
- Pure logical deduction from stated premises (no external facts needed)
- Code debugging where the model reasons about provided code only
- Multi-step planning with explicit constraints all given upfront
Expected Token Savings
- Fabricated reasoning → user gets wrong answer → asks for correction → agent re-does task: ~3,000 tokens overhead
- Grounded reasoning → correct first time: 0 correction overhead
- Plus: fabricated arithmetic in a financial agent can cost far more than tokens
Environment
- Any agent performing multi-step reasoning that users rely on for decisions (financial calculations, medical reasoning, legal analysis, engineering estimates). Chain-of-thought fabrication risk is highest when the model lacks the required facts (it invents them), when arithmetic is involved (it miscalculates), or when the question is about specific data rather than pure logic. Use tools for anything that can be verified externally.
- Source: direct experience. “The reasoning looked convincing but the intermediate steps were wrong” is the hardest category of hallucination to catch without a verification layer, because the final answer is often approximately correct even when intermediate steps are fabricated.