Agent Regenerates Unchanged Content on Every Call

Symptom

A user asks the agent to fix a typo in a 2,000-word report. The agent reads the full document, regenerates all 2,000 words with the one-word fix, and returns the complete text. You pay for well over 2,000 output tokens when a handful would have sufficed. The pattern repeats for every “small change” request (add a sentence, rename a variable, fix formatting): the agent always regenerates the whole artifact.
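To put rough numbers on the waste, the sketch below compares a full rewrite with a small patch. It assumes the 4-characters-per-token rule of thumb and a 50-token patch size; both are illustrative estimates rather than measurements, and the function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token (a heuristic, not a tokenizer)."""
    return max(1, len(text) // 4)


def regeneration_overhead(document: str, patch_tokens: int = 50) -> float:
    """Return how many times more output tokens a full rewrite costs than a small patch."""
    return estimate_tokens(document) / patch_tokens


report = "word " * 2000          # stand-in for a 2,000-word report (10,000 characters)
full = estimate_tokens(report)   # 10,000 // 4 = 2,500 estimated tokens
print(f"full rewrite ~ {full} tokens, patch ~ 50 tokens, "
      f"overhead ~ {regeneration_overhead(report):.0f}x")
```

For real billing figures, read `usage.output_tokens` from the API response instead of relying on the character heuristic.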

Root Cause

The agent treats every editing request as a generation task. There is no concept of “return only the diff” or “return only the changed section.” The model defaults to producing the complete artifact because that is what language models do — they complete sequences. Without explicit instructions to produce a patch, edit instruction, or section-only output, full regeneration is the path of least resistance.

Fix

Option 1: Structured Edit Instructions Instead of Full Regeneration

Ask the model to return edit instructions such as {"action": "replace", "target": "...", "replacement": "..."} through a tool call, then apply them locally.

import anthropic

client = anthropic.Anthropic()

EDIT_TOOL = {
    "name": "apply_edits",
    "description": "Return a list of targeted edits to apply to the document. Do NOT return the full document.",
    "input_schema": {
        "type": "object",
        "properties": {
            "edits": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "action": {
                            "type": "string",
                            "enum": ["replace", "insert_after", "insert_before", "delete"],
                        },
                        "target": {
                            "type": "string",
                            "description": "Exact text to find in the document (for replace/delete/insert_after/insert_before)",
                        },
                        "replacement": {
                            "type": "string",
                            "description": "New text (for replace and insert actions). Empty string for delete.",
                        },
                        "reason": {"type": "string"},
                    },
                    "required": ["action", "target"],
                },
            },
            "summary": {"type": "string", "description": "One-line summary of all changes made"},
        },
        "required": ["edits", "summary"],
    },
}


def apply_edits(document: str, edits: list[dict]) -> tuple[str, int]:
    """Apply edit instructions to document. Returns (new_document, tokens_saved_estimate)."""
    result = document
    applied = 0

    for edit in edits:
        action = edit["action"]
        target = edit.get("target", "")
        replacement = edit.get("replacement", "")

        # An empty target would match at position 0, so treat it as "not found".
        if not target or target not in result:
            print(f"  [Edit skipped] action={action}, target not found: {target[:40]!r}")
            continue

        # Only the first occurrence of the target is touched.
        if action == "replace":
            result = result.replace(target, replacement, 1)
        elif action == "insert_after":
            result = result.replace(target, target + replacement, 1)
        elif action == "insert_before":
            result = result.replace(target, replacement + target, 1)
        elif action == "delete":
            result = result.replace(target, "", 1)
        else:
            print(f"  [Edit skipped] unknown action: {action!r}")
            continue
        applied += 1

    # Estimate tokens saved: full regen would cost len(document)/4 tokens
    # Edit instruction costs only the edit objects (~50 tokens each)
    full_regen_tokens = len(document) // 4
    edit_tokens = applied * 50
    saved = full_regen_tokens - edit_tokens

    return result, saved


def edit_document(document: str, instruction: str) -> tuple[str, str]:
    """
    Edit a document using targeted edit instructions.
    Returns (updated_document, summary).
    """
    # Count chars to decide strategy
    doc_len = len(document)
    if doc_len < 200:
        # Short doc: full regen is fine
        response = client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=512,
            messages=[{
                "role": "user",
                "content": f"Apply this edit to the document:\n\nInstruction: {instruction}\n\nDocument:\n{document}",
            }],
        )
        return response.content[0].text, "Full rewrite (short document)"

    # Long doc: use edit instructions
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        tools=[EDIT_TOOL],
        tool_choice={"type": "any"},
        messages=[{
            "role": "user",
            "content": (
                f"Apply this edit to the document using targeted edit instructions.\n"
                f"IMPORTANT: Return ONLY edit instructions, not the full document.\n\n"
                f"Instruction: {instruction}\n\n"
                f"Document ({doc_len} chars):\n{document}"
            ),
        }],
    )

    for block in response.content:
        if block.type == "tool_use" and block.name == "apply_edits":
            edits = block.input["edits"]
            summary = block.input["summary"]
            updated, saved = apply_edits(document, edits)
            print(f"  [Edit mode] Received {len(edits)} edits. ~{saved} tokens saved vs full regen (estimate).")
            return updated, summary

    # Fallback: return original if no edits
    return document, "No changes"


# Example: short report (about 200 words) with a tiny fix
DOCUMENT = """
Executive Summary

The quarterly performance report for Q1 2025 demonstrates strong growth across all business units.
Revenue increased by 23% year-over-year, driven primarily by expansion in the European market segment.
Customer acquisition costs decreased by 15% due to improved targeting in digital channels.

Key Findings

1. Revenue Performance: Total revenue reached $4.2 billion, exceeding projections by 8%.
   The APAC region showed the strongest growth at 31%, while North America grew 19%.

2. Customer Metrics: Monthly active users grew to 12.4 million, up from 9.8 million.
   Customer lifetime value increased by $340 on average across all segments.

3. Operational Efficiency: Operating margins improved to 28.3% from 24.1% in Q4 2024.
   Headcount remained flat while output per employee increased 18%.

4. Product Development: Three major features were launched in Q1, with user adoption
   rates exceeding 40% within the first 30 days for each feature.

Recommendations

Based on these findings, we recommend continued investment in the European expansion strategy,
with a particular focus on the German and French markets. Additionally, the successful
customer acquisition improvements should be replicated in the APAC region.
""".strip()

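# Small change: replace "APAC" with "Asia-Pacific" in both places it appears.
# apply_edits touches only the first match per edit, so the model must return one edit per occurrence.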
updated, summary = edit_document(
    DOCUMENT,
    "Replace 'APAC' with 'Asia-Pacific' throughout the document"
)
print(f"Summary: {summary}")
print(f"Changed: {'Asia-Pacific' in updated}")

Expected Token Savings: An edit instruction for a 2,000-token document costs ~100 output tokens instead of 2,000, a ~95% reduction for single-location edits. Environment: Tool use required. Works best for documents over 500 characters; the sample code falls back to a full rewrite below 200 characters.
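Because each edit is located by exact text match, a target that appears more than once can silently patch the wrong occurrence. One way to guard against that, sketched here under the assumption that ambiguous edits should be rejected rather than guessed at (`apply_unique_replace` is a hypothetical helper, not part of the code above):

```python
def apply_unique_replace(document: str, target: str, replacement: str) -> str:
    """Replace `target` only if it occurs exactly once; otherwise raise ValueError."""
    if not target:
        raise ValueError("empty target")
    count = document.count(target)
    if count != 1:
        raise ValueError(f"target occurs {count} times, expected exactly 1")
    return document.replace(target, replacement, 1)


doc = "The APAC region grew 31%. Replicate this in the APAC region."

try:
    apply_unique_replace(doc, "APAC", "Asia-Pacific")
except ValueError as err:
    print(f"rejected: {err}")  # "APAC" alone is ambiguous in this text

# Adding surrounding words makes the target unique.
fixed = apply_unique_replace(doc, "The APAC region grew", "The Asia-Pacific region grew")
print(fixed)
```

When an edit is rejected, the error message can be sent back to the model with a request for a longer, unambiguous target.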

Option 2: Section-Based Editing — Only Return the Changed Section

Split the document into sections. Ask the model to return only the modified section.

import json
import re
import anthropic

client = anthropic.Anthropic()


def split_into_sections(document: str) -> list[tuple[str, str]]:
    """
    Split document into (heading, content) pairs.
    Returns list of (section_id, section_text).
    """
    # Match markdown headings (levels 1-3) only at the start of a line
    pattern = r"^(#{1,3}[ \t]+[^\n]+)"
    parts = re.split(pattern, document, flags=re.MULTILINE)

    sections = []
    if parts[0].strip():
        sections.append(("__preamble__", parts[0]))

    # re.split with one capture group yields [preamble, heading, body, heading, body, ...]
    for i in range(1, len(parts) - 1, 2):
        heading, content = parts[i], parts[i + 1]
        section_id = re.sub(r"[^a-z0-9_]", "_", heading.lower().lstrip("#").strip())
        # Keep heading and body text unchanged so reassemble() does not add blank lines
        sections.append((section_id, heading + content))

    return sections


def reassemble(sections: list[tuple[str, str]]) -> str:
    return "".join(content for _, content in sections)


def targeted_section_edit(document: str, instruction: str) -> tuple[str, int]:
    """
    Edit only the relevant section(s) of a document.
    Returns (updated_document, sections_edited).
    """
    sections = split_into_sections(document)
    section_map = dict(sections)
    section_preview = "\n".join(
        f"[{sid}]: {content[:60].strip()!r}..."
        for sid, content in sections
    )

    # Step 1: Identify which section(s) to edit
    identify_response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": (
                f"Given this edit instruction, which section IDs need to change?\n\n"
                f"Instruction: {instruction}\n\n"
                f"Sections:\n{section_preview}\n\n"
                f"Return a JSON array of section IDs to edit, e.g. [\"section_id_1\"]"
            ),
        }],
    )

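    raw = identify_response.content[0].text
    try:
        # Pull the JSON array out of the reply, tolerating surrounding prose
        start, end = raw.find("["), raw.rfind("]") + 1
        target_ids = json.loads(raw[start:end])
    except ValueError:
        # json.JSONDecodeError subclasses ValueError; fall back to the first section
        target_ids = [sections[0][0]] if sections else []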

    # Step 2: Edit only the identified sections
    edited_sections = list(sections)
    sections_edited = 0

    for i, (sid, content) in enumerate(edited_sections):
        if sid in target_ids:
            edit_response = client.messages.create(
                model="claude-haiku-4-5-20251001",
                max_tokens=len(content) // 3 + 200,  # budget proportional to section size
                messages=[{
                    "role": "user",
                    "content": (
                        f"Apply this edit to ONLY this section. Return only the updated section text.\n\n"
                        f"Instruction: {instruction}\n\n"
                        f"Section:\n{content}"
                    ),
                }],
            )
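            new_text = edit_response.content[0].text
            # Keep the section's trailing blank lines so the next heading stays separated
            trailing = content[len(content.rstrip()):]
            edited_sections[i] = (sid, new_text.rstrip() + trailing)
            sections_edited += 1
            print(f"  [Section edited] {sid} ({len(content)} → {len(new_text)} chars)")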

    if sections_edited == 0:
        print("  [No sections identified for editing]")

    return reassemble(edited_sections), sections_edited


DOCUMENT = """# Introduction

This document describes the deployment procedure for the production system.
All engineers must follow these steps carefully.

## Prerequisites

Before beginning, ensure you have:
- SSH access to production servers
- AWS CLI configured with appropriate permissions
- Docker installed and running locally

## Deployment Steps

1. Pull the latest image from ECR
2. Run database migrations
3. Deploy to staging environment
4. Run smoke tests
5. Deploy to production with blue-green strategy

## Rollback Procedure

If deployment fails, execute the rollback script immediately.
Contact the on-call engineer if rollback does not resolve the issue.
"""

import json

updated, count = targeted_section_edit(
    DOCUMENT,
    "Add 'kubectl configured for EKS cluster' to the Prerequisites section"
)
print(f"\nEdited {count} section(s)")
print(updated)

Expected Token Savings: 4-section document editing 1 section: ~75% output token reduction. Scales with document length. Environment: Two LLM calls (identify + edit). Total cost still far below full regen for long documents.
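Section-based editing is only safe if splitting and rejoining an untouched document gives back exactly the same text. The sketch below shows one way to get that round-trip property and to swap in an edited section without losing the blank lines between sections. `split_lossless` and `replace_section` are hypothetical names, and the heading pattern assumes markdown headings of level 1 to 3.

```python
import re

HEADING = re.compile(r"^(#{1,3}[ \t]+[^\n]+)", re.MULTILINE)


def split_lossless(document: str) -> list[tuple[str, str]]:
    """Split on markdown headings so that joining the pieces restores the input exactly."""
    parts = HEADING.split(document)
    sections = [("__preamble__", parts[0])] if parts[0] else []
    for i in range(1, len(parts) - 1, 2):
        heading, body = parts[i], parts[i + 1]
        section_id = re.sub(r"[^a-z0-9]+", "_", heading.lstrip("#").strip().lower()).strip("_")
        sections.append((section_id, heading + body))
    return sections


def replace_section(sections: list[tuple[str, str]], section_id: str,
                    new_text: str) -> list[tuple[str, str]]:
    """Swap one section, keeping the trailing blank lines of the section it replaces."""
    out = []
    for sid, text in sections:
        if sid == section_id:
            trailing = text[len(text.rstrip()):]
            text = new_text.rstrip() + trailing
        out.append((sid, text))
    return out


doc = "# Intro\n\nHello.\n\n## Prerequisites\n\n- SSH access\n\n## Steps\n\n1. Deploy\n"
sections = split_lossless(doc)
assert "".join(text for _, text in sections) == doc  # round trip is exact

updated = replace_section(
    sections, "prerequisites",
    "## Prerequisites\n\n- SSH access\n- kubectl configured",
)
print("".join(text for _, text in updated))
```

Duplicate headings would produce duplicate section IDs; a production version would need to disambiguate them, for example by appending an index.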

Option 3: Differential Output with Unified Diff Format

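Ask the model to return a unified diff and apply the hunks locally, so no full-document output tokens are generated. Note that Python’s difflib generates diffs but cannot apply a unified diff (difflib.restore only reads ndiff deltas), so applying the hunks needs a small parser or a patch library.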

import difflib
import anthropic

client = anthropic.Anthropic()

DIFF_SYSTEM = """You are a code and document editor. When asked to make changes, return ONLY a unified diff in this exact format:

--- original
+++ modified
@@ -LINE,COUNT +LINE,COUNT @@
 context line
-removed line
+added line
 context line

Rules:
- Include 2 lines of context around each change
- Use exact line numbers
- Return ONLY the diff, no explanation
- If no changes needed, return: NO_CHANGES"""


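def parse_and_apply_diff(original: str, diff_text: str) -> str | None:
    """Apply a unified diff to the original document.

    difflib can generate diffs but cannot apply them (difflib.restore only reads
    ndiff deltas), so the hunks are applied here. Returns None on any mismatch.
    """
    if diff_text.strip() == "NO_CHANGES":
        return original

    src = original.splitlines()
    out: list[str] = []
    pos, in_hunk = 0, False  # pos: index of the next unconsumed source line
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            try:  # "@@ -12,5 +12,6 @@" -> hunk starts at source line 12
                start = int(line.split()[1][1:].split(",")[0]) - 1
            except (IndexError, ValueError):
                return None
            if not pos <= start <= len(src):
                return None
            out.extend(src[pos:start])
            pos, in_hunk = start, True
        elif not in_hunk or line.startswith(("```", "\\")):
            continue  # file headers, code fences, "\ No newline at end of file"
        elif line.startswith("+"):
            out.append(line[1:])
        elif pos < len(src) and src[pos] == line[1:]:
            if not line.startswith("-"):
                out.append(src[pos])  # context line: keep it
            pos += 1
        else:
            return None  # context or removed line does not match the source
    if not in_hunk:
        return None
    out.extend(src[pos:])
    return "\n".join(out) + ("\n" if original.endswith("\n") else "")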


def diff_based_edit(document: str, instruction: str) -> str:
    """Edit document using diff output. Much cheaper than full regeneration."""
    doc_lines = len(document.splitlines())

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=512,  # Diffs are compact; rarely need more than this
        system=DIFF_SYSTEM,
        messages=[{
            "role": "user",
            "content": (
                f"Apply this change: {instruction}\n\n"
                f"Document ({doc_lines} lines):\n{document}"
            ),
        }],
    )

    diff_output = response.content[0].text.strip()
    output_tokens = response.usage.output_tokens

    if diff_output == "NO_CHANGES":
        print(f"  [No changes needed] {output_tokens} output tokens")
        return document

    print(f"  [Diff output] {output_tokens} output tokens (vs ~{len(document) // 4} for full regen, at ~4 chars/token)")

    # Try to apply diff
    updated = parse_and_apply_diff(document, diff_output)
    if updated:
        return updated

    # If diff application fails, fall back to asking for the specific change only
    print("  [Diff application failed, requesting minimal edit]")
    fallback = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": (
                f"Return ONLY the exact line(s) to change for this edit:\n"
                f"Instruction: {instruction}\n\n"
                f"Current document:\n{document}\n\n"
                f"Format: OLD_TEXT|||NEW_TEXT"
            ),
        }],
    )

    raw = fallback.content[0].text.strip()
    if "|||" in raw:
        old, new = raw.split("|||", 1)
        return document.replace(old.strip(), new.strip(), 1)

    return document


DOCUMENT = """\
# API Reference

## Authentication

All requests require an Authorization header with a Bearer token.
Example: Authorization: Bearer YOUR_TOKEN_HERE

## Endpoints

### GET /users
Returns a list of all users.
Parameters: page (int), limit (int, max 100)

### POST /users
Creates a new user.
Required fields: name, email, role

### DELETE /users/{id}
Deletes a user by ID.
Requires admin role.
"""

result = diff_based_edit(DOCUMENT, "Change max limit from 100 to 500 in GET /users")
print(result)

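Expected Token Savings: Diff for a 50-line document changing 1 line: ~15 output tokens vs ~300 for full regen, a ~95% reduction. Environment: Python standard library only, no external dependencies. difflib produces diffs but does not apply unified diffs, so application relies on a hunk parser or a patch library. Models sometimes emit wrong line numbers, so keep a fallback path.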
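Whichever way the diff is applied, it can help to confirm afterwards that only the intended lines changed. The sketch below uses difflib.unified_diff, which generates a diff between two texts, to list the lines that actually differ; `changed_lines` is a hypothetical helper name.

```python
import difflib
from itertools import islice


def changed_lines(original: str, updated: str) -> list[str]:
    """Return the added/removed lines between two texts, without headers or context."""
    diff = difflib.unified_diff(
        original.splitlines(), updated.splitlines(),
        fromfile="original", tofile="modified", lineterm="", n=0,
    )
    body = islice(diff, 2, None)  # skip the two file-header lines
    return [line for line in body if line.startswith(("+", "-"))]


before = "Parameters: page (int), limit (int, max 100)\nReturns a list of all users.\n"
after = before.replace("max 100", "max 500")

for line in changed_lines(before, after):
    print(line)
```

If the list contains more lines than the instruction implies, the edit can be rejected and retried instead of being written back.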

Option 4: Prompt Caching for Unchanged Document Prefix

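Structure the call so the unchanged document sits in a cached prefix. Cache reads are billed at about 10% of the normal input price, so repeated edits re-read the document cheaply. This lowers input cost only; pair it with one of the output-side options above.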

import anthropic

client = anthropic.Anthropic()


def cached_document_edit(
    document: str,
    instruction: str,
    model: str = "claude-sonnet-4-6",
) -> tuple[str, dict]:
    """
    Place the document in a cached system block.
    On repeated edits, the document read costs only 10% of normal input price.
    Returns (edit_instructions, usage_stats).
    """
    response = client.messages.create(
        model=model,
        max_tokens=512,
        system=[
            {
                "type": "text",
                "text": "You are a document editor. When asked to edit, return ONLY the changed portion with clear markers like [REPLACE: old text] → [WITH: new text]. Never rewrite the full document.",
            },
            {
                "type": "text",
                "text": f"Document to edit:\n\n{document}",
                "cache_control": {"type": "ephemeral"},  # Cache the document
            },
        ],
        messages=[{
            "role": "user",
            "content": f"Apply this edit: {instruction}",
        }],
    )

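    usage = response.usage
    # `or 0` covers responses where these fields are missing or None
    cache_read = getattr(usage, "cache_read_input_tokens", 0) or 0
    cache_created = getattr(usage, "cache_creation_input_tokens", 0) or 0
    # input_tokens counts only uncached input, so add the cached parts for the total
    total_input = usage.input_tokens + cache_read + cache_created

    stats = {
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "cache_read_tokens": cache_read,
        "cache_created_tokens": cache_created,
        # Cache reads cost ~10% of the base price, so each read token saves ~90%
        "cache_savings_pct": round(cache_read * 90 / max(total_input, 1), 1),
    }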

    return response.content[0].text, stats


def apply_replacement_markers(document: str, edit_instructions: str) -> str:
    """Parse [REPLACE: ...] → [WITH: ...] format and apply to document."""
    import re
    pattern = r"\[REPLACE:\s*(.*?)\]\s*→\s*\[WITH:\s*(.*?)\]"
    matches = re.findall(pattern, edit_instructions, re.DOTALL)

    result = document
    for old, new in matches:
        result = result.replace(old.strip(), new.strip(), 1)

    return result


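# Large document (would cost many input tokens without caching).
# Prompts below the model's minimum cacheable length are processed without caching;
# if cache_read_tokens stays 0 on the second call, lengthen this sample.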
LARGE_DOC = "\n\n".join([
    "# Technical Specification v2.3",
    "## Overview\nThis system processes financial transactions in real-time using event-driven architecture. " * 5,
    "## Data Model\nTransactions are stored in PostgreSQL with the following schema: " * 5,
    "## API Design\nRESTful endpoints follow OpenAPI 3.0 specification. All endpoints require JWT auth. " * 5,
    "## Security\nAll data is encrypted at rest using AES-256. Transit encryption uses TLS 1.3. " * 5,
    "## Performance\nP99 latency target is 50ms. Throughput target is 10,000 TPS. " * 5,
])

# First call: cache miss (document cached for next 5 minutes)
edits1, stats1 = cached_document_edit(LARGE_DOC, "Change version from v2.3 to v2.4 in the title")
print(f"Edit 1 stats: {stats1}")

# Second call: cache hit (document read at 10% cost)
edits2, stats2 = cached_document_edit(LARGE_DOC, "Change P99 target from 50ms to 25ms")
print(f"Edit 2 stats: {stats2}")
print(f"Cache saved ~{stats2['cache_savings_pct']}% on document re-read")

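Expected Token Savings: Cached reads cost about 10% of the normal input price, and the first call pays a cache-write premium (about 25% on the 5-minute cache). For 10 edits on a 2,000-token document, that is 2,500 + 9 × 200 = 4,300 token-equivalents instead of 20,000, a saving of roughly 78% on document input. Environment: Requires a model that supports prompt caching; the example uses claude-sonnet-4-6. Prompts below the model's minimum cacheable length are not cached. The default cache TTL is 5 minutes and is refreshed each time the cached prefix is read.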
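The arithmetic behind the caching estimate can be sketched as follows. The multipliers (1.25× for a cache write, 0.10× for a cache read) are assumptions based on published pricing for the 5-minute cache at the time of writing; check current pricing before relying on them. `cached_input_cost` is a hypothetical helper.

```python
def cached_input_cost(doc_tokens: int, edits: int,
                      write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Input cost of the document across `edits` calls, in uncached-token equivalents.

    Assumes the first call writes the cache and every later call lands inside the TTL.
    """
    if edits <= 0:
        return 0.0
    return doc_tokens * write_mult + doc_tokens * read_mult * (edits - 1)


doc_tokens, edits = 2_000, 10
uncached = doc_tokens * edits                   # 10 full reads = 20,000
cached = cached_input_cost(doc_tokens, edits)   # 2,500 + 9 * 200 = 4,300
print(f"uncached={uncached}, cached={cached:.0f}, "
      f"saved={uncached - cached:.0f} ({1 - cached / uncached:.1%})")
```

With a single edit the cached path costs more than the uncached one (2,500 vs 2,000), so caching pays off only from the second call onward.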

Option 5: Change Detection — Skip Regeneration If Unchanged

Before calling the model, detect if the instruction would actually change anything. If not, return early.

import hashlib
import anthropic

client = anthropic.Anthropic()


def hash_content(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()[:16]


# Cache: hash(document + instruction) → (document_hash, result)
_edit_cache: dict[str, tuple[str, str]] = {}


def needs_regeneration(document: str, instruction: str) -> bool:
    """
    Check if this (document, instruction) pair has already been computed.
    Returns False if we have a cached result.
    """
    key = hash_content(document + instruction)
    return key not in _edit_cache


def get_cached_result(document: str, instruction: str) -> str | None:
    key = hash_content(document + instruction)
    if key in _edit_cache:
        doc_hash, result = _edit_cache[key]
        if doc_hash == hash_content(document):
            print("  [Cache hit] Returning cached edit result")
            return result
    return None


def cache_result(document: str, instruction: str, result: str):
    key = hash_content(document + instruction)
    _edit_cache[key] = (hash_content(document), result)


NOOP_DETECTOR_SYSTEM = """Determine if the following edit instruction would actually change the document.
Reply with exactly one word: CHANGE or NOCHANGE."""


def would_change(document: str, instruction: str) -> bool:
    """Use Haiku to quickly determine if the edit would actually change anything."""
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=10,
        system=NOOP_DETECTOR_SYSTEM,
        messages=[{
            "role": "user",
            "content": f"Instruction: {instruction}\n\nDocument excerpt: {document[:500]}",
        }],
    )
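    result = response.content[0].text.strip().upper()
    # "NOCHANGE" contains "CHANGE", so test for the negative answer explicitly.
    # Anything other than a clear NOCHANGE is treated as a change (the safe default).
    return "NOCHANGE" not in result.replace(" ", "")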


def smart_edit(document: str, instruction: str) -> str:
    # Check exact cache first
    cached = get_cached_result(document, instruction)
    if cached is not None:
        return cached

    # Quick no-op check (costs ~20 tokens vs ~500 for full edit)
    if not would_change(document, instruction):
        print(f"  [No-op detected] Instruction would not change document")
        return document

    # Full edit
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=len(document) // 3 + 200,
        messages=[{
            "role": "user",
            "content": (
                f"Apply this edit. Return ONLY the changed section with context, not the full document.\n\n"
                f"Instruction: {instruction}\n\nDocument:\n{document}"
            ),
        }],
    )

    # For simplicity, return the model's output (in production, apply as diff)
    result = response.content[0].text
    cache_result(document, instruction, result)
    return result


doc = "The server runs on port 8080. Max connections: 100. Timeout: 30 seconds."

# Same instruction twice → second call is free
print(smart_edit(doc, "Change port from 8080 to 9090"))
print(smart_edit(doc, "Change port from 8080 to 9090"))  # cache hit

# No-op instruction
print(smart_edit(doc, "Change port from 3000 to 4000"))  # would_change = False

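Expected Token Savings: A detected no-op costs roughly 20 tokens for the check instead of ~500 for the edit, saving ~480 output tokens per no-op request. Exact cache hits cost nothing. Every real edit pays for the extra detection call, and the detector sees only the first 500 characters of the document, so it can misjudge edits that target later text. Environment: In-memory cache (per process). Replace with Redis for distributed caching.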
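A cache key built by concatenating document and instruction cannot tell ("ab", "c") from ("a", "bc"). One way to avoid that, sketched here with a hypothetical `edit_cache_key` helper, is to length-prefix each field before hashing and to keep the full digest rather than a truncated one:

```python
import hashlib


def edit_cache_key(document: str, instruction: str) -> str:
    """Hash (document, instruction) with length prefixes so the two fields cannot blur."""
    digest = hashlib.sha256()
    for part in (document, instruction):
        data = part.encode("utf-8")
        digest.update(len(data).to_bytes(8, "big"))  # length prefix marks the field boundary
        digest.update(data)
    return digest.hexdigest()


# Plain concatenation maps both pairs to "abc"; the prefixed key keeps them apart.
key_a = edit_cache_key("ab", "c")
key_b = edit_cache_key("a", "bc")
print(key_a != key_b)
```

The same key can be used unchanged as a Redis key if the cache is moved out of process.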

Option 6: Incremental Code Editing with Line Range Targeting

For code files, return only the changed function/class/block with line numbers. Apply programmatically.

import ast
import anthropic

client = anthropic.Anthropic()


def find_function_lines(source: str, function_name: str) -> tuple[int, int] | None:
    """Find the start and end lines of a function in Python source."""
    try:
        tree = ast.parse(source)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                if node.name == function_name:
                    return node.lineno, node.end_lineno
    except SyntaxError:
        pass
    return None


def replace_function(source: str, function_name: str, new_function: str) -> str:
    """Replace a function in source code with a new implementation."""
    lines = source.splitlines()
    bounds = find_function_lines(source, function_name)
    if not bounds:
        return source

    start, end = bounds
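    # Indentation of the original `def` line
    first = lines[start - 1]
    indent = first[:len(first) - len(first.lstrip())]

    # Dedent the new code to column 0, then re-indent every line (including the
    # first) so the replacement sits at the same depth as the original function
    raw = new_function.strip("\n").splitlines()
    margin = min((len(l) - len(l.lstrip()) for l in raw if l.strip()), default=0)
    new_lines = [indent + l[margin:] if l.strip() else "" for l in raw]

    updated = lines[:start - 1] + new_lines + lines[end:]
    return "\n".join(updated)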


def targeted_code_edit(source: str, instruction: str, target_function: str | None = None) -> str:
    """Edit only the relevant function, not the whole file."""
    if target_function:
        bounds = find_function_lines(source, target_function)
        if bounds:
            lines = source.splitlines()
            start, end = bounds
            section = "\n".join(lines[start - 1:end])

            response = client.messages.create(
                model="claude-haiku-4-5-20251001",
                max_tokens=len(section) // 3 + 300,
                messages=[{
                    "role": "user",
                    "content": (
                        f"Apply this change to ONLY this function. Return ONLY the updated function.\n\n"
                        f"Instruction: {instruction}\n\n"
                        f"Function to edit:\n```python\n{section}\n```"
                    ),
                }],
            )

            # Trim blank lines only, so an unfenced reply keeps its first-line indentation
            new_fn = response.content[0].text.strip("\n")
            # Strip markdown code fences if present
            if new_fn.startswith("```"):
                lines_out = new_fn.splitlines()
                new_fn = "\n".join(lines_out[1:-1] if lines_out[-1] == "```" else lines_out[1:])

            updated = replace_function(source, target_function, new_fn)
            total_lines = len(source.splitlines())
            fn_lines = end - start + 1
            print(f"  [Targeted edit] Edited {fn_lines}/{total_lines} lines (~{fn_lines/total_lines*100:.0f}% of file)")
            return updated

    # Fallback: full file edit
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=len(source) // 3 + 300,
        messages=[{"role": "user", "content": f"Apply: {instruction}\n\nCode:\n{source}"}],
    )
    return response.content[0].text


SOURCE = '''
import time
from typing import Optional

class DataProcessor:
    def __init__(self, batch_size: int = 100):
        self.batch_size = batch_size
        self.processed = 0

    def process_batch(self, items: list) -> list:
        """Process a batch of items."""
        results = []
        for item in items:
            result = self._transform(item)
            results.append(result)
            self.processed += 1
        return results

    def _transform(self, item: dict) -> dict:
        """Transform a single item."""
        return {
            "id": item["id"],
            "value": item["value"] * 2,
            "timestamp": time.time(),
        }

    def get_stats(self) -> dict:
        """Return processing statistics."""
        return {"processed": self.processed, "batch_size": self.batch_size}
'''.strip()

# Edit only `_transform`, not the entire class
updated = targeted_code_edit(
    SOURCE,
    "Add error handling for missing 'id' or 'value' keys",
    target_function="_transform",
)
print(updated)

Expected Token Savings: For a 200-line file editing a 10-line function: ~95% output token reduction. AST-based targeting gives exact line bounds for the first function with the given name. Environment: Python ast module (stdlib); end_lineno requires Python 3.8 or later. Works for Python source; adapt find_function_lines for other languages using regex.
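Before writing a targeted edit back to disk, it is worth checking that the result still parses and that no other function changed. A minimal sketch, assuming function names are unique within the file (same-named methods in different classes would overwrite each other in the lookup); `function_dumps` and `check_targeted_edit` are hypothetical helpers:

```python
import ast


def function_dumps(source: str) -> dict[str, str]:
    """Map each function name to a structural dump of its AST (ignores formatting)."""
    tree = ast.parse(source)
    return {
        node.name: ast.dump(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def check_targeted_edit(before: str, after: str, target: str) -> list[str]:
    """Return a list of problems; an empty list means the edit looks contained."""
    try:
        new = function_dumps(after)
    except SyntaxError as err:
        return [f"edited source does not parse: {err.msg}"]
    old = function_dumps(before)
    problems = [f"function disappeared: {name}" for name in old if name not in new]
    problems += [
        f"unexpected change in: {name}"
        for name in old
        if name != target and name in new and old[name] != new[name]
    ]
    return problems


before = "def a():\n    return 1\n\ndef b():\n    return 2\n"
good = ("def a():\n    return 1\n\n"
        "def b():\n    try:\n        return 2\n    except KeyError:\n        return None\n")
bad = "def a():\n    return 99\n\ndef b():\n    return 2\n"

print(check_targeted_edit(before, good, "b"))  # only b changed
print(check_targeted_edit(before, bad, "b"))   # a changed although b was the target
```

ast.dump omits line numbers by default, so a function that merely moved down the file because an earlier function grew is not reported as changed.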

Option	Mechanism	Output Token Reduction	Complexity	Best For
1	Edit instruction tool	~95%	Low	General documents
2	Section-based editing	~75%	Medium	Structured markdown/docs
3	Unified diff output	~95%	Medium	Code and line-oriented files
4	Prompt cache for document	Input: 90%	Low	Repeated edits, same document
5	No-op detection + cache	Up to 100%	Low	Idempotent edit operations
6	AST function targeting	~95%	Medium	Python source code editing
