Agent Uses Stale Few-Shot Examples — Outdated Patterns in Prompt
Symptom
- Agent generates code using deprecated API methods — the examples in the prompt use the old pattern
- Agent produces output in an old JSON schema that downstream consumers no longer accept
- Few-shot examples reference field names that were renamed 3 months ago
- Agent formats dates as MM/DD/YYYY because the examples do, but the spec changed to ISO 8601
- Examples reference a library version (requests 2.x patterns) but the codebase upgraded to httpx
- Agent produces SQL in PostgreSQL 12 syntax because the examples predate the migration to PostgreSQL 15
- Few-shot examples show error responses in the old format — new errors have a different structure
Root Cause
Few-shot examples are frozen text in a prompt. When the world changes — API versions, schemas, conventions, library upgrades — the examples don’t update automatically. The model learns from the examples it’s given; stale examples teach stale patterns. The fix is either to keep examples current through versioning and automated checks, or to replace static examples with dynamically retrieved ones that are verified against the current schema.
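To make "verified against the current schema" concrete, here is a minimal sketch that walks a parsed example output and reports keys that have since been renamed. The rename map and the `find_stale_keys` helper are assumptions for illustration; the field names are borrowed from the renames mentioned elsewhere on this page.

```python
import json

# Hypothetical rename map for illustration: old field name -> current field name.
RENAMED_FIELDS = {"qty": "quantity", "user_name": "username"}


def find_stale_keys(value, renamed=RENAMED_FIELDS, path="$"):
    """Recursively collect (path, old_key, new_key) for every renamed key in parsed JSON."""
    hits = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            if key in renamed:
                hits.append((child_path, key, renamed[key]))
            hits.extend(find_stale_keys(child, renamed, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            hits.extend(find_stale_keys(child, renamed, f"{path}[{index}]"))
    return hits


stale_output = '{"order": {"user_id": 42, "line_items": [{"sku": "A1", "qty": 2}]}}'
current_output = '{"order": {"user_id": 42, "line_items": [{"sku": "A1", "quantity": 2}]}}'

stale_hits = find_stale_keys(json.loads(stale_output))
current_hits = find_stale_keys(json.loads(current_output))
print(stale_hits)
print(current_hits)
```

Walking the parsed structure, rather than matching text, only flags real keys, so a value that happens to contain the string qty would not be reported.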
Fix
Option 1: Version-stamp examples and validate against current schema
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def utc_now() -> datetime:
    """Timezone-aware replacement for datetime.utcnow(), which is deprecated since Python 3.12."""
    return datetime.now(timezone.utc)


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; naive values are assumed to be UTC."""
    parsed = datetime.fromisoformat(value)
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


@dataclass
class FewShotExample:
    """
    A versioned few-shot example with staleness tracking.
    """
    id: str
    description: str
    input: str
    output: str
    schema_version: str  # Which schema/API version this example targets
    created_at: str      # ISO 8601
    validated_at: Optional[str] = None
    deprecated: bool = False
    deprecation_reason: Optional[str] = None

    def content_hash(self) -> str:
        return hashlib.sha256(f"{self.input}{self.output}".encode()).hexdigest()[:12]


class FewShotLibrary:
    """
    Manages a library of versioned few-shot examples.
    Filters examples to only those matching the current schema version.
    Alerts when examples haven't been validated recently.
    """

    def __init__(
        self,
        library_path: str = "prompts/few_shot_examples.json",
        current_schema_version: str = "v2"
    ):
        self.library_path = Path(library_path)
        self.current_version = current_schema_version
        self._examples: list[FewShotExample] = self._load()

    def _load(self) -> list[FewShotExample]:
        if self.library_path.exists():
            raw = json.loads(self.library_path.read_text())
            return [FewShotExample(**e) for e in raw.get("examples", [])]
        return []

    def get_current_examples(
        self,
        max_examples: int = 3,
        warn_if_stale_days: int = 30
    ) -> list[FewShotExample]:
        """
        Return examples valid for the current schema version.
        Warns if examples haven't been validated recently.
        """
        valid = [
            e for e in self._examples
            if e.schema_version == self.current_version
            and not e.deprecated
        ]
        if not valid:
            print(
                f"WARNING: No few-shot examples found for schema version '{self.current_version}'. "
                f"Examples may be stale. Check {self.library_path}."
            )
            return []

        # Warn on stale validation
        now = utc_now()
        for example in valid:
            if example.validated_at:
                age_days = (now - parse_timestamp(example.validated_at)).days
                if age_days > warn_if_stale_days:
                    print(
                        f"WARNING: Example '{example.id}' last validated {age_days} days ago. "
                        f"Re-validate against current schema."
                    )
        return valid[:max_examples]

    def deprecate_version(self, version: str, reason: str) -> None:
        """Mark all examples for an old version as deprecated"""
        count = 0
        for example in self._examples:
            if example.schema_version == version:
                example.deprecated = True
                example.deprecation_reason = reason
                count += 1
        self._save()
        print(f"Deprecated {count} examples for version '{version}'")

    def add_example(self, example: FewShotExample) -> None:
        self._examples.append(example)
        self._save()

    def _save(self) -> None:
        self.library_path.parent.mkdir(parents=True, exist_ok=True)
        data = {"examples": [asdict(e) for e in self._examples]}
        self.library_path.write_text(json.dumps(data, indent=2))

    def build_few_shot_block(self, max_examples: int = 3) -> str:
        """Format examples for inclusion in a system prompt"""
        examples = self.get_current_examples(max_examples=max_examples)
        if not examples:
            return ""
        lines = ["## Examples\n"]
        for i, ex in enumerate(examples, 1):
            lines.append(f"### Example {i}: {ex.description}")
            lines.append(f"Input:\n{ex.input}")
            lines.append(f"Output:\n{ex.output}\n")
        return "\n".join(lines)


# Usage:
library = FewShotLibrary(current_schema_version="v3")

# When API/schema upgrades:
library.deprecate_version("v2", reason="API migrated from REST v2 to v3 on 2025-01-15")
library.add_example(FewShotExample(
    id="create_order_v3",
    description="Create order with new v3 schema",
    input='Create an order for user 42 with items [{"sku": "A1", "qty": 2}]',
    output='{"order": {"user_id": 42, "line_items": [{"sku": "A1", "quantity": 2}], "version": "v3"}}',
    schema_version="v3",
    created_at=utc_now().isoformat(),
    validated_at=utc_now().isoformat()
))

system_prompt = f"""You are an order processing agent.
{library.build_few_shot_block(max_examples=3)}
Always produce output in the current v3 schema format shown above.
"""
Option 2: Dynamic few-shot retrieval — pull relevant, current examples at runtime
import json
from pathlib import Path

import anthropic

# Async client, so the API call does not block the event loop inside async functions
client = anthropic.AsyncAnthropic()


class DynamicFewShotRetriever:
    """
    Instead of hardcoded examples, retrieve the most relevant examples
    from a validated, versioned store at runtime.
    Combines relevance (chosen by the model per query) with recency (newest files first).
    """

    def __init__(self, examples_dir: str = "prompts/examples/"):
        self.examples_dir = Path(examples_dir)

    def _load_examples(self) -> list[dict]:
        """Load all example files, newest first. Each file is one validated example."""
        examples = []
        files = sorted(self.examples_dir.glob("*.json"), key=lambda p: p.stat().st_mtime, reverse=True)
        for f in files:
            try:
                data = json.loads(f.read_text())
            except (OSError, json.JSONDecodeError) as e:
                # Report unreadable files instead of silently dropping them
                print(f"WARNING: Skipping unreadable example file {f.name}: {e}")
                continue
            if not data.get("deprecated", False):
                examples.append(data)
        return examples

    async def retrieve_relevant(
        self,
        user_query: str,
        max_examples: int = 3
    ) -> list[dict]:
        """
        Use Claude to select the most relevant examples for the current query.
        This avoids semantic search infrastructure while still being query-aware.
        """
        all_examples = self._load_examples()
        if len(all_examples) <= max_examples:
            return all_examples

        # Ask Claude to select the most relevant examples
        candidates = all_examples[:20]  # Limit to the 20 most recent candidates
        example_summaries = "\n".join(
            f"{i}. {ex['description']} (added: {ex.get('created_at', 'unknown')[:10]})"
            for i, ex in enumerate(candidates)
        )
        selection_response = await client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=100,
            messages=[{
                "role": "user",
                "content": (
                    f"User query: {user_query}\n\n"
                    f"Available examples:\n{example_summaries}\n\n"
                    f"Return the {max_examples} most relevant example numbers as JSON array. "
                    f"Example: [0, 3, 7]"
                )
            }]
        )
        try:
            selected = json.loads(selection_response.content[0].text)
            # Keep only in-range integer indices; the model's reply is untrusted
            indices = [i for i in selected if isinstance(i, int) and 0 <= i < len(candidates)]
        except (json.JSONDecodeError, TypeError, IndexError, AttributeError):
            indices = []
        if not indices:
            # Fallback to most recent
            return candidates[:max_examples]
        return [candidates[i] for i in indices[:max_examples]]

    def format_examples(self, examples: list[dict]) -> str:
        """Format examples for prompt inclusion"""
        if not examples:
            return ""
        parts = ["Here are relevant examples:\n"]
        for ex in examples:
            parts.append(f"Example - {ex['description']}:")
            parts.append(f"Input: {ex['input']}")
            parts.append(f"Output: {ex['output']}\n")
        return "\n".join(parts)


retriever = DynamicFewShotRetriever()


async def build_prompt_with_dynamic_examples(user_query: str) -> str:
    examples = await retriever.retrieve_relevant(user_query, max_examples=3)
    few_shot_block = retriever.format_examples(examples)
    return f"""You are a data transformation agent.
{few_shot_block}
Follow the exact output format shown in the examples above.
"""
Option 3: Example validation CI — fail the build when examples diverge from schema
import json
from pathlib import Path

import pytest
from pydantic import BaseModel, ValidationError


# Define the current expected output schema
class OrderLineItem(BaseModel):
    sku: str
    quantity: int  # Note: was 'qty' in old schema, changed in v3


class OrderOutput(BaseModel):
    order: dict  # Or more specific nested model


class CreateOrderOutput(BaseModel):
    """Current v3 output schema"""
    order_id: str
    user_id: int
    line_items: list[OrderLineItem]
    status: str
    version: str = "v3"


EXAMPLE_DIR = Path("prompts/examples/")


def load_all_examples() -> list[tuple[str, dict]]:
    """Load all example files with their filenames"""
    examples = []
    for f in EXAMPLE_DIR.glob("*.json"):
        data = json.loads(f.read_text())
        examples.append((f.name, data))
    return examples


@pytest.mark.parametrize("filename,example", load_all_examples())
def test_example_output_matches_current_schema(filename: str, example: dict):
    """
    CI test: every few-shot example's output must be valid under the current schema.
    Fails immediately if an example drifts from the schema, so it is caught before deployment.
    """
    if example.get("deprecated"):
        pytest.skip(f"Example {filename} is deprecated, skipping schema check")
    if example.get("output_type") != "create_order":
        pytest.skip(f"Example {filename} is not a create_order example")
    try:
        output_data = json.loads(example["output"])
    except json.JSONDecodeError as e:
        pytest.fail(f"Example {filename}: output is not valid JSON: {e}")
    if not isinstance(output_data, dict):
        pytest.fail(f"Example {filename}: output must be a JSON object")
    try:
        CreateOrderOutput(**output_data)
    except ValidationError as e:
        pytest.fail(
            f"Example {filename} output does not match current schema (v3):\n"
            f"{e}\n\n"
            f"Example output:\n{json.dumps(output_data, indent=2)}\n\n"
            f"This example needs to be updated to the current schema."
        )


@pytest.mark.parametrize("filename,example", load_all_examples())
def test_example_has_required_metadata(filename: str, example: dict):
    """Every example must have schema_version, created_at, and description"""
    assert "schema_version" in example, f"{filename}: missing schema_version"
    assert "created_at" in example, f"{filename}: missing created_at"
    assert "description" in example, f"{filename}: missing description"
    assert not example.get("deprecated", False) or "deprecation_reason" in example, (
        f"{filename}: deprecated example missing deprecation_reason"
    )


# Run in CI with: pytest prompts/tests/test_examples.py -v
# Blocks deployment if any non-deprecated example fails schema validation
Option 4: Auto-generate examples from real successful outputs
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

import anthropic

# Async client, so the API call does not block the event loop inside async functions
client = anthropic.AsyncAnthropic()


class ExampleHarvester:
    """
    Harvest few-shot examples from real, successful agent outputs.
    When the agent produces a verified-correct output, save it as a new example.
    Examples are always current because they come from the running system.
    """

    def __init__(
        self,
        examples_dir: str = "prompts/examples/",
        max_examples_per_type: int = 5
    ):
        self.examples_dir = Path(examples_dir)
        self.examples_dir.mkdir(parents=True, exist_ok=True)
        self.max_examples_per_type = max_examples_per_type

    def _example_id(self, input_text: str, output_type: str) -> str:
        return hashlib.sha256(f"{output_type}:{input_text}".encode()).hexdigest()[:12]

    def harvest(
        self,
        input_text: str,
        output_text: str,
        output_type: str,
        description: str,
        schema_version: str,
        verified: bool = False
    ) -> None:
        """
        Save a real output as a new few-shot example.
        Only call this when the output is verified correct.
        """
        if not verified:
            return  # Never harvest unverified outputs as examples
        example_id = self._example_id(input_text, output_type)
        example_path = self.examples_dir / f"{output_type}_{example_id}.json"
        now = datetime.now(timezone.utc).isoformat()
        example = {
            "id": example_id,
            "output_type": output_type,
            "description": description,
            "input": input_text,
            "output": output_text,
            "schema_version": schema_version,
            "created_at": now,
            "validated_at": now,
            "source": "harvested_from_production",
            "deprecated": False
        }
        example_path.write_text(json.dumps(example, indent=2))
        print(f"Harvested example: {example_path.name}")
        # Rotate old examples, keeping only max_examples_per_type
        self._rotate_examples(output_type)

    def _rotate_examples(self, output_type: str) -> None:
        """Keep only the N most recent examples per type"""
        type_examples = sorted(
            self.examples_dir.glob(f"{output_type}_*.json"),
            key=lambda p: p.stat().st_mtime,
            reverse=True
        )
        for old in type_examples[self.max_examples_per_type:]:
            old.unlink()
            print(f"Rotated old example: {old.name}")


harvester = ExampleHarvester()


async def run_agent_with_example_harvesting(
    user_input: str,
    output_type: str,
    schema_version: str
) -> str:
    """
    Run the agent and harvest verified outputs as future few-shot examples.
    """
    response = await client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": user_input}]
    )
    output = response.content[0].text
    # Verify output is correct before harvesting
    is_valid = await verify_output(output, output_type, schema_version)
    if is_valid:
        harvester.harvest(
            input_text=user_input,
            output_text=output,
            output_type=output_type,
            description=f"Auto-harvested {output_type} example",
            schema_version=schema_version,
            verified=True  # Only harvest verified outputs
        )
    return output


async def verify_output(output: str, output_type: str, schema_version: str) -> bool:
    """Verify output against current schema so that only valid outputs become examples"""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return False
    # Placeholder: as written, any parseable JSON passes.
    # Apply type-specific validation here before relying on harvested examples.
    return True
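The verification step above accepts any parseable JSON, which would let schema-invalid outputs become examples. A minimal sketch of what type-specific validation could look like with only the standard library follows. The `REQUIRED_FIELDS` table and `validate_typed_output` name are assumptions for illustration; the create_order fields mirror the CreateOrderOutput model from Option 3.

```python
import json

# Hypothetical required fields per output type; replace with your real schema.
REQUIRED_FIELDS = {
    "create_order": {
        "order_id": str,
        "user_id": int,
        "line_items": list,
        "status": str,
        "version": str,
    },
}


def validate_typed_output(output: str, output_type: str, schema_version: str) -> bool:
    """Return True only if output is a JSON object with the expected fields, types, and version."""
    spec = REQUIRED_FIELDS.get(output_type)
    if spec is None:
        return False  # Unknown output types are never harvested
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    for name, expected_type in spec.items():
        if not isinstance(data.get(name), expected_type):
            return False
    # The output must declare the schema version the caller expects
    return data["version"] == schema_version


good = '{"order_id": "o-1", "user_id": 42, "line_items": [], "status": "created", "version": "v3"}'
print(validate_typed_output(good, "create_order", "v3"))
print(validate_typed_output('{"qty": 2}', "create_order", "v3"))
```

Rejecting unknown output types by default is a deliberate choice here: an unvalidated example is more likely to introduce drift than to prevent it.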
Option 5: Prompt diffing — detect example drift on schema changes
import re
from dataclasses import dataclass


@dataclass
class DriftReport:
    example_id: str
    issues: list[str]
    severity: str  # "critical", "warning", "ok"


class ExampleDriftDetector:
    """
    Detects when few-shot examples reference outdated patterns:
    - Deprecated field names
    - Old API endpoints
    - Old date formats
    - Removed enum values
    """

    def __init__(self):
        # Define what patterns are outdated
        self.deprecated_patterns = {
            # regex pattern: (replacement, description)
            r'"qty"': ('"quantity"', "Field 'qty' renamed to 'quantity' in v3"),
            r'"user_name"': ('"username"', "Field 'user_name' renamed to 'username'"),
            r'/api/v1/': ('/api/v3/', "API v1 endpoint deprecated"),
            r'"MM/DD/YYYY"': ('"YYYY-MM-DD"', "Date format changed to ISO 8601"),
            r'requests\.get': ('httpx.get', "Library migrated from requests to httpx"),
            r'"status": "ok"': ('"status": "success"', "Status value changed from 'ok' to 'success'"),
        }
        self.required_patterns = {
            # Patterns that MUST appear in current examples
            '"version": "v3"': "All outputs must include version field set to v3",
        }

    def check_example(self, example: dict) -> DriftReport:
        """Check a single example for drift against current patterns"""
        issues = []
        # Scan the raw output text. Serializing the whole example with json.dumps
        # would escape inner quotes (\"qty\") and hide quoted patterns; the input is
        # skipped because user requests may legitimately use old field names.
        output_text = str(example.get("output", ""))

        # Check for deprecated patterns
        for pattern, (replacement, description) in self.deprecated_patterns.items():
            if re.search(pattern, output_text):
                issues.append(f"DEPRECATED: {description}; replace {pattern!r} with {replacement!r}")

        # Check for required patterns in output
        for pattern, description in self.required_patterns.items():
            if pattern not in output_text:
                issues.append(f"MISSING: {description}")

        if any(issue.startswith("DEPRECATED") for issue in issues):
            severity = "critical"
        elif issues:
            severity = "warning"
        else:
            severity = "ok"
        return DriftReport(
            example_id=example.get("id", "unknown"),
            issues=issues,
            severity=severity
        )

    def audit_all_examples(self, examples: list[dict]) -> list[DriftReport]:
        """Audit all non-deprecated examples and return drift reports"""
        reports = []
        for ex in examples:
            if ex.get("deprecated"):
                continue
            report = self.check_example(ex)
            reports.append(report)
            if report.severity != "ok":
                print(
                    f"[{report.severity.upper()}] Example '{report.example_id}':\n"
                    + "\n".join(f"  - {issue}" for issue in report.issues)
                )
        critical = sum(1 for r in reports if r.severity == "critical")
        warnings = sum(1 for r in reports if r.severity == "warning")
        ok = sum(1 for r in reports if r.severity == "ok")
        print(f"\nAudit summary: {ok} ok, {warnings} warnings, {critical} critical")
        return reports


detector = ExampleDriftDetector()

# Add new deprecated patterns when schema changes:
# detector.deprecated_patterns[r'"old_field"'] = ('"new_field"', "Renamed in schema v4")
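One detail worth checking when scanning examples with regular expressions: if an example stores its output as a JSON string, serializing the whole record with json.dumps escapes the inner quotes, so a pattern such as "qty" may only match against the raw output text. A minimal sketch, using a hypothetical example record:

```python
import json
import re

# Hypothetical stale example whose output is stored as a JSON string.
example = {"id": "create_order_v2", "output": '{"sku": "A1", "qty": 2}'}
pattern = r'"qty"'

# Serializing the whole record escapes the inner quotes: ... \"qty\" ...
serialized = json.dumps(example)
match_in_serialized = re.search(pattern, serialized)

# Scanning the raw output string sees the quotes exactly as written.
match_in_raw_output = re.search(pattern, example["output"])

print(serialized)
print("serialized match:", match_in_serialized is not None)
print("raw output match:", match_in_raw_output is not None)
```

If a detector reports zero issues on examples known to be stale, this escaping is a likely cause.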
Option 6: Few-shot example rotation schedule — time-based freshness enforcement
from dataclasses import asdict
from datetime import datetime, timezone


class ExampleFreshnessEnforcer:
    """
    Enforces a rotation schedule for few-shot examples.
    Examples older than review_after_days are flagged for review.
    After disable_after_days, old examples are automatically disabled.
    The agent refuses to use examples past their expiry.
    """

    def __init__(
        self,
        review_after_days: int = 60,   # Flag for review after 60 days
        disable_after_days: int = 180  # Disable after 180 days if not reviewed
    ):
        self.review_after_days = review_after_days
        self.disable_after_days = disable_after_days

    def check_freshness(self, example: dict) -> dict:
        """
        Returns freshness status for a single example.
        """
        timestamp_str = example.get("validated_at") or example.get("created_at")
        if not timestamp_str:
            return {
                "status": "unknown",
                "message": "No creation/validation date, assume stale",
                "usable": False
            }
        # Accept a trailing "Z" and treat naive timestamps as UTC.
        # (str.rstrip("+00:00") strips a character set, not a suffix, and would
        # also eat trailing zeros and colons from the time itself.)
        timestamp = datetime.fromisoformat(timestamp_str.replace("Z", "+00:00"))
        if timestamp.tzinfo is None:
            timestamp = timestamp.replace(tzinfo=timezone.utc)
        age_days = (datetime.now(timezone.utc) - timestamp).days

        if age_days > self.disable_after_days:
            return {
                "status": "expired",
                "message": f"Example is {age_days} days old, exceeding the {self.disable_after_days}-day limit. Update required.",
                "age_days": age_days,
                "usable": False
            }
        elif age_days > self.review_after_days:
            return {
                "status": "stale",
                "message": f"Example is {age_days} days old, review recommended.",
                "age_days": age_days,
                "usable": True  # Still usable but flagged
            }
        else:
            return {
                "status": "fresh",
                "age_days": age_days,
                "usable": True
            }

    def filter_usable(self, examples: list[dict]) -> list[dict]:
        """Return only examples that are usable (not expired)"""
        usable = []
        for ex in examples:
            freshness = self.check_freshness(ex)
            if freshness["usable"]:
                if freshness["status"] == "stale":
                    print(f"WARNING: Example '{ex.get('id')}' is stale ({freshness['message']})")
                usable.append(ex)
            else:
                print(f"BLOCKED: Example '{ex.get('id')}' will not be used. {freshness['message']}")
        return usable

    def generate_rotation_schedule(self, examples: list[dict]) -> dict:
        """
        Generate a schedule of when examples need review/replacement.
        Use this in a weekly cron job to proactively manage examples.
        """
        schedule = {"needs_review_now": [], "needs_review_soon": [], "fresh": []}
        for ex in examples:
            freshness = self.check_freshness(ex)
            days = freshness.get("age_days", 999)  # Undated examples count as very old
            if not freshness["usable"] or days > self.review_after_days:
                schedule["needs_review_now"].append({
                    "id": ex.get("id"),
                    "age_days": days,
                    "action": "Update immediately" if not freshness["usable"] else "Review and re-validate"
                })
            elif days > (self.review_after_days * 0.8):  # 80% of review threshold
                schedule["needs_review_soon"].append({
                    "id": ex.get("id"),
                    "age_days": days,
                    "action": f"Schedule review within {self.review_after_days - days} days"
                })
            else:
                schedule["fresh"].append(ex.get("id"))
        return schedule


enforcer = ExampleFreshnessEnforcer(review_after_days=60, disable_after_days=180)


# In agent startup (FewShotLibrary is the class from Option 1):
def load_valid_examples(library: "FewShotLibrary") -> list[dict]:
    all_examples = [asdict(e) for e in library.get_current_examples(max_examples=10)]
    return enforcer.filter_usable(all_examples)


# Weekly cron job to check schedule:
# schedule = enforcer.generate_rotation_schedule(all_examples)
# if schedule["needs_review_now"]:
#     send_alert(f"Few-shot examples need immediate review: {schedule['needs_review_now']}")
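Age checks depend on parsing timestamps that may arrive in mixed forms: with a trailing Z, with an explicit offset, or with no timezone at all. A minimal sketch of normalizing them follows, assuming naive values are UTC. The `to_utc` and `age_in_days` helpers are hypothetical names, and the last lines show why `str.rstrip` is not a safe way to drop an offset.

```python
from datetime import datetime, timezone


def to_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp and return it as a timezone-aware UTC datetime."""
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # Assumption: naive means UTC
    return parsed.astimezone(timezone.utc)


def age_in_days(timestamp: str, now: datetime) -> int:
    """Whole days elapsed between the timestamp and a timezone-aware 'now'."""
    return (now - to_utc(timestamp)).days


now = datetime(2025, 3, 16, tzinfo=timezone.utc)
print(age_in_days("2025-01-15T00:00:00Z", now))
print(age_in_days("2025-01-15T00:00:00", now))
print(age_in_days("2025-01-15T02:00:00+02:00", now))

# Pitfall: rstrip removes any trailing '+', '0', or ':' characters, not the suffix "+00:00",
# so it also eats the seconds and part of the minutes of a naive timestamp.
mangled = "2025-01-15T10:30:00".rstrip("+00:00")
print(mangled)
```

Passing `now` in explicitly keeps the helper deterministic, which makes the 60-day and 180-day thresholds straightforward to test.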
Example Staleness Failure Modes
| Failure Mode | Symptom | Fix |
|---|---|---|
| Renamed field in output | Agent uses old field name qty instead of quantity | Version-stamp examples; fail CI on deprecated pattern |
| Old API endpoint in example | Agent generates calls to /api/v1/ | Add deprecated pattern check for old endpoints |
| Changed date format | Agent formats 12/31/2024 instead of 2024-12-31 | CI schema validation against current format |
| Library migration | Agent writes requests.get() after migration to httpx | Drift detector with deprecated code patterns |
| New required field added | Agent omits version field added to schema | Required pattern checker in CI |
| Old error format | Agent produces old error structure that downstream rejects | Schema validation test on all examples |
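As an illustration of the "Changed date format" row, the check below flags US-style dates in an example output and confirms that a value is a real ISO 8601 calendar date. It is a sketch under stated assumptions: the `find_us_dates` and `is_iso_date` helpers are hypothetical, and the regex only targets the MM/DD/YYYY shape discussed on this page.

```python
import re
from datetime import date

# Matches US-style dates such as 12/31/2024 (the old MM/DD/YYYY convention).
US_DATE = re.compile(r"\b(0?[1-9]|1[0-2])/(0?[1-9]|[12]\d|3[01])/(\d{4})\b")


def find_us_dates(text: str) -> list[str]:
    """Return every MM/DD/YYYY-looking date found in the text."""
    return [m.group(0) for m in US_DATE.finditer(text)]


def is_iso_date(value: str) -> bool:
    """True if value is a valid calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


print(find_us_dates('{"due": "12/31/2024"}'))
print(find_us_dates('{"due": "2024-12-31"}'))
print(is_iso_date("2024-12-31"), is_iso_date("2024-02-30"))
```

The explicit shape check before `date.fromisoformat` matters because newer Python versions accept additional ISO 8601 spellings that a strict YYYY-MM-DD spec would reject.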
When to Update Examples
| Event | Action |
|---|---|
| API schema version bump | Deprecate all examples for old version; add new examples |
| Field renamed | Add old name to deprecated patterns; add new examples |
| Library/framework upgrade | Audit examples for old import patterns; rotate |
| Output format change | Run CI schema validation; fix or remove failing examples |
| New required field | Add to required_patterns check; update all examples |
| After 60 days | Re-validate examples still match current behavior |
Expected Token Savings
- Stale examples teach wrong patterns → model produces wrong output → user corrects → retry: ~8,000 tokens per session
- Current examples → model produces correct output first try: 0 correction overhead
Environment
- Any agent using few-shot prompting with static examples in the system prompt or message template; especially critical for agents that produce structured output (JSON, SQL, code) where schemas and conventions evolve over time
- Source: direct experience; few-shot example staleness is the hardest prompt bug to diagnose because the model’s reasoning looks correct — it’s faithfully following wrong examples