Few-Shot Examples in System Prompt Too Long — Thousands of Tokens Wasted Per Call

Symptom

System prompt is 8,000 tokens but should be 500
6,000+ of those tokens are example input/output pairs
API costs 10× higher than expected
Examples were added to “improve quality” but cost kept growing
Response quality with 2 examples is nearly identical to 10 examples

Root Cause

Few-shot examples help, but with diminishing returns. The first 2–3 examples usually provide most of the benefit; each additional example adds only marginal quality while adding its full token cost to every API call. Verbose examples compound the problem: a 50-word example is usually as effective as a 500-word one.

Fix

Option 1: Measure tokens in your system prompt

import anthropic

client = anthropic.Anthropic()

def count_system_prompt_tokens(system_prompt: str) -> int:
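    """Count input tokens for the system prompt using the token-counting endpoint.

    The count includes the short placeholder user message, so it slightly
    overstates the system prompt on its own.
    """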
    response = client.messages.count_tokens(
        model="claude-sonnet-4-6",
        system=system_prompt,
        messages=[{"role": "user", "content": "test"}]
    )
    return response.input_tokens

with open("system_prompt.txt", encoding="utf-8") as f:
    system_prompt = f.read()
tokens = count_system_prompt_tokens(system_prompt)
print(f"System prompt: {tokens:,} tokens")

# Cost estimate (assumes $3 per 1M input tokens; check current pricing)
PRICE_PER_INPUT_TOKEN = 3 / 1_000_000
cost_per_call = tokens * PRICE_PER_INPUT_TOKEN
print(f"System prompt cost per call: ${cost_per_call:.4f}")
print(f"Cost for 10,000 calls: ${cost_per_call * 10000:.2f}")

Option 2: Reduce to 2-3 minimal examples

# BEFORE — 5 verbose examples, ~2500 tokens
LONG_SYSTEM_PROMPT = """
You are a data extraction assistant.

Example 1:
Input: "The meeting is scheduled for Monday March 15th at 2pm in the main conference room with John Smith and Sarah Johnson to discuss Q1 budget review and departmental headcount planning for the upcoming fiscal year."
Output: {"date": "2024-03-15", "time": "14:00", "location": "main conference room", "attendees": ["John Smith", "Sarah Johnson"], "topics": ["Q1 budget review", "headcount planning"]}

Example 2:
[... 4 more equally verbose examples ...]
"""

# AFTER: 1-2 minimal examples (one shown here), at most ~300 tokens
OPTIMIZED_SYSTEM_PROMPT = """
You are a data extraction assistant. Extract structured data from text.

Example:
Input: "Meeting: Mon 3/15 at 2pm, conf room A, with Jane and Bob, about Q1 budget"
Output: {"date": "2024-03-15", "time": "14:00", "location": "conf room A", "attendees": ["Jane", "Bob"], "topics": ["Q1 budget"]}

Return valid JSON only. No explanation.
"""

Option 3: Use prompt caching for stable few-shot examples

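# If you need many examples, cache them. Cache writes cost more than normal
# input (about 1.25x), cache reads cost about 10%, and the default 5-minute
# lifetime is refreshed each time the cache is hit. Prompts shorter than the
# model's minimum cacheable length are not cached.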
import anthropic

client = anthropic.Anthropic()

MANY_EXAMPLES = """...(thousands of tokens of examples)..."""

def call_with_cached_examples(user_input: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": "You are a data extraction assistant.\n\n" + MANY_EXAMPLES,
                "cache_control": {"type": "ephemeral"}  # Cache reads are billed at ~10% of the base input price
            }
        ],
        messages=[{"role": "user", "content": user_input}]
    )
    return response.content[0].text
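
To judge whether caching pays off for a given traffic pattern, it can help to estimate the cost with and without it. The following is a rough sketch, not a billing calculator: the multipliers (1.25x for cache writes, 0.1x for cache reads) and the $3 per 1M token rate are assumptions to check against current pricing, and the function names are hypothetical. It also assumes every call that is not a cache write arrives while the cache is still warm.

```python
# Assumed rates; verify against current pricing before relying on them.
PRICE_PER_MTOK = 3.00          # base input price, USD per 1M tokens (assumption)
CACHE_WRITE_MULTIPLIER = 1.25  # assumed surcharge on calls that write the cache
CACHE_READ_MULTIPLIER = 0.10   # assumed discount on calls that hit the cache


def uncached_cost(prefix_tokens: int, calls: int) -> float:
    """Cost of sending the example prefix at full price on every call."""
    return prefix_tokens * calls * PRICE_PER_MTOK / 1_000_000


def cached_cost(prefix_tokens: int, calls: int, cache_writes: int = 1) -> float:
    """Cost when `cache_writes` calls write the cache and the rest read it."""
    if not 0 <= cache_writes <= calls:
        raise ValueError("cache_writes must be between 0 and calls")
    reads = calls - cache_writes
    weighted_calls = (
        cache_writes * CACHE_WRITE_MULTIPLIER + reads * CACHE_READ_MULTIPLIER
    )
    return prefix_tokens * weighted_calls * PRICE_PER_MTOK / 1_000_000


if __name__ == "__main__":
    tokens, calls = 5_000, 10_000
    print(f"uncached: ${uncached_cost(tokens, calls):.2f}")
    print(f"cached:   ${cached_cost(tokens, calls):.2f}")
```

With sparse traffic, where most calls arrive after the cache has expired, raising `cache_writes` toward `calls` shows the write surcharge making caching more expensive than sending the prefix uncached.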

Option 4: Dynamic few-shot — select relevant examples per query

from sentence_transformers import SentenceTransformer
import numpy as np

class DynamicFewShot:
    """Select the most relevant examples for each query"""
    def __init__(self, examples: list[dict]):
        self.examples = examples
        self.model = SentenceTransformer("all-MiniLM-L6-v2")
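        # Pre-embed all examples; unit-length vectors make the dot product a cosine similarity
        self.embeddings = self.model.encode(
            [ex["input"] for ex in examples], normalize_embeddings=True
        )

    def select(self, query: str, top_k: int = 2) -> list[dict]:
        """Return top_k most relevant examples for this query"""
        query_emb = self.model.encode([query], normalize_embeddings=True)
        scores = np.dot(self.embeddings, query_emb.T).flatten()
        # argsort is ascending: take the last top_k indices, then reverse to best-first
        top_indices = scores.argsort()[-top_k:][::-1]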
        return [self.examples[i] for i in top_indices]

    def format_for_prompt(self, selected: list[dict]) -> str:
        lines = []
        for ex in selected:
            lines.append(f"Input: {ex['input']}")
            lines.append(f"Output: {ex['output']}\n")
        return "\n".join(lines)

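# Usage: only 2 relevant examples per call instead of 10 every time.
# all_examples (a list of {"input": ..., "output": ...} dicts) and user_query
# are supplied by your application.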
few_shot = DynamicFewShot(all_examples)
relevant = few_shot.select(user_query, top_k=2)
examples_text = few_shot.format_for_prompt(relevant)
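
The usage snippet stops at `examples_text`. One way to finish the step is to splice the selected examples into a short instruction block for each request. This is a sketch under assumptions: `build_system_prompt` and `BASE_INSTRUCTIONS` are hypothetical names, the demo example is made up for illustration, and the example dicts are assumed to carry `"input"` and `"output"` keys as in the class above.

```python
# Hypothetical fixed instructions; replace with your own task description.
BASE_INSTRUCTIONS = (
    "You are a data extraction assistant. Extract structured data from text.\n"
    "Return valid JSON only. No explanation."
)


def build_system_prompt(
    selected: list[dict], instructions: str = BASE_INSTRUCTIONS
) -> str:
    """Combine the fixed instructions with the per-query examples."""
    if not selected:
        # Zero-shot fallback: send the instructions alone.
        return instructions
    blocks = [f"Input: {ex['input']}\nOutput: {ex['output']}" for ex in selected]
    return instructions + "\n\nExamples:\n\n" + "\n\n".join(blocks)


if __name__ == "__main__":
    # Made-up example purely for illustration.
    demo = [{"input": "Call at 4pm with Ana", "output": '{"time": "16:00"}'}]
    print(build_system_prompt(demo))
```

Because the example block changes from query to query, this approach generally does not combine with caching that block; it trades cache hits for a much smaller prompt.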

Option 5: Benchmark quality vs. number of examples

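import statistics

import anthropic

# ALL_EXAMPLES, format_examples(), and evaluate() are placeholders for your own
# example list, prompt formatter, and scoring function (score between 0 and 1).

def benchmark_examples(test_cases: list[dict], max_examples: int = 10) -> dict:
    """Find the minimum examples needed for acceptable quality"""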
    client = anthropic.Anthropic()
    results = {}

    for n_examples in range(0, max_examples + 1, 2):
        examples_text = format_examples(ALL_EXAMPLES[:n_examples])
        scores = []

        for test in test_cases[:20]:  # Sample 20 test cases
            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=512,
                system=f"Task instructions.\n\n{examples_text}",
                messages=[{"role": "user", "content": test["input"]}]
            )
            score = evaluate(response.content[0].text, test["expected"])
            scores.append(score)

        avg_score = statistics.mean(scores)
        results[n_examples] = avg_score
        print(f"{n_examples} examples: {avg_score:.2%} accuracy")

    return results
# In many cases 2-3 examples capture most of the benefit of 10; confirm on your own test set
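
The benchmark relies on helpers it does not define and leaves the final choice to the reader. A minimal sketch of both pieces follows, under assumptions: `format_examples` and `evaluate` here are hypothetical stand-ins (exact JSON match is only one possible scoring rule, and `expected` is assumed to be an already-parsed dict), and `pick_min_examples` with its `tolerance` parameter is an illustrative name, not part of any library.

```python
import json


def format_examples(examples: list[dict]) -> str:
    """Render example dicts (assumed keys: "input", "output") as prompt text."""
    return "\n\n".join(
        f"Input: {ex['input']}\nOutput: {ex['output']}" for ex in examples
    )


def evaluate(response_text: str, expected: dict) -> float:
    """Score 1.0 if the response parses to exactly the expected JSON, else 0.0."""
    try:
        return 1.0 if json.loads(response_text) == expected else 0.0
    except json.JSONDecodeError:
        return 0.0


def pick_min_examples(results: dict[int, float], tolerance: float = 0.02) -> int:
    """Smallest example count whose score is within `tolerance` of the best."""
    if not results:
        raise ValueError("results is empty")
    best = max(results.values())
    return min(n for n, score in results.items() if score >= best - tolerance)
```

Passing the dict returned by `benchmark_examples` to `pick_min_examples` gives the example count to ship. A wider `tolerance` trades a little accuracy for fewer tokens per call.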

Token Cost vs. Example Count

Examples	Approx. tokens	Cost delta vs. 0 examples
0	0	Baseline
1 (minimal)	100	+$30 / 100K calls
2 (minimal)	200	+$60 / 100K calls
5 (verbose)	2,500	+$750 / 100K calls
10 (verbose)	5,000	+$1,500 / 100K calls

Assumes $3 per 1M input tokens. Check current pricing.

When Each Approach Works Best

Situation	Approach
< 5 examples needed	Inline in system prompt (minimal format)
5–20 examples, stable	Prompt caching
5–20 examples, varied queries	Dynamic selection
> 20 examples	Fine-tuning or RAG

Expected Token Savings

10 verbose examples per call × 1M calls: 5B tokens (~$15,000)
2 minimal examples per call × 1M calls: 200M tokens (~$600)
Savings: ~96%
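
The arithmetic behind these figures is tokens per call × number of calls × price per token. A small sketch makes it easy to rerun with your own numbers; the function names are hypothetical, and the default rate is the same assumed $3 per 1M input tokens.

```python
def example_cost(
    tokens_per_call: int, calls: int, price_per_mtok: float = 3.00
) -> float:
    """USD spent on example tokens across `calls` requests (assumed price)."""
    return tokens_per_call * calls * price_per_mtok / 1_000_000


def savings_fraction(before_tokens: int, after_tokens: int) -> float:
    """Fraction of example-token cost removed by shrinking the examples."""
    return 1 - after_tokens / before_tokens


if __name__ == "__main__":
    calls = 1_000_000
    print(f"10 verbose examples: ${example_cost(5_000, calls):,.0f}")  # $15,000
    print(f"2 minimal examples:  ${example_cost(200, calls):,.0f}")    # $600
    print(f"Savings: {savings_fraction(5_000, 200):.0%}")              # 96%
```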

Environment

Any agent with few-shot examples in system prompts; most impactful at scale
Source: direct measurement, empirical quality/cost tradeoff analysis
