SynapseAI

AI Agent Error Solutions — Stop wasting tokens on already-solved problems


Parallel Agent Requests Hit Rate Limit 10x Faster Than Expected

Symptom

  • Single agent runs fine for hours without hitting rate limits
  • Running 10 agents in parallel → 429 errors within seconds
  • Batch processing job hits rate limit after 30 seconds
  • RPM (requests per minute) limit exceeded immediately at batch start
  • Adding more workers makes the problem worse

Root Cause

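Rate limits are shared by every concurrent request made under the same account; they are not tracked per agent. A 60 RPM limit averages out to one request per second, and a provider may enforce it at that finer granularity rather than over a full minute. If you fire 10 parallel requests at once, you send 10 requests in one second, which is 10x the per-second budget in a single burst. That is why 429s can appear long before 60 requests have been sent.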
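To make the burst arithmetic concrete, here is a small simulation. It assumes the provider enforces the limit with something like a token bucket that refills at `rpm_limit / 60` tokens per second and holds `capacity` tokens; both the model and the `capacity` value are illustrative assumptions, not documented provider settings.

```python
def count_rejected(arrival_times, rpm_limit, capacity=1.0):
    """Count requests a hypothetical token bucket would reject (HTTP 429)."""
    refill_per_sec = rpm_limit / 60.0
    tokens = capacity          # bucket starts full
    last = 0.0
    rejected = 0
    for t in sorted(arrival_times):
        # Refill for the time elapsed since the previous request
        tokens = min(capacity, tokens + (t - last) * refill_per_sec)
        last = t
        if tokens >= 1.0:
            tokens -= 1.0      # request accepted
        else:
            rejected += 1      # request would get a 429
    return rejected


burst = [0.0] * 10                       # 10 agents fire at the same instant
paced = [float(i) for i in range(10)]    # the same 10 requests, one per second

print(count_rejected(burst, rpm_limit=60))  # 9
print(count_rejected(paced, rpm_limit=60))  # 0
```

The total request count is identical in both cases; only the spacing differs.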

Fix

Option 1: Rate-limited async semaphore

import asyncio, time

class RateLimiter:
    def __init__(self, max_per_minute):
        self.max_per_minute = max_per_minute
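        # Caps how many tasks may wait inside acquire() at once; it is released
        # when acquire() returns, so it does not cap in-flight requests.
        self.semaphore = asyncio.Semaphore(max(1, max_per_minute // 10))
        self.request_times = []

    async def acquire(self):
        async with self.semaphore:
            while True:
                # Keep only timestamps inside the 60-second sliding window
                now = time.monotonic()
                self.request_times = [t for t in self.request_times if now - t < 60]
                if len(self.request_times) < self.max_per_minute:
                    break
                # Window is full: sleep until the oldest entry expires, then
                # re-check, because another waiter may have taken the freed slot
                oldest = self.request_times[0]
                await asyncio.sleep(60 - (now - oldest) + 0.1)

            self.request_times.append(time.monotonic())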

limiter = RateLimiter(max_per_minute=50)  # Stay under 60 RPM limit

# Assumes `client` is an async client, e.g. anthropic.AsyncAnthropic()
async def rate_limited_complete(messages):
    await limiter.acquire()
    return await client.messages.create(...)
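A sliding-window limiter like the one above still lets the first 50 calls through at the same instant, which is the burst pattern described under Root Cause. One possible complement is to space requests evenly. The sketch below is an illustration of that pacing idea, not part of the original fix; `IntervalPacer` and `demo` are hypothetical names.

```python
import asyncio
import time


class IntervalPacer:
    """Release callers no faster than one every 60 / max_per_minute seconds."""

    def __init__(self, max_per_minute: int):
        self.interval = 60.0 / max_per_minute
        self.next_slot = 0.0  # monotonic time of the next free slot

    async def acquire(self) -> None:
        now = time.monotonic()
        # Reserve a slot before sleeping. There is no await between reading
        # and updating next_slot, so concurrent tasks get distinct slots.
        slot = max(now, self.next_slot)
        self.next_slot = slot + self.interval
        await asyncio.sleep(slot - now)


async def demo() -> list:
    pacer = IntervalPacer(max_per_minute=1200)  # 50 ms spacing keeps the demo fast
    start = time.monotonic()
    stamps = []

    async def worker():
        await pacer.acquire()
        stamps.append(time.monotonic() - start)

    # Five workers start together but are released about 50 ms apart
    await asyncio.gather(*(worker() for _ in range(5)))
    return stamps


if __name__ == "__main__":
    print([round(s, 2) for s in asyncio.run(demo())])
```

With `max_per_minute=50`, the same class would release one request roughly every 1.2 seconds, however many agents are waiting.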

Option 2: Batch with concurrency control

async def process_batch(items, max_concurrent=5, delay_between=1.0):
    """Process items with controlled concurrency"""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def process_one(item):
        async with semaphore:
            result = await agent.process(item)
            await asyncio.sleep(delay_between)  # Rate limiting delay
            return result

    tasks = [process_one(item) for item in items]
    return await asyncio.gather(*tasks)

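# Max 5 requests in flight; each slot pauses 1s after every call, so throughput
# is at most 5 * 60 / (latency + 1) RPM, e.g. about 50 RPM at 5s latency.
# Call this from inside an async function.
results = await process_batch(items, max_concurrent=5, delay_between=1.0)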
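The throughput of this pattern depends on request latency, which is easy to overlook. As a rough sketch (the function names and the constant-latency assumption are mine, not from the original), the ceiling and the delay needed to stay under a target can be estimated like this:

```python
def estimated_rpm(max_concurrent: int, delay_between: float, avg_latency: float) -> float:
    """Upper bound on requests per minute for the slot-plus-delay pattern."""
    seconds_per_request = avg_latency + delay_between  # how long one slot stays busy
    return max_concurrent * 60.0 / seconds_per_request


def required_delay(target_rpm: float, max_concurrent: int, avg_latency: float) -> float:
    """Delay per request needed to keep all slots together under target_rpm."""
    return max(0.0, max_concurrent * 60.0 / target_rpm - avg_latency)


print(estimated_rpm(5, 1.0, 5.0))     # 50.0
print(estimated_rpm(5, 1.0, 1.0))     # 150.0
print(required_delay(50, 5, 2.0))     # 4.0
```

Fast responses raise the effective rate, so a delay tuned against slow calls can still overshoot the limit when calls return quickly.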

Option 3: Use Anthropic’s built-in batch API for large batches

# For bulk processing (not real-time), use the Message Batches API
# Up to 100,000 requests (or 256 MB) per batch; results are available within 24 hours
# Same models at lower cost; batch requests do not count against the standard Messages API rate limits

import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"item-{i}",
            "params": {
                "model": "claude-haiku-4-5-20251001",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": item}]
            }
        }
        for i, item in enumerate(items)
    ]
)

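# Poll until processing ends (the client above is synchronous, so block with time.sleep)
import time

while batch.processing_status != "ended":
    time.sleep(60)
    batch = client.messages.batches.retrieve(batch.id)

# Read per-request outcomes, matched back to inputs by custom_id
for entry in client.messages.batches.results(batch.id):
    print(entry.custom_id, entry.result.type)  # e.g. "succeeded" or "errored"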
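If the input list is larger than the per-batch cap that applies to you, it has to be split across several batches. The sketch below shows one way to do that while keeping `custom_id` values unique across chunks. `chunked`, `build_batches`, and the `max_per_batch` parameter are hypothetical helpers; take the actual cap from the current API documentation.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_batches(items: List[str], model: str, max_per_batch: int) -> List[List[dict]]:
    """Build request dicts with globally unique custom_ids, then split them."""
    requests = [
        {
            "custom_id": f"item-{i}",  # index over the full list, not per chunk
            "params": {
                "model": model,
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": item}],
            },
        }
        for i, item in enumerate(items)
    ]
    return list(chunked(requests, max_per_batch))
```

Each inner list can then be passed as `requests=` to its own `client.messages.batches.create(...)` call.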

Option 4: Respect Retry-After header

import asyncio
from anthropic import RateLimitError

async def complete_with_retry(messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return await client.messages.create(...)
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Prefer the server's Retry-After header (seconds); otherwise back off exponentially
            header = e.response.headers.get("retry-after")
            retry_after = float(header) if header else 2 ** attempt
            print(f"Rate limited, waiting {retry_after}s")
            await asyncio.sleep(retry_after)
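When many parallel agents receive a 429 at the same moment, a fixed exponential schedule makes them all retry at the same moment too. A common mitigation is to add random jitter to the fallback delay. The helper below is a sketch of that idea; `backoff_delay` and its `base` and `cap` defaults are illustrative choices, not values from the original.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Header given as a number of seconds: trust the server
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not numeric (e.g. an HTTP-date); fall back to computed backoff
    ceiling = min(cap, base * (2 ** attempt))
    # "Full jitter": pick uniformly in [0, ceiling] so retries spread out
    return random.uniform(0.0, ceiling)
```

In the retry loop above, the sleep could then become `await asyncio.sleep(backoff_delay(attempt, header))`, where `header` is the raw Retry-After value or `None`.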

Option 5: OpenClaw config — global rate limiter

# openclaw.config.yaml
providers:
  anthropic:
    rate_limit:
      max_rpm: 50              # Stay ~17% under your 60 RPM limit
      max_concurrent: 5        # Never more than 5 in-flight
      queue_overflow: wait     # Queue requests, don't drop them
      queue_timeout_ms: 30000
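For readers not using OpenClaw, the following sketch shows what settings like `max_concurrent` and `queue_timeout_ms` could mean in plain asyncio: callers wait for a free slot, and give up if the wait exceeds the timeout. This is only an interpretation of the option names above, not OpenClaw's actual implementation; `ConcurrencyGate` and `QueueTimeout` are hypothetical.

```python
import asyncio


class QueueTimeout(Exception):
    """Raised when a request waits longer than the queue timeout for a slot."""


class ConcurrencyGate:
    def __init__(self, max_concurrent: int, queue_timeout_ms: int):
        self._sem = asyncio.Semaphore(max_concurrent)
        self._timeout = queue_timeout_ms / 1000.0

    async def run(self, make_call):
        """Run the coroutine returned by make_call() once a slot is free."""
        try:
            # Queue for a slot, but only for up to the configured timeout
            await asyncio.wait_for(self._sem.acquire(), timeout=self._timeout)
        except asyncio.TimeoutError:
            raise QueueTimeout("no free slot within the queue timeout") from None
        try:
            return await make_call()
        finally:
            self._sem.release()


async def demo():
    gate = ConcurrencyGate(max_concurrent=1, queue_timeout_ms=50)

    async def slow_call():
        await asyncio.sleep(0.2)  # stands in for an API request
        return "ok"

    # The second call cannot get the single slot within 50 ms
    return await asyncio.gather(
        gate.run(slow_call), gate.run(slow_call), return_exceptions=True
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A gate like this bounds in-flight requests only; an RPM cap such as `max_rpm` would still need a separate limiter.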

Expected Token Savings

  • Batch job crashing at start → restarting with backoff: ~20,000 tokens lost per crash
  • Rate limiter from the start: batch completes without interruption

Environment

  • Any agent deployment processing batches or parallel requests
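  • Anthropic API tiers as recorded for this entry: Free (3 RPM), Build (50 RPM), Scale (2000 RPM). Limits vary by usage tier and model and change over time, so check the current rate limit documentation for your account.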
  • Source: direct experience, Anthropic rate limit documentation

