Rate Limiting for AI Agents

Rate limiting for AI agents means constraining how many LLM API calls, tool invocations, or actions an agent can perform within a time window. Without rate limits, a single user request can trigger hundreds of LLM calls, leading to cost overruns and resource exhaustion.

MEDIUM Severity

Unbounded Batch Processing

Vulnerable

python

# No rate limiting on batch requests
async def process_batch(queries: list[str]):
    results = []
    for query in queries:  # Could be 10,000 items
        result = await agent.arun(query)  # Each calls LLM 5-10x
        results.append(result)
    return results

Secure

python

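import asyncio
from ratelimit import limits

MAX_BATCH_SIZE = 1000   # Hard cap on queries accepted per batch
MAX_CONCURRENCY = 5     # Max agent runs in flight at once

# Limits how often process_batch itself may be called (100 per minute);
# calls beyond that raise ratelimit.RateLimitException.
@limits(calls=100, period=60)
async def process_batch(queries: list[str]):
    if len(queries) > MAX_BATCH_SIZE:  # Reject instead of silently truncating
        raise ValueError(f"Batch of {len(queries)} exceeds limit of {MAX_BATCH_SIZE}")
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)

    async def bounded_run(query: str):
        async with semaphore:  # Wait for a free slot before invoking the agent
            return await agent.arun(query)

    return await asyncio.gather(*(bounded_run(q) for q in queries))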
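A semaphore bounds how many agent runs are in flight at once, not how many start per minute. The following is a minimal sketch of pacing individual runs with a sliding window, assuming only the standard library. `AsyncRateLimiter` and `run_paced` are hypothetical names introduced here for illustration, and `run_one` stands in for a call such as `agent.arun`.

```python
import asyncio
import time


class AsyncRateLimiter:
    """Allow at most `max_calls` acquisitions per `period` seconds (sliding window)."""

    def __init__(self, max_calls, period):
        self.max_calls = max_calls
        self.period = period
        self._timestamps = []  # Start times of calls still inside the window
        self._lock = asyncio.Lock()

    async def acquire(self):
        # Holding the lock while sleeping makes later callers queue up in order.
        async with self._lock:
            while True:
                now = time.monotonic()
                # Forget calls that have aged out of the window.
                self._timestamps = [t for t in self._timestamps if now - t < self.period]
                if len(self._timestamps) < self.max_calls:
                    self._timestamps.append(now)
                    return
                # Window is full: sleep until the oldest call leaves it.
                await asyncio.sleep(self.period - (now - self._timestamps[0]))


async def run_paced(queries, run_one, limiter, max_concurrency=5):
    """Run `run_one(query)` for each query, bounded in both rate and concurrency."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(query):
        await limiter.acquire()  # Rate bound: wait for a slot in the window
        async with semaphore:    # Concurrency bound: wait for a free worker
            return await run_one(query)

    return await asyncio.gather(*(bounded(q) for q in queries))
```

Create the limiter inside the running event loop (for example, within the coroutine passed to `asyncio.run`) and share one instance across batches so the window applies globally rather than per batch.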

Frequently Asked Questions

Why do AI agents need rate limiting?

AI agents make multiple LLM calls per task, often 5-50 for a single user request. Without rate limiting, batch processing, retries, or agent loops can multiply those calls and create cost spikes. Rate limiting keeps costs predictable and prevents denial-of-service conditions.
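One way to keep a runaway agent loop from multiplying calls is a per-request call budget. The sketch below is illustrative only: `CallBudget`, `CallBudgetExceeded`, and `run_agent_loop` are hypothetical names, and `llm_step` stands in for whatever function performs one LLM call and reports whether the task is finished.

```python
class CallBudgetExceeded(RuntimeError):
    """Raised when a request tries to exceed its LLM call allowance."""


class CallBudget:
    """Counts LLM calls for one user request and refuses to go past a cap."""

    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.used = 0

    def spend(self, calls: int = 1) -> None:
        if self.used + calls > self.max_calls:
            raise CallBudgetExceeded(
                f"request used {self.used} of {self.max_calls} calls; refusing {calls} more"
            )
        self.used += calls


def run_agent_loop(llm_step, budget: CallBudget):
    """Hypothetical agent loop: call `llm_step` until it reports it is done."""
    state = None
    while True:
        budget.spend()  # Charge the budget before every LLM call
        state, done = llm_step(state)
        if done:
            return state
```

Creating a fresh `CallBudget` per user request means a loop that never converges fails fast with a clear error instead of accumulating cost.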

What should I rate limit in an AI agent?

Rate limit: (1) LLM API calls per user per minute, (2) Tool invocations per request, (3) Total tokens consumed per session, (4) Concurrent agent executions, (5) External API calls made by tools.
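Several of these limits can share one mechanism. As a rough sketch, a token bucket keyed by user ID covers item (1), and passing a larger `cost` lets the same bucket meter LLM tokens for item (3). `TokenBucket` and `PerUserLimiter` are hypothetical names, and the in-memory dictionary assumes a single process; a multi-process deployment would need shared storage.

```python
import time


class TokenBucket:
    """Bucket holding up to `capacity` units, refilled at `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock  # Injectable so tests can control time
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1.0) -> bool:
        now = self.clock()
        # Add the units earned since the last check, never exceeding capacity.
        earned = (now - self.updated) * self.refill_per_sec
        self.tokens = min(self.capacity, self.tokens + earned)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # Caller should reject or delay the request


class PerUserLimiter:
    """Keeps one independent bucket per user ID."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self._buckets = {}

    def allow(self, user_id, cost=1.0) -> bool:
        bucket = self._buckets.get(user_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, self.clock)
            self._buckets[user_id] = bucket
        return bucket.allow(cost)
```

A bucket allows short bursts up to `capacity` while holding the long-run average to `refill_per_sec`, which tends to suit interactive agents better than a hard fixed window.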

How does Inkog detect missing rate limits?

Inkog checks the IR graph for a missing RateLimitConfigNode. It flags agent entry points that process user requests without throttling, batch processing loops without delays, and retry patterns without backoff.
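For the retry pattern mentioned above, a bounded retry with exponential backoff might look like the following sketch. It is not tied to any particular agent framework: `retry_with_backoff` is a hypothetical helper, `call` is any zero-argument coroutine function, and the default delays are illustrative assumptions.

```python
import asyncio
import random


async def retry_with_backoff(call, max_attempts=5, base_delay=0.5,
                             max_delay=30.0, sleep=asyncio.sleep):
    """Await `call()` up to `max_attempts` times, backing off between failures."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except Exception:  # In real code, catch only transient errors (timeouts, 429s)
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error
            # Exponential ceiling: base, 2*base, 4*base, ... capped at max_delay
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter spreads retries so clients do not retry in lockstep
            await sleep(random.uniform(0, ceiling))
```

Capping both the number of attempts and the delay matters here: an uncapped retry loop around an LLM call is itself one of the unbounded patterns this page describes.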

How Inkog Detects This

Inkog identifies agent entry points that lack rate limiting configuration. It flags batch processing patterns without concurrency bounds, retry loops without exponential backoff, and user-facing endpoints that directly invoke agents without throttling.

bash

npx -y @inkog-io/cli scan .
