Context Window Exhaustion in AI Agents

Context window exhaustion occurs when an AI agent accumulates conversation history, tool results, or intermediate outputs that exceed the LLM's context window limit. The agent either crashes with a token limit error, silently truncates important context, or enters a degraded state where it loses track of earlier instructions.

HIGH Severity

Unbounded Conversation History

Vulnerable

```python
class Agent:
    def __init__(self):
        self.history = []  # Grows without bound

    def chat(self, message: str) -> str:
        self.history.append({"role": "user", "content": message})
        # Eventually exceeds the context window
        response = llm.chat(self.history)
        self.history.append({"role": "assistant", "content": response})
        return response
```
Secure

```python
class Agent:
    MAX_HISTORY = 20  # Keep last 20 messages

    def __init__(self):
        self.history = []
        self.system_prompt = {"role": "system", "content": "..."}

    def chat(self, message: str) -> str:
        self.history.append({"role": "user", "content": message})
        # Trim the window sent to the LLM, but always keep the system prompt.
        # Note: the full history is still retained in memory; only the
        # payload sent to the model is bounded.
        messages = [self.system_prompt] + self.history[-self.MAX_HISTORY:]
        response = llm.chat(messages)
        self.history.append({"role": "assistant", "content": response})
        return response
```
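The sliding window can be sanity-checked with a stubbed client. The `StubLLM` class below is a hypothetical stand-in for a real SDK; it only records how many messages each call sends, confirming the payload stays bounded no matter how long the conversation runs:

```python
class StubLLM:
    """Hypothetical stand-in for a real LLM client; records payload sizes."""
    def __init__(self):
        self.last_call_size = 0

    def chat(self, messages):
        self.last_call_size = len(messages)
        return "ok"


class Agent:
    MAX_HISTORY = 20  # keep only the last 20 messages

    def __init__(self, llm):
        self.llm = llm
        self.history = []
        self.system_prompt = {"role": "system", "content": "You are a helpful agent."}

    def chat(self, message: str) -> str:
        self.history.append({"role": "user", "content": message})
        # Trim history but always keep the system prompt
        messages = [self.system_prompt] + self.history[-self.MAX_HISTORY:]
        response = self.llm.chat(messages)
        self.history.append({"role": "assistant", "content": response})
        return response


llm = StubLLM()
agent = Agent(llm)
for i in range(100):
    agent.chat(f"message {i}")

# Payload never exceeds the window plus the system prompt
assert llm.last_call_size == Agent.MAX_HISTORY + 1
```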

Frequently Asked Questions

What is context window exhaustion in AI agents?

Context window exhaustion happens when the total number of tokens in an agent's conversation history exceeds the LLM's context limit (commonly 4K to 128K tokens, depending on the model). The agent either fails with an error or silently drops older messages, potentially losing critical system instructions.

How do you prevent context window exhaustion?

Implement a conversation memory strategy: a sliding window (keep the last N messages), summarization (compress older history into a short summary), token counting with trimming (drop the oldest messages when approaching the limit), or a model with a larger context window, still with explicit bounds.
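The token-counting strategy can be sketched as follows. The 4-characters-per-token ratio is a rough heuristic for illustration only (a real implementation would use the model's tokenizer, e.g. tiktoken), and the budget value is an arbitrary assumption:

```python
def approx_tokens(msg: dict) -> int:
    # Rough heuristic: ~4 characters per token. Use the model's
    # real tokenizer (e.g. tiktoken) in production.
    return max(1, len(msg["content"]) // 4)


def trim_to_budget(system_prompt: dict, history: list, budget: int) -> list:
    """Drop the oldest messages until the estimated total fits the budget."""
    kept = []
    used = approx_tokens(system_prompt)
    # Walk history newest-first, keeping messages while the budget allows
    for msg in reversed(history):
        cost = approx_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return [system_prompt] + kept


system = {"role": "system", "content": "You are a helpful agent."}
history = [{"role": "user", "content": "x" * 400} for _ in range(50)]
messages = trim_to_budget(system, history, budget=1000)
```

Walking newest-first guarantees the most recent turns survive; the system prompt is charged against the budget up front so it is never squeezed out.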

What happens when an AI agent hits the context window limit?

Behavior depends on implementation: the API may return an error, the framework may silently truncate from the beginning (losing system prompts), or the agent may hallucinate as it loses context. All outcomes are problematic in production.
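One defensive pattern for the error case is to catch the provider's context-length failure and retry with a reduced window. The error detection below (matching on the exception message) is an assumption for illustration; the exact exception type and wording vary by provider and SDK:

```python
def chat_with_backoff(llm_call, system_prompt: dict, history: list, min_keep: int = 2):
    """Retry with half the history each time the context limit is exceeded."""
    keep = len(history)
    while keep >= min_keep:
        messages = [system_prompt] + history[-keep:]
        try:
            return llm_call(messages)
        except Exception as exc:
            # Provider-specific: many SDKs raise an error whose message
            # mentions the context length; adjust the check for your SDK.
            if "context" not in str(exc).lower():
                raise
            keep //= 2  # drop the oldest half and retry
    raise RuntimeError("Even the trimmed history exceeds the context window")
```

This keeps the agent responsive under pressure, at the cost of losing older turns; pairing it with summarization preserves more of the dropped context.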

How Inkog Detects This

Inkog identifies MemoryAccessNodes where conversation history is appended without bounds checking. It flags patterns where lists grow inside loops without truncation and where no token count is checked against a maximum before LLM calls.

```bash
npx -y @inkog-io/cli scan .
```
