Context Window Exhaustion in AI Agents
Context window exhaustion occurs when an AI agent accumulates conversation history, tool results, or intermediate outputs that exceed the LLM's context window limit. The agent either crashes with a token limit error, silently truncates important context, or enters a degraded state where it loses track of earlier instructions.
Unbounded Conversation History
```python
class Agent:
    def __init__(self):
        self.history = []  # Grows without bound

    def chat(self, message: str) -> str:
        self.history.append({"role": "user", "content": message})
        # Eventually exceeds the context window
        response = llm.chat(self.history)
        self.history.append({"role": "assistant", "content": response})
        return response
```

A bounded version keeps the system prompt and only the most recent messages:

```python
class Agent:
    MAX_HISTORY = 20  # Keep the last 20 messages

    def __init__(self):
        self.history = []
        self.system_prompt = {"role": "system", "content": "..."}

    def chat(self, message: str) -> str:
        self.history.append({"role": "user", "content": message})
        # Trim history but always keep the system prompt
        messages = [self.system_prompt] + self.history[-self.MAX_HISTORY:]
        response = llm.chat(messages)
        self.history.append({"role": "assistant", "content": response})
        return response
```

Frequently Asked Questions
What is context window exhaustion in AI agents?
Context window exhaustion happens when the total token count of an agent's conversation history exceeds the LLM's context limit (commonly 4K-128K tokens, depending on the model). The agent either fails with an error or silently drops older messages, potentially losing critical system instructions.
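As a rough illustration, you can estimate when a history will blow past a model's budget before sending it. This sketch uses the common ~4 characters per token heuristic (a real tokenizer such as tiktoken gives exact counts); the `limit` and `reserve` values are illustrative, not tied to any specific model:

```python
def estimate_tokens(messages: list[dict]) -> int:
    """Rough token estimate: ~4 characters per token (heuristic, not exact)."""
    return sum(len(m["content"]) // 4 for m in messages)

def fits_context(messages: list[dict], limit: int = 8192, reserve: int = 1024) -> bool:
    """Check the history against the model limit, reserving room for the reply."""
    return estimate_tokens(messages) <= limit - reserve

history = [{"role": "user", "content": "x" * 40_000}]  # ~10K estimated tokens
print(fits_context(history, limit=8192))  # over the 8192 - 1024 budget → False
```

Checking before every call (and trimming when the check fails) turns a silent production failure into an explicit, testable branch.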
How do you prevent context window exhaustion?
Implement conversation memory strategies: sliding window (keep last N messages), summarization (compress older history), token counting with trim (remove oldest when approaching limit), or use models with larger context windows while still setting bounds.
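The "token counting with trim" strategy from the list above can be sketched as follows. This is a minimal version using the same ~4 chars/token estimate (swap in a real tokenizer for production); it assumes the system prompt sits at index 0 and must never be dropped:

```python
def trim_to_budget(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop the oldest non-system messages until the estimated total fits."""
    est = lambda m: len(m["content"]) // 4  # crude per-message token estimate
    system, rest = messages[0], list(messages[1:])
    while rest and est(system) + sum(est(m) for m in rest) > max_tokens:
        rest.pop(0)  # remove the oldest turn first
    return [system] + rest

msgs = [{"role": "system", "content": "You are helpful."}] + [
    {"role": "user", "content": "m" * 400} for _ in range(10)  # ~100 tokens each
]
trimmed = trim_to_budget(msgs, max_tokens=500)  # keeps system + 4 newest turns
```

Unlike a fixed message-count window, this bounds the actual token load, so a few very long messages cannot blow the budget while the message count looks safe.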
What happens when an AI agent hits the context window limit?
Behavior depends on implementation: the API may return an error, the framework may silently truncate from the beginning (losing system prompts), or the agent may hallucinate as it loses context. All outcomes are problematic in production.
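A defensive pattern for the error case is to catch the failure and retry with a trimmed history. This is a sketch with a hypothetical `client.chat` method and deliberately generic exception handling; real SDKs raise specific error types (typically an HTTP 400 with a context-length message), which production code should catch instead of bare `Exception`:

```python
def chat_with_retry(client, messages: list[dict], max_retries: int = 3) -> str:
    """Retry after context-length failures, halving the kept history each time."""
    kept = list(messages)
    for _ in range(max_retries):
        try:
            return client.chat(kept)  # hypothetical client API
        except Exception:  # production: catch the SDK's context-length error only
            if len(kept) <= 2:
                raise  # nothing left to trim
            # Keep the system prompt plus the most recent half of the turns
            kept = [kept[0]] + kept[-(len(kept) // 2):]
    raise RuntimeError("still over the context limit after retries")
```

This converts the worst outcome (silent truncation of the system prompt) into a controlled degradation that always preserves the agent's instructions.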
How Inkog Detects This
Inkog identifies MemoryAccessNodes where conversation history is appended without bounds checking. It flags patterns where lists grow inside loops without truncation and where no token count is checked against a maximum before LLM calls.
```
npx -y @inkog-io/cli scan .
```