The AI Agent Security Gap: Findings from Scanning 500+ Open-Source AI Agent Projects
We scanned over 500 open-source AI agent repositories for security vulnerabilities. The results reveal a systemic gap between AI agent adoption and AI agent security — one that's widening as regulatory deadlines approach.
This is the largest automated security analysis of the AI agent ecosystem ever published. Every data point comes from scanning real code with Inkog's static analysis engine, not surveys or self-reported assessments.
The full report is available at inkog.io/report.
The Headline Numbers
Across 500+ repositories spanning every major AI agent framework — LangChain, CrewAI, AutoGen, pydantic-ai, LangGraph, MCP servers, and dozens more — we found:
- 86% of repos contained at least one security finding
- 63% had CRITICAL or HIGH severity vulnerabilities
- 11,705 total findings across all repositories — an average of 20.9 per repo
- The most common issue: infinite loops (5,397 instances) — unbounded agent execution cycles with no termination guard
- The average governance score was 46 out of 100
These aren't edge cases or obscure frameworks. These are the repos that developers are cloning, forking, and building production agents on top of.
What We Found
The Top Vulnerability Types
The most frequently detected patterns tell a clear story about what AI agent developers are getting wrong:
1. Resource exhaustion (infinite loops, token bombing, missing rate limits). By far the dominant category. Most AI agents implement loop-based execution — an agent iterates until it "solves" the task. But without termination guards, a confused agent can loop indefinitely, consuming compute, burning API tokens, and potentially crashing production systems. We found infinite loop vulnerabilities in repos ranging from 50 to 150,000+ GitHub stars.
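The fix is simple in principle: bound the loop. The sketch below is illustrative, not from any particular framework; `step` and `MaxIterationsExceeded` are hypothetical names standing in for an agent's iteration logic.

```python
# Hypothetical agent loop with a termination guard. The step() callable
# is a stand-in for one agent iteration; the pattern is the bounded loop.
class MaxIterationsExceeded(Exception):
    pass

def run_agent(task, step, max_iterations=10):
    """Run an agent loop, failing loudly instead of spinning forever."""
    state = {"task": task, "done": False}
    for _ in range(max_iterations):
        state = step(state)
        if state.get("done"):
            return state
    raise MaxIterationsExceeded(
        f"agent did not finish within {max_iterations} iterations"
    )
```

A confused agent that never sets `done` now raises after `max_iterations` steps rather than consuming compute and tokens indefinitely.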
2. Missing governance controls (audit logging, human oversight, rate limits). The second-largest category. Most AI agents ship with zero governance infrastructure. No logging of what the agent did, no approval gates before critical operations, no rate limiting on tool invocations. This isn't just a security problem — it's a compliance problem. EU AI Act Article 12 requires automatic logging. Article 14 requires human oversight. The vast majority of repos fail both.
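Minimal governance infrastructure does not require a platform. The sketch below shows the two controls together: every tool call is logged, and critical tools are blocked unless an approval callback says yes. `CRITICAL_TOOLS` and `approve` are illustrative names, not part of any framework's API.

```python
# Governance wrapper sketch: audit log every tool invocation, and gate
# "critical" tools behind a human approval callback. Names are illustrative.
import json
import logging
import time

logger = logging.getLogger("agent.audit")

CRITICAL_TOOLS = {"delete_records", "send_payment"}

def invoke_tool(name, args, tool_fn, approve=lambda name, args: False):
    """Log the invocation; refuse critical tools without explicit approval."""
    logger.info(json.dumps({"ts": time.time(), "tool": name, "args": args}))
    if name in CRITICAL_TOOLS and not approve(name, args):
        raise PermissionError(f"human approval required for tool: {name}")
    return tool_fn(**args)
```

Failing closed by default (`approve` returns `False` unless wired up) is the design choice that matters: an unconfigured deployment blocks critical operations rather than silently allowing them.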
3. Unsafe code execution. AI agents that take LLM-generated text and pass it to exec(), eval(), or shell commands without sanitization. This is remote code execution by design — the LLM can generate arbitrary code, and the agent executes it. We found CRITICAL-severity exec/eval vulnerabilities in frameworks with 25,000+ GitHub stars.
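The safer alternative to `exec()`/`eval()` is to never execute model output at all: parse it into structured actions and dispatch against an allowlist. The sketch below is a pattern illustration with hypothetical names (`ALLOWED_OPS`, `run_llm_action`), not a complete sandbox.

```python
# Instead of eval(llm_output), parse the model's output into a structured
# action and dispatch it against an allowlist of known-safe operations.
ALLOWED_OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def run_llm_action(action):
    """action: a dict like {"op": "add", "args": [2, 3]} parsed from model output."""
    op = ALLOWED_OPS.get(action.get("op"))
    if op is None:
        # Anything outside the allowlist is rejected, never executed.
        raise ValueError(f"operation not allowed: {action.get('op')!r}")
    return op(*action.get("args", []))
```

The model can still be wrong, but it can no longer be arbitrarily powerful: the worst case is a rejected action, not remote code execution.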
4. Injection vectors. User input flowing through tool chains to dangerous sinks — SQL queries, system commands, external APIs — without validation or sanitization. When an LLM output is used to construct a database query, that's SQL injection with extra steps.
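The standard defense applies unchanged to LLM-derived values: bind them as parameters instead of interpolating them into the query string. A minimal sketch with the standard-library `sqlite3` module (table and column names are illustrative):

```python
# Parameterized query sketch: the LLM-derived value is bound as a
# parameter, so it can never change the structure of the SQL statement.
import sqlite3

def find_user(conn, username):
    # The "?" placeholder keeps the value out of the SQL text entirely.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchall()
```

A classic payload like `alice' OR '1'='1` is then just a literal string that matches no rows, not a query rewrite.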
Framework Comparison
Security posture varies significantly by framework. Some frameworks have secure defaults — built-in input validation, mandatory approval gates, rate limiting as a first-class feature. Others prioritize developer ergonomics and leave security to the user.
TypeScript and pydantic-ai projects were the most represented in our scan. Notably, LlamaIndex and GitHub's official MCP server achieved zero findings with perfect governance scores — demonstrating that security and developer experience are not mutually exclusive.

The full report includes a framework-by-framework comparison table.
MCP Server Findings
MCP (Model Context Protocol) servers represent a unique and growing attack surface. With MCP adoption reaching 97 million monthly SDK downloads and 5,800+ active servers, this is no longer a niche integration layer — it is core infrastructure.
We scanned 19 MCP server repositories. 84% had security findings. The most common issues: infinite loops in tool handlers, missing rate limits, and missing audit logging. One Chrome automation MCP server had 97 findings. The official modelcontextprotocol/servers reference implementation had 3 governance findings.
This matters because a compromised MCP server compromises every agent that connects to it. MCP servers often run un-sandboxed on developer machines with full file system and network access. Adversa AI's research identified prompt injection and command injection as the two highest-severity MCP risks.
The Compliance Problem
EU AI Act Article 14
Article 14 of the EU AI Act requires human oversight for high-risk AI systems. For AI agents, that means:
- Approval gates before critical operations
- Kill switches for autonomous workflows
- Audit trails of agent actions
- The ability to override agent decisions
Our data shows 25% of repos fail Article 14 requirements entirely. With full enforcement of high-risk AI system obligations beginning August 2, 2026 — four months from now — the gap between where the ecosystem is and where it needs to be is enormous.
The penalties are not theoretical: up to €35 million or 7% of global annual turnover for prohibited AI practices, and €15 million or 3% for high-risk system violations. These exceed GDPR maximums.
Governance Scores
We computed a governance score (0-100) for every repository, measuring human oversight gates, authorization checks, rate limiting, and audit logging.
The average score was 46 out of 100. This means the typical AI agent repository has, at best, partial governance controls. Many scored below 20 — functioning agent code with zero governance infrastructure.
Why This Matters Now
Three trends are converging:
1. Adoption is accelerating. The AI agent market reached ~$5.25 billion in 2024 and is projected past $10 billion by 2026. 57% of enterprises have deployed agents to production. 80% of Fortune 500 companies are building agents. LangChain has 122,000+ GitHub stars. MCP is growing exponentially. This isn't experimental anymore.
2. Regulation is arriving. The EU AI Act is not theoretical — provisions are already in force. NIST launched the AI Agent Standards Initiative in February 2026. OWASP published the Agentic Top 10. Singapore released the first national framework for agentic AI governance. The compliance clock is ticking.
3. Real incidents are happening. Microsoft 365 Copilot had a zero-click prompt injection (CVE-2025-32711, CVSS 9.3). Meta had a Severity-1 data exposure from a rogue AI agent. The Drift chatbot breach compromised 700+ organizations through stolen OAuth tokens. A Chinese state-sponsored group used AI coding agents for autonomous network intrusion. CrewAI had four chained CVEs enabling remote code execution.
The result: a rapidly growing attack surface, active exploitation in the wild, regulation with enforcement power — and security tooling that hasn't caught up.
What To Do About It
For Developers
- Scan before you ship. Run static analysis on agent code as part of CI/CD. Inkog supports GitHub Actions, SARIF output, and all major frameworks.
- Add human oversight gates. Critical operations need approval flows. Don't let agents make irreversible decisions autonomously.
- Validate tool inputs. Never pass raw LLM output to tools without sanitization. Use parameterized queries for databases.
- Rate limit everything. Infinite loops and token bombing are the #1 finding — set maximum iterations and token budgets.
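Iteration caps and token budgets can share one mechanism: a budget object that every model call draws down and that fails closed when exhausted. The sketch below is illustrative; `TokenBudget` and its limits are hypothetical, not an Inkog or framework API.

```python
# Token-budget guard sketch: each model call charges a shared budget,
# and the agent fails closed once the budget is exhausted.
class TokenBudgetExceeded(Exception):
    pass

class TokenBudget:
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def charge(self, tokens):
        """Record usage, refusing the call that would exceed the limit."""
        if self.used + tokens > self.limit:
            raise TokenBudgetExceeded(
                f"{self.used + tokens} tokens would exceed limit {self.limit}"
            )
        self.used += tokens
```

Wiring `charge()` into the LLM client turns a token-bombing loop into a bounded, observable failure instead of a surprise invoice.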
For Security Teams
- Include agents in AppSec programs. AI agents are software. Apply the same rigor you apply to web applications.
- Audit MCP server integrations. Every tool an agent can invoke is a potential entry point.
- Map to compliance frameworks. EU AI Act, NIST, OWASP provide actionable structure.
For CISOs
- Audit your agent inventory. Know what agents are deployed and what they can access. Many organizations have "shadow agents" built without security review.
- Budget for agent security. The average organization allocates only ~6% of security budget to AI agents — insufficient for a rapidly growing attack surface.
- Align to regulations now. EU AI Act compliance requires engineering changes that take months to implement. Start now.
Get the Full Report
The complete report includes:
- Detailed vulnerability data with top 10 patterns and counts
- Framework-by-framework comparison table
- EU AI Act readiness distribution and article-by-article failure rates
- Anonymized case studies from repos with 25K+ stars
- MCP server security analysis
- Full methodology and limitations
About the Methodology
500+ repos were selected via 40 GitHub search queries targeting AI agent frameworks, supplemented by a curated list of 55+ known AI agent repos to ensure coverage of all major frameworks. Minimum 20 stars, no forks. Each repo was scanned with Inkog v1.1.0 using the comprehensive policy (all detectors, no confidence filtering).
Inkog's Universal IR engine converts any framework (Python, TypeScript, YAML) to a framework-agnostic intermediate representation. Detection rules query this IR — they don't need framework-specific logic. This is what makes scanning 500+ repos across 11+ framework adapters possible.
For full methodology details, including limitations and analysis tier descriptions, see Section 3 of the full report.