The Engineer's Guide

How to secure AI agents — vibe-coded and hand-built

Whether your agent was vibe-coded in Claude Code or hand-written in LangChain, it faces the same execution-layer vulnerabilities — delegation loops, unconstrained tool calls, AGENTS.md hijacks, MCP supply chain attacks. This is how to find them before production.

Audited on 500+ open-source agents · 20+ frameworks · Open source CLI

The new attack surface: why agentic security breaks traditional AppSec

A traditional web application is deterministic. A request comes in, code runs, a response goes out. SAST tools were built for that world — pattern-match on dangerous functions, walk the data-flow graph, flag the sinks. The control flow is predictable enough that a static rule can reason about it.

An AI agent isn't deterministic. It chooses which tool to call, what arguments to pass, when to stop, and whether to delegate to another agent — all based on an LLM's output, which depends on prompts, retrieved context, and prior turns. The same code path can do a hundred different things on a hundred different invocations.

That's why agent security has to look at things classical SAST doesn't: the risk is in the wiring, not just the syntax. The vulnerabilities live in tool schemas, delegation chains, AGENTS.md instructions, and MCP server tool descriptions, none of which a Semgrep rule can reason about.

The vibe-coded vs. hand-built dilemma

Two engineering populations build agents in 2026. The first hand-writes agents in LangChain, CrewAI, LangGraph, or AutoGen: careful, opinionated code. The second vibe-codes them in Claude Code, Cursor, or Windsurf: fast iteration, AI-generated scaffolding, less manual review.

Both populations ship the same kind of vulnerability. Empirical studies in early 2026 found AI-generated code is roughly 2.7× more likely to contain security issues than human-written code — but the failures themselves are familiar: hardcoded credentials, unsanitized user input, missing rate limits, untrusted tool calls. Vibe coding raises the rate; it doesn't change the categories.

The framing for the rest of this guide treats the agent as the unit of analysis. The vulnerabilities are properties of how the agent is wired, not who wrote the wiring.

Vulnerability 1: multi-agent delegation loops (CrewAI & AutoGen)

CrewAI lets you mark agents with allow_delegation=True. An agent that can delegate can also be delegated to. With three or more agents in a hierarchical crew, that becomes a graph — and graphs have cycles.

vulnerable
# crew/research.py
from crewai import Agent, Crew

researcher = Agent(role="researcher", allow_delegation=True)
writer = Agent(role="writer", allow_delegation=True)
editor = Agent(role="editor", allow_delegation=True)

crew = Crew(agents=[researcher, writer, editor], process="hierarchical")
crew.kickoff(inputs={"topic": "..."})  # No bound on delegation depth.
fix
# crew/research.py
crew = Crew(
    agents=[researcher, writer, editor],
    process="hierarchical",
    max_rpm=20,                # cap per-minute LLM rate
    max_iter=15,               # cap delegation depth
    function_calling_llm=guard, # validate every tool call
)

When the editor delegates a clarification back to the researcher who delegates it back to the writer, you get a cycle. The LLM doesn't notice — it's optimizing locally on each turn. The bill notices. Inkog's multi-agent delegation audit walks the delegation graph statically and flags cycles plus unsigned handoffs that map to CWE-345.
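
To make the check concrete, here is a minimal sketch of what a delegation-graph cycle audit looks like, assuming the delegation edges can be enumerated up front. The agent names mirror the example above; a real audit derives the edges from the crew configuration rather than a hand-written dict.

python
# Minimal sketch: detect a delegation cycle in an explicit agent graph.
# The edges here are hypothetical; derive them from your crew config in practice.
from typing import Dict, List, Optional

delegation_graph: Dict[str, List[str]] = {
    "researcher": ["writer"],
    "writer": ["editor"],
    "editor": ["researcher"],   # closes the loop
}

def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Depth-first search; returns one delegation cycle if any exists."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        stack.append(node)
        for target in graph.get(node, []):
            if color.get(target, WHITE) == GRAY:              # back edge = cycle
                return stack[stack.index(target):] + [target]
            if color.get(target, WHITE) == WHITE:
                found = visit(target)
                if found:
                    return found
        color[node] = BLACK
        stack.pop()
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

print(find_cycle(delegation_graph))
# ['researcher', 'writer', 'editor', 'researcher'] -> flag it before it hits your bill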

Vulnerability 2: tool binding and unconstrained execution

A LangChain tool is a function plus a description plus an argument schema. The LLM picks the tool and constructs the arguments. If the schema is loose, or the description is ambiguous, the LLM has wide latitude to pass whatever the prompt convinced it to pass.

vulnerable
from langchain_core.tools import tool

@tool
def run_query(sql: str) -> str:
    """Run an SQL query against the production database."""
    return db.execute(sql)

# Agent has tool. LLM constructs arbitrary SQL based on user input.
fix
@tool
def run_query(query_id: str, params: dict) -> str:
    """Run a pre-approved query by ID with parameters.

    query_id: must match an entry in QUERY_REGISTRY.
    params:   dict of bind variables; values are escaped automatically.
    """
    return db.execute(QUERY_REGISTRY[query_id], params)

The fix isn't just argument validation; it's collapsing the input space. With pre-approved queries and bind variables, the LLM's degrees of freedom are bounded by your authoring decisions, not by its prompt-following behavior. Inkog's static analyzer flags tool functions that accept free-form strings and reach dangerous sinks (db.execute, subprocess.run, requests.post) without intermediate validation.
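
For completeness, here is one shape the registry itself might take. The query names, columns, and the db.execute(sql, params) signature are illustrative assumptions, not a prescribed schema.

python
# Hypothetical registry: every query the agent may run is authored by a human.
# Bind variables (:customer_id, :since) keep LLM-supplied values out of the SQL text.
QUERY_REGISTRY = {
    "orders_by_customer": (
        "SELECT id, total, created_at FROM orders "
        "WHERE customer_id = :customer_id AND created_at >= :since"
    ),
    "open_tickets": "SELECT id, subject FROM tickets WHERE status = 'open'",
}

def run_registered_query(db, query_id: str, params: dict) -> list:
    if query_id not in QUERY_REGISTRY:
        raise ValueError(f"Unknown query_id: {query_id}")   # refuse anything unregistered
    return db.execute(QUERY_REGISTRY[query_id], params)     # values bound, never interpolated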

Vulnerability 3: the AGENTS.md supply chain attack

AGENTS.md is the de facto standard for telling AI coding agents how to behave inside a repository. Claude Code, Cursor, Codex, Amp, and Jules all read it. They auto-include it in every chat request — not as documentation, but as authoritative instructions. NVIDIA's AI red team has published research on how a malicious AGENTS.md can hijack VS Code chat sessions and silently exfiltrate data.

The attack writes itself: clone a popular open-source repo, get a developer to run a build step that mutates AGENTS.md mid-flow (or simply commit a tampered version), and the next Claude Code session executes whatever the new file instructs, often pointing at a malicious MCP server. We walk through a full proof-of-concept in our AGENTS.md supply chain attack post.

The defensible answer is governance verification: AGENTS.md declares what the agent should do, the code shows what it actually does, and you compare the two. Inkog's inkog_verify_governance tool does this automatically: declared-versus-actual verification, on every commit.
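
As a toy illustration of the idea (not how inkog_verify_governance is implemented), the check below looks for one declared constraint in AGENTS.md and confirms the codebase contains a matching enforcement call. The require_human_approval marker and the file layout are hypothetical.

python
# Toy declared-vs-actual check, suitable for a CI step.
# The approval marker and paths are hypothetical; a real governance audit parses both sides properly.
from pathlib import Path

def write_approval_enforced(repo_root: str = ".") -> bool:
    declared = "requires approval for write operations" in \
        Path(repo_root, "AGENTS.md").read_text().lower()
    if not declared:
        return True  # nothing declared, nothing to verify

    # Does any agent module actually gate writes behind an approval call?
    return any(
        "require_human_approval(" in path.read_text()
        for path in Path(repo_root).rglob("*.py")
    )

if __name__ == "__main__":
    assert write_approval_enforced(), "AGENTS.md declares approval; the code never enforces it"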

Vulnerability 4: malicious MCP servers and SKILL.md exploits

The Model Context Protocol turns any local or remote service into a tool the agent can call. The tool description is part of the prompt — meaning a compromised MCP server can hide instructions inside a tool spec, and the agent will follow them as if they came from the system prompt.

A 2026 Snyk study scanned 3,984 agent skills and found 534 contained at least one critical vulnerability, including malware and prompt injection attack functions. The average developer doesn't inspect the tool descriptions of every MCP server they install. They should — or they should let an automated scanner do it.
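
A first pass at that inspection is easy to sketch. The suspicious-phrase list below is illustrative only, and a real scanner goes well beyond keyword matching; treat hits as "needs human review", not proof of compromise.

python
# First-pass scan of MCP tool descriptions for embedded instructions.
import json
import re

SUSPICIOUS = [
    r"ignore (all|previous) instructions",
    r"do not (tell|inform) the user",
    r"send .* to https?://",
    r"\.ssh|\.env|credentials",
]

def flag_tool_descriptions(tools_json: str) -> list:
    """tools_json: the server's tools/list response, captured as a JSON string."""
    hits = []
    for spec in json.loads(tools_json).get("tools", []):
        description = spec.get("description", "")
        for pattern in SUSPICIOUS:
            if re.search(pattern, description, re.IGNORECASE):
                hits.append((spec.get("name", "?"), pattern))
    return hits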

See our reference on MCP server security and tool poisoning for how the attacks work and what to scan for.

Securing hand-built frameworks in CI/CD

For teams hand-writing LangChain, CrewAI, LangGraph, or AutoGen agents, security should be a CI gate — not a separate review stage. The four checks worth automating before merge:

  • Bound every loop. max_iterations, max_execution_time, early_stopping_method — set on every AgentExecutor (see the sketch after this list).
  • Constrain every tool. Strict argument schemas. Pre-validated dispatch tables. Never eval or subprocess.run(shell=True) on LLM output.
  • Gate destructive operations. Filesystem writes, database mutations, external API calls, financial operations — require explicit human approval.
  • Verify what you declared. If your AGENTS.md says “requires approval for write operations,” the code must enforce it. Drift between declared and actual is the most dangerous governance gap.
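
A minimal sketch of the first check on a LangChain executor, assuming agent and tools are already built elsewhere (for example with create_tool_calling_agent):

python
# Bound the agent loop: every AgentExecutor gets explicit limits.
from langchain.agents import AgentExecutor

executor = AgentExecutor(
    agent=agent,                      # assumed: built elsewhere in your codebase
    tools=tools,
    max_iterations=15,                # hard cap on reasoning/tool-call turns
    max_execution_time=120,           # seconds; stop runaway chains
    early_stopping_method="force",    # return a stopped response instead of looping
)
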
yaml
# .github/workflows/inkog.yml
name: Agent Security Scan
on: [pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: inkog-io/inkog@v1
        with:
          path: .
          severity: high
          policy: balanced

Securing vibe-coded workflows

The vibe-coding pattern is research → plan → execute → review → ship, with the human as oversight at each gate. The security pattern is the same idea translated to verification: vibe, then verify.

Three things make this work in practice:

  1. Treat AGENTS.md as privileged infrastructure. Lock it behind a separate review track, and don't let your AI assistant rewrite it without explicit human approval (a minimal lock is sketched after this list).
  2. Audit every MCP server before you wire it in. The Inkog MCP server itself includes inkog_audit_mcp_server so your AI assistant can run the audit before you accept the install.
  3. Run a static scan on every PR. Catch the AI-generated code that looks fine but flunks the patterns above. See our walkthrough of the dev-flow loop for the full pattern.
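
A minimal version of that lock, written as a CI or pre-commit step; the pinned-hash file name is a hypothetical convention.

python
# Fail the build if AGENTS.md changed without a human re-pinning its hash.
# The .agents_md.sha256 pin file is a hypothetical convention, updated only via reviewed PRs.
import hashlib
import pathlib
import sys

pinned = pathlib.Path(".agents_md.sha256").read_text().strip()
actual = hashlib.sha256(pathlib.Path("AGENTS.md").read_bytes()).hexdigest()

if actual != pinned:
    print("AGENTS.md changed: require explicit human review, then re-pin the hash.")
    sys.exit(1)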

The principle: the LLM is your pair programmer, not your security gate. Pair programmers don't merge their own PRs.

Static analysis vs. autonomous red teaming

Static analysis tells you what could go wrong. Red teaming proves which ones actually can. Both are useful; neither is sufficient on its own.

Approach | Finds | Misses
Static analysis (Inkog Verify) | Pattern-level flaws, taint paths, missing oversight, unconstrained tools | Whether a flaw is exploitable in practice
Adversarial testing (Inkog Red) | Exploitable chains, runtime jailbreaks, multi-step attack feasibility | Patterns the attacker didn't try this run

In production, you want both — Verify on every PR, Red before every release. See Inkog Red for how the autonomous attacker works.

How Inkog secures your agents from commit to production

Inkog is a security platform purpose-built for AI agents. It runs across the development lifecycle — local CLI, IDE-integrated MCP server, GitHub Action gating PRs, and adversarial testing pre-release. One scanner, every framework we support, mapped to OWASP LLM Top 10.

Scan your agent in 60 seconds

Free, no signup. Paste a GitHub URL or upload a zip. Find the vulnerabilities we describe above in your own code.

Start free scan