Introducing Agent Capability Surface: See Exactly What Your AI Agents Can Do

A capability map for every agent in your codebase: what tools each agent invokes, what gaps exist between its declared scope and its wired controls, and a graduated governance score that maps to the EU AI Act, NIST AI RMF, ISO 42001, and OWASP LLM Top 10. Validated on Microsoft AutoGen and 13 other production agent repositories.

Ben, Founding Engineer

You deployed an AI agent last quarter. Without opening the codebase: what tools can it invoke? Which of those tools move money, write to your database, or talk to the internet? Where are the controls that gate them? Most teams need a sprint to answer those questions. Agent Capability Surface answers them in 10 seconds.

We are shipping it today, in beta, to every Inkog account.

The problem

A traditional service has a finite, written API surface. A few HTTP routes, a database schema, a handful of background jobs. Your security review checks that each one has authn, authz, audit, and rate limits. Done.

Agents do not work that way. The "API surface" of an agent is not a list of routes. It is whatever the underlying LLM decides to call, when, in what order, against which tool. A single line like Agent(tools=[refund_customer, send_email, run_shell]) gives the model the right to move money, send communications, and execute arbitrary code, with the model choosing the policy at runtime.

Runtime tracing and observability tools will not tell you what the agent can do. They only tell you what it did on a particular trace. That is necessary but not sufficient.

You need a capability map.
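To make the difference from a trace concrete, here is a minimal sketch of static capability enumeration using Python's standard ast module. It is an illustration only, not Inkog's Universal IR engine: it assumes agents are constructed with a call named Agent and a literal tools=[...] list, and the function name list_agent_tools is hypothetical.

```python
import ast


def list_agent_tools(source: str, constructor: str = "Agent") -> list[list[str]]:
    """Return the tool names passed to each `constructor(tools=[...])` call.

    Simplified on purpose: only a literal list or tuple of bare names is handled.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Match both `Agent(...)` and `some_pkg.Agent(...)`.
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != constructor:
            continue
        for kw in node.keywords:
            if kw.arg == "tools" and isinstance(kw.value, (ast.List, ast.Tuple)):
                found.append([e.id for e in kw.value.elts if isinstance(e, ast.Name)])
    return found


SAMPLE = "agent = Agent(tools=[refund_customer, send_email, run_shell])"
print(list_agent_tools(SAMPLE))
```

The point of the sketch is that the tool list is recoverable from source without ever running the agent, which is what separates a capability map from a record of one execution.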

The three layers

Inkog's Capability Surface is a three layer view of every agent in your codebase:

CAN. What the code actually allows. Extracted by the Universal IR engine that already powers Inkog's vulnerability scanner, now augmented to enumerate every ToolCallNode, DelegationNode, MCPServerNode, MemoryAccessNode, and CredentialNode in the graph.
SHOULD. What AGENTS.md declares. We parse YAML front matter, markdown sections, and inline annotations across the four common AGENTS.md formats and turn them into typed Declaration rows.
ENFORCED. What controls are actually wired. HumanApprovalCallbackHandler, authz wrappers, audit logging, rate limiters, cycle guards, sanitizers. All detected at source and indexed against the capabilities they protect.

The gap between these layers is your vulnerability surface. We compute it, score it, and map every gap to the regulatory clauses that demand the missing control.
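The comparison between the three layers can be pictured as set arithmetic. The sketch below is a deliberately simplified assumption, not the shipped algorithm: capabilities are plain strings, any wired control counts as enforcement, and the names LayerGaps and compare_layers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerGaps:
    undeclared: frozenset  # present in CAN, absent from SHOULD
    unenforced: frozenset  # present in CAN, no control wired at all
    stale: frozenset       # declared in SHOULD, not found in code


def compare_layers(can: set, should: set, enforced: dict) -> LayerGaps:
    """Compare what the code allows, what is declared, and what is gated.

    `enforced` maps a capability name to the set of controls wired around it.
    """
    undeclared = frozenset(can - should)
    unenforced = frozenset(c for c in can if not enforced.get(c))
    stale = frozenset(should - can)
    return LayerGaps(undeclared, unenforced, stale)


gaps = compare_layers(
    can={"refund_customer", "send_email", "run_shell"},
    should={"refund_customer", "send_email"},
    enforced={"refund_customer": {"human_approval"}},
)
print(sorted(gaps.undeclared), sorted(gaps.unenforced), sorted(gaps.stale))
```

In this toy input, run_shell is both undeclared and unenforced, while send_email is declared but has no control wired around it.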

What you get

Run a scan and the CLI prints a short summary block:

🔍 Agent Capability Surface
   Agents: 4 │ Tools: 18 │ Gaps: 9 (1 critical, 3 high)
   Governance Score: 91/100
   View full surface: https://app.inkog.io/dashboard/agents/9dbc735a-...

The dashboard renders the full inventory: every agent, every tool the agent invokes, every MCP server it connects to, every gap, every control, each row deep linked to its source location.

Governance Score is the headline. 100 means every required control is wired for every effectful capability. The formula is graduated: critical financial actions without human_approval cost more than a data mutation without audit_log. The math is open. We publish the weight table in the docs.
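As a rough illustration of what "graduated" means, the sketch below subtracts a severity-weighted penalty from 100. The weights and the subtractive formula are assumptions made for this example; they are not the published weight table, and governance_score is a hypothetical name.

```python
# Hypothetical weights, chosen only to illustrate graduation by severity.
HYPOTHETICAL_WEIGHTS = {"critical": 10, "high": 5, "medium": 2, "low": 1}


def governance_score(gap_severities: list, weights: dict = HYPOTHETICAL_WEIGHTS) -> int:
    """Return 100 minus the summed penalty of all gaps, floored at 0."""
    penalty = sum(weights[severity] for severity in gap_severities)
    return max(0, 100 - penalty)


print(governance_score([]))                            # no gaps
print(governance_score(["critical", "high", "high"]))  # one critical, two high
```

Whatever the real weights are, the property to preserve is that a single critical gap costs more than a single lower-severity gap, and that a surface with no gaps scores 100.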

Compliance mapping. Every gap carries the regulatory clauses it violates. A missing human_approval on a destructive irreversible tool maps to EU AI Act Article 14(4)(d), NIST MAP 5.1, ISO 42001 A.6.2, CWE-862, OWASP LLM06, and AIUC-1 Principle 4. You can wire the gap into your existing GRC tracker without re-mapping a thing.
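A gap-to-clause lookup of this kind can be sketched as a small table. The clause strings below are the ones listed above; the key names ("human_approval", "destructive_irreversible") and the row layout are hypothetical and would need to match whatever your GRC tracker expects.

```python
# Clauses are taken from the example above; the keys are illustrative names.
CLAUSES_BY_GAP = {
    ("human_approval", "destructive_irreversible"): (
        "EU AI Act Article 14(4)(d)",
        "NIST MAP 5.1",
        "ISO 42001 A.6.2",
        "CWE-862",
        "OWASP LLM06",
        "AIUC-1 Principle 4",
    ),
}


def tracker_rows(tool: str, missing_control: str, effect: str) -> list:
    """Expand one gap into one row per clause, ready for import elsewhere."""
    clauses = CLAUSES_BY_GAP.get((missing_control, effect), ())
    return [
        {"tool": tool, "missing_control": missing_control, "clause": clause}
        for clause in clauses
    ]


rows = tracker_rows("delete_user", "human_approval", "destructive_irreversible")
for row in rows:
    print(row["tool"], "->", row["clause"])
```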

A real example: Microsoft AutoGen

We ran the surface against the public Microsoft AutoGen repository at HEAD. AutoGen is Microsoft's multi agent framework, used in research and customer demos and running in production at thousands of companies. Here is the summary the CLI returned:

🔍 Agent Capability Surface
   Agents: 4 │ Tools: 18 │ Gaps: 9 (1 critical, 3 high)
   Governance Score: 91/100

The one critical gap is in CodeExecutorAgent: LLM-generated code is handed straight to the executor without a human approval gate. The three high severity gaps are missing cycle_guard on three of AutoGen's multi agent collaboration patterns. The same matrix that drove the score also produced the compliance mapping for each gap, ready to drop into your GRC tracker.

Validation, in numbers. Zero false positives across the 14 fixture benchmark we shipped with Phase B. We track that bar as a contract: see the 101 tests in pkg/capability.

How it works

The pipeline runs entirely in your existing scan:

The worker builds a typed InkogGraph from your source. 15 frameworks supported today, including Microsoft AutoGen, LangGraph, CrewAI, Pydantic AI, smolagents, n8n, and Flowise.
MapGraph() walks the graph and emits typed Capability, Edge, Declaration, Control rows.
ComputeGaps() applies a graduated control matrix. EffectFinancial + IsHighValueFinancial requires HumanApproval at critical severity. EffectCommunication requires AuditLog only as a recommendation, not a blocker. Internal network HTTP gets no requirements. Logging tools are explicitly excluded from the matrix so logging.info() never produces a gap.
Delegation cycles are detected by DFS over the delegation graph. Unbounded loops are detected by inspecting the TerminationGuard of every LoopNode.
Everything is persisted to Postgres, scoped per organization, and surfaced via GET /v1/capabilities/{scan_id}.
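The graduated control matrix in the ComputeGaps() step can be approximated in a few lines. This sketch encodes only the four rules named above and returns no requirement for anything else, which the real matrix will not do; the Python class and field names are hypothetical stand-ins for the typed rows the pipeline emits.

```python
from dataclasses import dataclass, field


@dataclass
class Capability:
    tool: str
    effect: str                  # "financial", "communication", "network", "logging"
    high_value: bool = False     # stands in for IsHighValueFinancial
    internal_only: bool = False  # internal network HTTP
    controls: set = field(default_factory=set)


@dataclass(frozen=True)
class Gap:
    tool: str
    missing: str
    severity: str


def required_controls(cap: Capability) -> list:
    """Return (control, severity) pairs for the rules described in the text."""
    if cap.effect == "logging":
        return []  # logging tools are excluded from the matrix
    if cap.effect == "network" and cap.internal_only:
        return []  # internal HTTP gets no requirements
    if cap.effect == "financial" and cap.high_value:
        return [("human_approval", "critical")]
    if cap.effect == "communication":
        return [("audit_log", "recommendation")]
    return []  # not covered by this sketch


def compute_gaps(caps: list) -> list:
    """Emit one gap for every required control that is not wired."""
    return [
        Gap(cap.tool, control, severity)
        for cap in caps
        for control, severity in required_controls(cap)
        if control not in cap.controls
    ]


caps = [
    Capability("refund_customer", "financial", high_value=True),
    Capability("send_email", "communication", controls={"audit_log"}),
    Capability("logging.info", "logging"),
]
print(compute_gaps(caps))
```

Only refund_customer produces a gap here: send_email already has its recommended audit log, and the logging call never enters the matrix.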

The CLI shows the summary inline. The dashboard renders the inventory. The API serves the full graph for your own tooling.
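For readers unfamiliar with the delegation cycle check mentioned in the pipeline, the following is a generic depth-first search over an adjacency map. It is a textbook sketch under the assumption that delegation is modelled as agent name to list of delegate names; it is not the code in pkg/capability, and find_delegation_cycle is a hypothetical name.

```python
from typing import Optional

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def find_delegation_cycle(graph: dict) -> Optional[list]:
    """Return one delegation cycle as a list of agent names, or None."""
    color = {}
    path = []

    def visit(agent: str) -> Optional[list]:
        color[agent] = GRAY
        path.append(agent)
        for delegate in graph.get(agent, []):
            state = color.get(delegate, WHITE)
            if state == GRAY:
                # Back edge: the delegate is already on the current path.
                return path[path.index(delegate):] + [delegate]
            if state == WHITE:
                cycle = visit(delegate)
                if cycle:
                    return cycle
        path.pop()
        color[agent] = BLACK
        return None

    for agent in graph:
        if color.get(agent, WHITE) == WHITE:
            cycle = visit(agent)
            if cycle:
                return cycle
    return None


looping = {"planner": ["coder"], "coder": ["reviewer"], "reviewer": ["planner"]}
print(find_delegation_cycle(looping))
```

A cycle on its own is not necessarily a defect. It becomes a gap only when nothing bounds it, which is why the pipeline pairs this check with the termination guard inspection.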

Try it

If you already have Inkog installed:

```bash
inkog -path ./your-agent-code
```

The capability block appears below your normal scan results. No flag, no config. Click the deep link to see the full inventory in the dashboard.

If you do not:

```bash
# Homebrew
brew install inkog-io/tap/inkog

# Go
go install github.com/inkog-io/inkog/cmd/inkog@latest

# Or download a release
# https://github.com/inkog-io/inkog/releases
```

The CLI is Apache 2.0, the surface is free during beta, and the entire pkg/capability package is open source.

What is next

This is Phase C. We have a roadmap.

Trust Badges. Embeddable badge in your README showing the live governance score, similar to a CI status badge. Designed so your agent's downstream consumers can verify your posture without seeing your source.
Drift alerts. Slack and email notifications when a new scan introduces a capability your previous scan did not have, or removes a control you previously had wired.
AGENTS.md generator. Given a high scoring surface, generate the AGENTS.md declarations that match. The opposite direction of what we already do: instead of verifying that code matches the declaration, write the declaration so it matches the code.
Per operation oversight. Today the matrix is per capability. Next we will classify per individual call site, so an agent with three delete_user invocations can have approval wired on the dangerous path and audit only on the dry run path.

Capability Surface is the foundation for all of these. We would love your feedback at github.com/inkog-io/inkog/issues. If you have an agent repo you want benchmarked, drop it in an issue.