Compliance

May 12, 2026 · 9 min read

EU AI Act Article 14: Human Oversight in Agent Code

What EU AI Act Article 14 actually demands in agent code: a deep-dive on the four oversight requirements, with passing and failing patterns side by side.

Ben · Compliance Research

Article 14 of the EU AI Act is the single article most AI agent teams will trip on. It does not just say "a human should be in the loop somewhere." It defines four specific properties that high-risk AI systems must support — and most agent code today fails at least two of them.

The Act's obligations for high-risk systems are scheduled to apply from August 2, 2026. This article is a practical breakdown of what Article 14 demands in code, with passing and failing patterns side by side.

See the broader picture: the EU AI Act Compliance Checklist for AI Agent Developers covers all the articles. This piece is the deep dive on Article 14.

What Article 14 actually says

Article 14(4) lists five oversight abilities, points (a) to (e). For agent code they reduce to four design requirements, with (d) and (e) merged into the last one:

Understand: Operators must be able to properly understand the capacities and limitations of the system and monitor it for anomalies.
Avoid over-reliance: Operators must be able to stay aware of the tendency to over-trust the system's output (automation bias), so the system should surface uncertainty rather than hide it.
Interpret: Operators must be able to correctly interpret the system's output.
Intervene: Operators must be able to decide not to use the system, override it, or stop it through a "stop" button or similar procedure.

For traditional ML systems (a credit-scoring model, an image classifier), these requirements map clearly to logs, confidence scores, and a manual review gate. For AI agents — systems that take autonomous actions, call tools, and chain decisions — the mapping is harder, and the failure modes are different.

Property 1 — Operators can understand what the agent does

Article 14(4)(a) requires operators to "understand the relevant capacities and limitations of the high-risk AI system." For an agent, this means: a person should be able to read what tools the agent has, what data it can access, and what actions it can take, without reading the source code.

Failing pattern — capabilities are implicit in tool registration:

python

# tools.py — capabilities scattered, no human-readable manifest
@tool
def search_database(query: str) -> str:
    ...

@tool
def send_email(to: str, subject: str, body: str) -> str:
    ...

@tool
def execute_sql(query: str) -> str:
    ...

agent = AgentExecutor(tools=[search_database, send_email, execute_sql], ...)

An auditor looking at this code cannot tell which tools are read-only versus destructive, whether execute_sql is sandboxed, or what data search_database can reach. The capabilities exist in the engineer's head, not in the system.

Passing pattern — declared capability manifest:

yaml

# agent_capabilities.yaml — read once, understood by humans
agent_name: customer-support
purpose: "Answer customer questions about orders. Cannot modify state."
tools:
  - name: search_database
    effect: READ_ONLY
    data_scope: "orders, products (no PII)"
    rate_limit: 60/min
  - name: send_email
    effect: COMMUNICATION
    requires_approval: true
    rate_limit: 5/min
  - name: execute_sql
    effect: STATE_MUTATION
    requires_approval: true
    rate_limit: 10/hour

The Inkog scanner reads AGENTS.md and capability manifests, then verifies the runtime code respects them. If send_email is invoked without an approval gate, Inkog flags it as an Article 14(4)(a) violation.
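
Independent of any scanner, the same manifest can be enforced at call time. The following is a minimal sketch, not part of Inkog or LangChain: `MANIFEST`, `GuardedToolbox`, and `CapabilityError` are hypothetical names, and the manifest is written as a Python dict here where a real system would load it from agent_capabilities.yaml.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical in-memory form of agent_capabilities.yaml; in practice this
# would be loaded from the manifest file with a YAML parser.
MANIFEST: Dict[str, Dict[str, Any]] = {
    "search_database": {"effect": "READ_ONLY", "requires_approval": False},
    "send_email": {"effect": "COMMUNICATION", "requires_approval": True},
    "execute_sql": {"effect": "STATE_MUTATION", "requires_approval": True},
}


class CapabilityError(PermissionError):
    """Raised when a tool call falls outside the declared manifest."""


@dataclass
class GuardedToolbox:
    tools: Dict[str, Callable[..., Any]]
    manifest: Dict[str, Dict[str, Any]]

    def call(self, name: str, *, approved: bool = False, **kwargs: Any) -> Any:
        entry = self.manifest.get(name)
        # Undeclared tools are rejected, so the manifest stays the source of truth.
        if entry is None or name not in self.tools:
            raise CapabilityError(f"tool {name!r} is not declared in the manifest")
        # Tools marked requires_approval only run once a human has approved.
        if entry.get("requires_approval") and not approved:
            raise CapabilityError(f"tool {name!r} requires human approval")
        return self.tools[name](**kwargs)


# Stand-in tool implementations, for illustration only.
toolbox = GuardedToolbox(
    tools={
        "search_database": lambda query: f"results for {query}",
        "send_email": lambda to, subject, body: f"sent to {to}",
    },
    manifest=MANIFEST,
)
```

With this wrapper, `toolbox.call("search_database", query="order 42")` runs directly, while `toolbox.call("send_email", ...)` raises `CapabilityError` unless the caller passes `approved=True` after a reviewer has signed off.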

Property 2 — The agent cannot foster over-reliance

Article 14(4)(b) requires that operators can remain aware of the tendency to over-rely on the system's output (automation bias). For agents this is subtle: an agent that always returns confident-sounding text encourages the user to act on its output without checking.

Failing pattern — no confidence signal:

python

def answer_customer(question: str) -> str:
    return llm.invoke(f"Answer: {question}").content

This will return a string the operator treats as truth. There is no signal that the LLM was uncertain, that retrieved context was thin, or that the answer is speculative.

Passing pattern — structured output with confidence:

python

from pydantic import BaseModel

class AgentAnswer(BaseModel):
    answer: str
    confidence: float  # 0.0–1.0
    grounded_in: list[str]  # source IDs the answer came from
    requires_human_review: bool

def answer_customer(question: str) -> AgentAnswer:
    response = llm.invoke(...)
    parsed = AgentAnswer.model_validate_json(response.content)
    if parsed.confidence < 0.7 or not parsed.grounded_in:
        parsed.requires_human_review = True
    return parsed

Now the operator sees the confidence, the sources, and a flag when human review is required. Inkog's governance policy flags any LLM call whose return value is passed unchecked into a downstream tool call.
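
One gap in the pattern above is what happens when the model's reply is not valid JSON or reports a confidence outside 0.0 to 1.0. The sketch below shows one way to fail closed using only the standard library. `parse_answer` and `REVIEW_THRESHOLD` are hypothetical names, and the 0.7 cut-off simply mirrors the example above; it is an assumption, not a value taken from the Act.

```python
import json
from dataclasses import dataclass, field
from typing import List

REVIEW_THRESHOLD = 0.7  # assumed cut-off; tune per use case


@dataclass
class AgentAnswer:
    answer: str
    confidence: float
    grounded_in: List[str] = field(default_factory=list)
    requires_human_review: bool = True  # fail closed by default


def parse_answer(raw: str) -> AgentAnswer:
    """Parse a model reply and decide whether a human must review it."""
    try:
        data = json.loads(raw)
        answer = str(data["answer"])
        confidence = float(data["confidence"])
        sources = [str(s) for s in data.get("grounded_in", [])]
    except (ValueError, KeyError, TypeError):
        # Unparseable output is treated as zero-confidence, never as truth.
        return AgentAnswer(answer=raw, confidence=0.0)

    in_range = 0.0 <= confidence <= 1.0
    needs_review = (not in_range) or confidence < REVIEW_THRESHOLD or not sources
    return AgentAnswer(answer, confidence, sources, needs_review)
```

A model's self-reported confidence is not a calibrated probability, so the flag works best as a routing signal combined with the grounding check rather than as a guarantee.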

Property 3 — Operators can correctly interpret the output

Article 14(4)(c) requires that operators can correctly interpret the system's output, taking into account the interpretation tools and methods available. For agents, this means the chain of reasoning must be auditable, not just the final action.

Failing pattern — black-box action:

python

result = agent.run("Process refund for customer 12345")
# result: "Refund processed."
# No trace of: what tools were called, what data was retrieved, what reasoning happened

Passing pattern — auditable trace:

python

from langchain.callbacks import FileCallbackHandler

# request_id is the unique ID your service assigns to this run.
# FileCallbackHandler writes a plain-text trace, hence the .log extension.
handler = FileCallbackHandler(filename=f"audit/{request_id}.log")

result = agent.invoke(
    {"input": "Process refund for customer 12345"},
    config={"callbacks": [handler]},
)

# audit/<request_id>.log now contains:
# - The prompt that went to the LLM
# - The LLM's reasoning ("ReAct" trace)
# - Each tool call and its arguments
# - Each tool's response
# - The final action

Pair this with a structured event log (one JSON line per agent step) and an auditor can reconstruct exactly what happened. Inkog flags agent runs that lack any callback handler or that disable logging in production.
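
The structured event log mentioned above could look like the following sketch. It uses only the standard library and is not tied to LangChain; `AuditLog` and the field names (`request_id`, `step`, `ts`, `event`, `details`) are assumptions chosen for illustration, not a format any regulator prescribes.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict, List


class AuditLog:
    """Append-only JSON Lines log: one record per agent step."""

    def __init__(self, directory: str, request_id: str) -> None:
        self.request_id = request_id
        self.path = Path(directory) / f"{request_id}.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self._step = 0

    def record(self, event: str, **details: Any) -> None:
        """Append one step, e.g. event="tool_call" with its arguments."""
        self._step += 1
        entry: Dict[str, Any] = {
            "request_id": self.request_id,
            "step": self._step,
            "ts": time.time(),
            "event": event,
            "details": details,
        }
        # default=str keeps the log writable even for non-JSON values.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry, default=str) + "\n")

    def read(self) -> List[Dict[str, Any]]:
        """Load every step back in order, as an auditor's tooling would."""
        with self.path.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]
```

A run would call `log.record("tool_call", tool="lookup_order", args={"order_id": 12345})` before each tool invocation and `log.record("tool_result", ...)` after it, so that `log.read()` returns the steps in the order they happened.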

Property 4 — Operators can stop the agent

Article 14(4)(d) and (e) require that operators can disregard or override the system's output and halt it through a "stop" button or a similar procedure. For agents, this means three things:

Bounded execution. An agent must terminate within a known bound (iterations, time, tokens).
External kill switch. A live agent run must be cancellable from outside the process.
Approval gates on irreversible actions. Destructive actions (deletes, transfers, external communications) must require human approval before execution.

Failing pattern — unbounded loop:

python

while not task_complete:
    result = agent.step()  # may never converge

Passing pattern — bounded + cancellable + gated:

python

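import threading

from langchain.agents import AgentExecutor

agent = AgentExecutor(
    agent=react_agent,
    tools=tools,
    max_iterations=25,        # hard cap on reasoning/tool steps
    max_execution_time=300,   # wall-clock cap in seconds
    early_stopping_method="force",
)

# Kill switch via cancellation token, set from an operator-facing control
cancel_event = threading.Event()

def on_step():
    # Call between agent steps (for example from a callback handler)
    if cancel_event.is_set():
        raise InterruptedError("Operator cancelled")

# Approval gate for destructive tools.
# require_human_approval, needs_approval, human_review_queue, and
# tool_registry are application-defined helpers, not library APIs.
@require_human_approval(effect_categories=["destructive", "financial", "communication"])
async def execute_tool(tool_name, args):
    if needs_approval(tool_name):
        # await is only valid inside an async function
        decision = await human_review_queue.request(tool_name, args, timeout=300)
        if not decision.approved:
            return f"Action cancelled by reviewer: {decision.reason}"
    return tool_registry[tool_name].execute(args)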

Inkog's eu-ai-act policy specifically checks for max_iterations, max_execution_time, and the presence of approval decorators on any tool tagged destructive, financial, or communication.
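
For teams that drive the agent loop themselves instead of using an executor, the bounds and the kill switch can be combined in one small driver. This is a minimal sketch under stated assumptions: `run_bounded` and `AgentStopped` are hypothetical names, and `step` stands for any callable that returns `None` while the task is still in progress and a string once it is done.

```python
import threading
import time
from typing import Callable, Optional


class AgentStopped(RuntimeError):
    """Raised when a run is halted by a bound or by the operator."""


def run_bounded(
    step: Callable[[], Optional[str]],
    cancel_event: threading.Event,
    max_iterations: int = 25,
    max_seconds: float = 300.0,
) -> str:
    """Call step() until it returns a result, a bound is hit, or the operator cancels."""
    deadline = time.monotonic() + max_seconds
    for _ in range(max_iterations):
        # Checked before every step, so a cancel takes effect at the next boundary.
        if cancel_event.is_set():
            raise AgentStopped("operator cancelled")
        if time.monotonic() > deadline:
            raise AgentStopped("time limit exceeded")
        result = step()
        if result is not None:
            return result
    raise AgentStopped("iteration limit exceeded")
```

The checks run between steps only, so a single long-running step is not interrupted midway. A `threading.Event` also only works within one process; cancelling from outside the process would need a shared flag such as a database row or a message queue, which this sketch leaves out.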

What an Article 14 audit looks like in practice

A regulator (or a customer's compliance team) asking about your agent will typically want three things:

The capability manifest — what the agent can do, in human-readable form.
The audit trail — a sample of agent runs with full step-by-step logs.
The control evidence — proof that iteration bounds, approval gates, and kill switches exist and work.

You can generate all three programmatically. Inkog's governance and eu-ai-act policies produce a finding report that maps directly to Article 14 sub-clauses:

bash

npx -y @inkog-io/cli scan . --policy eu-ai-act --output sarif

The output is SARIF, the OASIS standard format for static analysis results, which GitHub code scanning and many enterprise audit tools can ingest.
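
Because SARIF is plain JSON, a findings report can be summarized with a few lines of Python before it goes to an auditor. The sketch below relies only on fields defined by SARIF 2.1.0 (`runs`, `results`, `ruleId`, `message.text`, `locations`); it makes no assumption about which rule IDs Inkog emits, and `summarize_sarif` and `load_sarif` are hypothetical helper names.

```python
import json
from collections import Counter
from typing import Any, Dict, List, Tuple


def load_sarif(path: str) -> Dict[str, Any]:
    """Read a SARIF report from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize_sarif(sarif: Dict[str, Any]) -> Tuple[Counter, List[str]]:
    """Count results per rule ID and list each finding as 'file:line rule message'."""
    counts: Counter = Counter()
    lines: List[str] = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            rule = result.get("ruleId", "<no rule>")
            message = result.get("message", {}).get("text", "")
            # Results may omit locations; fall back to placeholders.
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            uri = physical.get("artifactLocation", {}).get("uri", "<unknown>")
            line = physical.get("region", {}).get("startLine", 0)
            counts[rule] += 1
            lines.append(f"{uri}:{line} {rule} {message}")
    return counts, lines
```

Calling `summarize_sarif(load_sarif("results.sarif"))` on a saved report gives a per-rule count for the summary page and a flat list of findings for the appendix; the file name here is only an example.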

The fastest way to find your Article 14 gaps today

Run Inkog against your agent code:

bash

npx -y @inkog-io/cli scan . --policy governance

The scanner will flag, with file and line number:

Tools tagged destructive that are called without approval gates
Agent loops without max_iterations or max_execution_time
LLM calls whose output reaches tool execution without validation
Agents that disable logging or run without callbacks

Each finding includes the specific Article 14 sub-clause it maps to. The first scan takes 30 seconds. The fix list it produces is the same one a Notified Body auditor would write.

Going deeper

EU AI Act Compliance Checklist for AI Agent Developers — the top-level checklist covering all relevant articles, risk classification, and CI/CD automation.
EU AI Act Article 15: Accuracy and Robustness for AI Agents — the companion piece on input validation, error handling, and resource limits.
EU Machinery Regulation 2023/1230: AI Compliance Before January 2027 — Article 14 controls also satisfy Annex III §1.1.6 when the agent is an AI safety component under the Machinery Regulation.

Inkog is a static analysis scanner for AI agent code. It maps every finding to EU AI Act articles, NIST AI RMF functions, and OWASP LLM Top 10 categories. Free CLI, 30-second scan:

bash

npx -y @inkog-io/cli scan . --policy eu-ai-act