Your change advisory board meets on Thursdays. Your AI agent deploys to production in under a second.
These two statements cannot coexist. And yet, thousands of engineering orgs are about to try.
AI agents are past the novelty coding-assistant phase. Claude Code, OpenAI Codex, and GitHub Copilot can already run infrastructure commands, deploy code, and query production databases. The next generation (the kind that gets API keys, service account credentials, and on-call rotation pages) will not ask to act. They will act. And the IT operating models we built over the last three decades are not ready.
This is not a prediction. This is an autopsy of a framework that hasn't died yet but is already on life support.
The Millisecond Problem
ITIL 4, the dominant framework for IT service management, defines 34 management practices covering change control, incident management, problem management, and service desk operations. It codifies a world where humans initiate, humans approve, humans execute, and humans verify. The ITIL service value chain is built around six activities — Plan, Improve, Engage, Design and Transition, Obtain/Build, Deliver and Support — that assume a human-paced clock.
AI agents operate on a different clock. An agent that detects a production anomaly, queries the logs, runs a diagnostic, and rolls back a bad deploy can complete the entire cycle before a human finishes reading the PagerDuty notification. The OWASP LLM06:2025 Excessive Agency entry names the underlying risk: a system granted excessive functionality, permissions, or autonomy can take damaging actions in response to unexpected, ambiguous, or manipulated model outputs. When an agent chains those actions on its own, they can finish before any human-paced control gets a say.
When your incident response runbook says "escalate to the on-call engineer" but your agent has already mitigated the issue, the runbook is not a safety net. It is a historical document. The question is not whether agents will automate incident response — they already can. The question is whether your governance model can keep up.
Read-Only First Is Not a Strategy, It Is a Truce
The most common answer I hear from engineering leaders is: "We'll just make agents read-only."
This sounds prudent. It is also naive.
A read-only-first architecture for infrastructure agents is a useful starting pattern, and current tooling can enforce it, though not always out of the box: the Microsoft Agent Governance Toolkit ships with an allow-default policy example but supports deny-by-default through custom OPA/Cedar policy configurations. But "agents can look but not touch" breaks the moment you actually want agents to do something. And you will. You will want an agent to restart a service. To scale a cluster. To push a config change. To roll back a deploy.
The real pattern is not read-only. It is governed-by-default: agents can do anything within policy, but every action is intercepted, evaluated, and recorded before it touches a real system.
This is the architectural shift that most orgs miss. They reach for observability tools — LangSmith, Langfuse — to log what agents did after the fact. But by the time the log is written, the DROP TABLE has already executed.
Observability is post-mortem. Governance must be pre-mortem.
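A minimal sketch of governed-by-default in Python, assuming tools are plain functions called with keyword arguments. The `governed` decorator, `ActionBlocked`, `AUDIT_LOG`, and the `no_payments` policy are hypothetical names for illustration, not any product's API. The point is the ordering: the decision is evaluated and recorded before the wrapped function can touch anything.

```python
import functools
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class DecisionRecord:
    tool: str
    args: dict
    decision: str  # "allow" or "block"
    reason: str
    timestamp: float = field(default_factory=time.time)


# Hypothetical in-memory audit trail; a real runtime would persist this.
AUDIT_LOG: list[DecisionRecord] = []


class ActionBlocked(Exception):
    """Raised when policy rejects a tool call before it executes."""


def governed(policy: Callable[[str, dict], tuple[bool, str]]):
    """Wrap a tool so every call is evaluated and recorded BEFORE it runs."""
    def decorator(tool: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(tool)
        def wrapper(**kwargs: Any) -> Any:
            allowed, reason = policy(tool.__name__, kwargs)
            # The record is written before the side effect, not after it.
            AUDIT_LOG.append(DecisionRecord(
                tool.__name__, dict(kwargs), "allow" if allowed else "block", reason))
            if not allowed:
                raise ActionBlocked(f"{tool.__name__}: {reason}")
            return tool(**kwargs)
        return wrapper
    return decorator


# Hypothetical policy: agents may restart services, but never in "payments".
def no_payments(tool_name: str, args: dict) -> tuple[bool, str]:
    if args.get("namespace") == "payments":
        return False, "payments namespace is off-limits to agents"
    return True, "within policy"


@governed(no_payments)
def restart_service(*, name: str, namespace: str) -> str:
    return f"restarted {namespace}/{name}"  # stand-in for a real side effect
```

Calling `restart_service(name="auth", namespace="identity")` runs the tool; calling it with `namespace="payments"` raises `ActionBlocked`, and both attempts leave a record in `AUDIT_LOG`, including the one that never executed.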
The OWASP LLM01:2025 Prompt Injection guidance is unambiguous: "it is unclear if there are fool-proof methods of prevention for prompt injection." Microsoft's AI Red Teaming Agent formalizes Attack Success Rate (ASR) as the canonical metric for this class of failure. In published research (Andriushchenko et al., ICLR 2025), simple adaptive jailbreak attacks, a closely related class, achieved 100% attack success rates against GPT-4o, GPT-3.5, Claude models, and Llama-3-Instruct-8B. If a prompt can trick an agent into acting against its instructions, a "read-only" rule that lives only in the prompt is one prompt away from becoming "write-everything."
The deterministic answer is to interpose code, not prompts, between the agent's intent and the action.
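Here is what "code, not prompts" can look like for a database tool: a deterministic check that runs on the SQL the agent actually emits, so no amount of persuasive input changes its verdict. This is a deliberately crude keyword sketch with hypothetical rules; a production guard should use a real SQL parser, since naive splitting on `;` mishandles semicolons inside string literals.

```python
import re
from typing import Tuple

# Hypothetical deny rules: statement types an agent may never run, whatever its prompt says.
DESTRUCTIVE = re.compile(r"^\s*(DROP|TRUNCATE|ALTER|GRANT|REVOKE)\b", re.IGNORECASE)
# Flags the simplest unscoped form: DELETE FROM <table> with nothing after it.
UNSCOPED_DELETE = re.compile(r"^\s*DELETE\s+FROM\s+\S+\s*$", re.IGNORECASE)


def check_sql(statement: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for SQL text proposed by an agent."""
    # Check every statement, so "SELECT 1; DROP TABLE users" cannot smuggle in a second one.
    for part in (s.strip() for s in statement.split(";")):
        if not part:
            continue
        if DESTRUCTIVE.match(part):
            return False, "destructive statement blocked: " + part.split()[0].upper()
        if UNSCOPED_DELETE.match(part):
            return False, "DELETE without WHERE clause blocked"
    return True, "ok"
```

The guard never sees the conversation that produced the statement, which is the design choice: its answer depends only on what is about to execute.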
The CAB Is Dead. Long Live the Runtime.
The Change Advisory Board (CAB) is one of ITIL's sacred cows. A group of humans reviews proposed changes, assesses risk, and approves or rejects. It works great when changes happen weekly.
It is a joke when changes happen every second.
When an agent-driven deploy happens at machine speed, you cannot convene a CAB. You need a governance runtime — a middleware layer that sits between agents and infrastructure, evaluates policy in real time, routes approvals to the right human (or automated) decision-maker, and records every decision in a tamper-evident audit trail.
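"Tamper-evident" has a concrete meaning: each audit entry commits to the hash of the entry before it, so editing or deleting a past record breaks every hash that follows. A minimal hash-chain sketch using SHA-256 from the Python standard library follows; the class name and record fields are illustrative. It is tamper-evident, not tamper-proof: anyone able to rewrite the whole log can recompute the chain, which is why real systems also sign entries or anchor the latest hash somewhere the writer cannot modify.

```python
import hashlib
import json


class HashChainedLog:
    """Append-only log where each entry commits to the hash of the previous entry."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries: list[dict] = []

    @staticmethod
    def _digest(prev: str, record: dict) -> str:
        # sort_keys makes the serialization, and therefore the hash, deterministic.
        body = json.dumps(record, sort_keys=True)
        return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        digest = self._digest(prev, record)
        self.entries.append({"record": record, "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the whole chain; any edited, reordered, or removed entry fails."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True
```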
This category is emerging fast. Three projects define the shape of it, and none of them agree on much — which is exactly what you'd expect from a category that didn't exist six months ago:
- DashClaw (275 stars on GitHub) has the tightest loop: intent → guard → approve → record. It describes itself as "the governance runtime for AI agents" and intercepts risky actions before they execute by enforcing declarative policies. High-risk ops get routed to a human. Everything gets a replayable audit record. It plugs into Claude Code, Codex, Hermes Agent, LangChain, CrewAI, the OpenAI Agents SDK, and other agent runtimes via hooks, plugins, SDKs, or MCP. Not after the fact. Before.
- OpenMAO (44 stars on GitHub) has the most ambitious vision: "the accountable layer above any framework." It's an organization-of-record for AI agents, built around a flywheel: governance → institutional memory → self-correction → self-learning → audited track record → wider autonomy. Autonomy is earned, not assumed. Every widening is proven on evidence. Every widening is reversible. Big swing. Still early.
- Microsoft Agent Governance Toolkit (4.3k stars on GitHub) is the heavyweight. Policy enforcement, zero-trust identity (SPIFFE/DID/mTLS), execution sandboxing with four privilege rings, and SRE tooling including kill switches and chaos testing. It covers 10/10 OWASP Agentic Top 10 risks and ships in Python, TypeScript, .NET, Rust, and Go. The core primitive is `govern()`, a function wrapper that intercepts every tool call and evaluates it against YAML/OPA/Cedar policy before the tool executes. If Microsoft ships this as a first-party Azure service, the category consolidates fast.
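Declarative policy can be as simple as ordered rules evaluated as data. The sketch below assumes a made-up rule format (a tool-name glob plus an effect), first-match-wins semantics, and a configurable default; it is not the YAML, OPA, or Cedar schema any of these projects uses. Flipping `default` between `"deny"` and `"allow"` is the difference between deny-by-default and allow-default.

```python
from fnmatch import fnmatch

# Hypothetical policy document; the schema is invented for this sketch.
POLICY = {
    "default": "deny",  # deny-by-default: anything no rule matches is refused
    "rules": [  # evaluated top to bottom; the first matching rule wins
        {"tool": "logs.read", "effect": "allow"},
        {"tool": "service.restart", "effect": "require_approval"},
        {"tool": "db.drop_*", "effect": "deny"},
        {"tool": "db.*", "effect": "allow"},
    ],
}


def evaluate(policy: dict, tool: str) -> str:
    """Return the effect of the first rule whose glob matches the tool name."""
    for rule in policy["rules"]:
        if fnmatch(tool, rule["tool"]):
            return rule["effect"]
    return policy["default"]
```

Rule order matters: if `db.*` were listed before `db.drop_*`, the broader allow would shadow the deny and `db.drop_table` would go through.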
Notice what none of these projects are: logging tools. They are not "what happened" systems. They are "what is allowed" systems. That distinction is the entire point.
The governance runtime replaces more than the CAB. It changes everything downstream, starting with on-call.
On-Call at Machine Speed
Consider how on-call changes when agents become operational participants.
Today, an on-call engineer receives an alert, investigates, determines the cause, and takes remedial action. The mean time to resolution (MTTR) is measured in minutes to hours.
With agents, the loop compresses. An agent can:
- Receive the alert via webhook
- Query the monitoring system for context
- Check recent deploys for the offending change
- Roll back the change
- Verify the rollback succeeded
All of this happens in seconds, not minutes. But the governance challenge is: should the agent have the authority to roll back? What if the rollback is wrong? What if it makes things worse?
The governance runtime answers these questions with policy, not prayer. DashClaw's model, for example, would require the agent to declare intent before rolling back, evaluate the action against policy (risk score threshold, deploy gate, time-of-day restrictions), and either allow, block, or route for human approval. If the action is blocked because it exceeds risk thresholds, the agent can propose a lower-risk alternative. Every decision — allow, block, approve — is recorded as a replayable record.
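The allow/block/route decision can be sketched as a small pure function. Everything here is an assumption for illustration, not DashClaw's implementation: the `Intent` shape, the 0.3 and 0.8 risk thresholds, the 08:00 to 17:59 UTC change window, and how a risk score gets assigned in the first place. The `first_acceptable` helper models the fallback: the agent submits its preferred action first, then lower-risk alternatives, and takes the first one policy does not block.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class Intent:
    action: str
    risk: float              # hypothetical score: 0.0 (harmless) to 1.0 (catastrophic)
    is_deploy: bool = False  # deploys and rollbacks are subject to the deploy gate


AUTO_ALLOW_MAX = 0.3              # at or below this, no human is needed
BLOCK_MIN = 0.8                   # at or above this, refuse outright
CHANGE_WINDOW_UTC = range(8, 18)  # hours 08 through 17, UTC


def decide(intent: Intent, now: datetime, deploy_gate_open: bool) -> str:
    """Return "allow", "block", or "require_approval" for a declared intent."""
    if intent.risk >= BLOCK_MIN:
        return "block"
    if intent.is_deploy and not deploy_gate_open:
        return "block"
    outside_window = now.astimezone(timezone.utc).hour not in CHANGE_WINDOW_UTC
    if intent.risk > AUTO_ALLOW_MAX or outside_window:
        return "require_approval"
    return "allow"


def first_acceptable(plan: List[Intent], now: datetime,
                     deploy_gate_open: bool) -> Optional[Tuple[Intent, str]]:
    """Walk the agent's plan in order (preferred action first, then lower-risk
    fallbacks) and return the first intent that policy does not block."""
    for intent in plan:
        decision = decide(intent, now, deploy_gate_open)
        if decision != "block":
            return intent, decision
    return None
```

With these numbers, a full multi-region rollback scored at 0.85 is blocked, while a canary-only rollback scored at 0.25 is allowed during the window and routed for approval at 03:00 UTC.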
This is not slower than the human-only loop. It is faster, because most decisions are automated, and the ones that require human judgment are surfaced with full context to the right person on whatever channel they prefer — dashboard, CLI, mobile PWA, Telegram, Discord.
Access Control for Non-Human Identities
ITIL's access control model is built around human roles: sysadmin, DBA, network engineer, service desk. These roles map to static permission sets.
Agents do not fit this model. An agent is a non-human identity (NHI) — a digital entity that needs authentication, authorization, and audit, but cannot use MFA, cannot attend security training, and cannot be fired. NHIs already outnumber human identities in most cloud environments. Adding AI agents to that population compounds the challenge.
The emerging pattern is to map permissions to agent capabilities rather than human roles. An agent gets a specific set of tool permissions — "can read logs from us-east-1 production," "can restart the auth service," "cannot drop tables" — enforced at the governance runtime level, not at the infrastructure level.
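Capability-scoped permissions can be modeled as (verb, resource-pattern) grants attached to an agent identity rather than to a human role. The agent name, verbs, resource paths, and the rule that explicit denies override allows are all assumptions for this sketch. Note that `fnmatch`'s `*` also matches `/`, so these patterns are broader than shell globs.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Capability:
    verb: str      # what the agent may do, e.g. "read_logs"
    resource: str  # fnmatch pattern over resource paths; "*" also matches "/"


# Hypothetical grants for one agent identity.
GRANTS = {
    "incident-bot": {
        "allow": [
            Capability("read_logs", "prod/us-east-1/*"),
            Capability("restart", "prod/*/auth-service"),
        ],
        "deny": [Capability("drop_table", "*")],
    },
}


def permitted(agent: str, verb: str, resource: str) -> bool:
    """Explicit deny wins; otherwise the action needs a matching allow grant."""
    grants = GRANTS.get(agent)
    if grants is None:
        return False  # unknown identity: nothing is permitted

    def matches(cap: Capability) -> bool:
        return cap.verb == verb and fnmatch(resource, cap.resource)

    if any(matches(cap) for cap in grants["deny"]):
        return False
    return any(matches(cap) for cap in grants["allow"])
```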
DashClaw's approach uses JWKS-verified OIDC bearer tokens (EdDSA/RSA/ECDSA) to cryptographically bind agent identity to every action. Replay protection rejects reused tokens. The Microsoft toolkit uses SPIFFE identities and a zero-trust mesh. Both models treat every action as authenticated and authorized, regardless of which "human role" the agent is acting on behalf of.
This is the direction. The opposite — giving agents a shared service account with broad IAM permissions — is a breach waiting to happen.
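Replay protection of the kind described for DashClaw usually keys on the standard JWT `jti` (token ID) and `exp` (expiry) claims defined in RFC 7519. The sketch below is not DashClaw's code: it assumes signature verification against the JWKS has already been done by a JWT library and covers only the replay check. An in-memory dict would also need to become shared storage once more than one runtime instance verifies tokens.

```python
import time
from typing import Dict, Optional


class ReplayGuard:
    """Accept each token ID (jti) once; remember it until the token's exp passes."""

    def __init__(self) -> None:
        self._seen: Dict[str, float] = {}  # jti -> exp, in Unix seconds

    def accept(self, claims: dict, now: Optional[float] = None) -> bool:
        """`claims` must come from a token whose signature was already verified."""
        now = time.time() if now is None else now
        # Forget IDs whose tokens have expired: the exp check below rejects them anyway.
        self._seen = {jti: exp for jti, exp in self._seen.items() if exp > now}
        jti, exp = claims.get("jti"), claims.get("exp")
        if not jti or exp is None or exp <= now:
            return False  # no token ID, no expiry, or already expired
        if jti in self._seen:
            return False  # replay: this exact token was presented before
        self._seen[jti] = float(exp)
        return True
```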
What Changes When Agents Are Team Members
| Traditional IT (ITIL) | Agent-Operating Model (Governance Runtime) |
|---|---|
| Change approval via CAB | Change approval via policy engine (auto or human-in-the-loop) |
| Incident response in minutes to hours | Incident response in seconds |
| Human roles (sysadmin, DBA) | Agent capabilities (can-read-logs, can-restart) |
| Access control via IAM groups | Access control via cryptographically verified agent identity |
| Audit logs written after the fact | Replayable decision records written before execution |
| Runbooks written for humans | Policies written for deterministic enforcement |
| MTTR as key metric | Blocked actions as key metric |
| Service desk as triage point | Governance runtime as control plane |
The Hard Truth: Your ITSM Toolchain Will Not Save You
ServiceNow, Jira Service Management, PagerDuty — these tools are built for human workflows. They expect tickets, approvals, and manual handoffs. They cannot evaluate a policy in under a millisecond. They cannot intercept an agent's tool call before it executes. They cannot cryptographically bind identity to action.
This is not a criticism of those platforms. It is a statement about physics. The latency of human-paced IT governance is built into their architecture.
The governance runtime category exists because the existing toolchain operates at the wrong time granularity. When your agent can execute 50+ actions per second, and each action needs policy evaluation, you cannot route through a ticket system. You need middleware that decides at machine speed and escalates to humans only when risk exceeds a threshold.
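The arithmetic is unforgiving: at 50 actions per second, each policy decision has a budget of 1/50 s = 20 ms, and that budget is shared with the action itself. An in-process evaluation fits comfortably; a ticket round-trip does not. The sketch below measures the mean cost of a hypothetical set-lookup policy with `time.perf_counter`. Absolute numbers depend on hardware, and a real policy engine such as OPA or Cedar will typically cost more than a set lookup.

```python
import time

# Hypothetical allow-list of (agent, tool) pairs held in memory.
ALLOWED = {("incident-bot", "logs.read"), ("incident-bot", "service.restart")}

BUDGET_SECONDS = 1 / 50  # 20 ms per action at 50 actions per second


def is_allowed(agent: str, tool: str) -> bool:
    return (agent, tool) in ALLOWED


def mean_decision_seconds(n: int = 10_000) -> float:
    """Average wall-clock cost of one in-process policy decision."""
    start = time.perf_counter()
    for i in range(n):
        # Alternate allowed and denied tools so both paths are exercised.
        is_allowed("incident-bot", "logs.read" if i % 2 else "db.drop_table")
    return (time.perf_counter() - start) / n
```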
The Bottom Line
If your engineering org is deploying AI agents with infrastructure access and your IT operating model still revolves around CAB meetings, ITSM tickets, and human-oriented runbooks, you have a gap. Not a small gap. A gap that will be exposed the first time an agent makes a bad decision at machine speed and there is no governance layer between it and production.
The fix is not to slow agents down. The fix is to build a governance runtime that operates at their speed.
DashClaw, OpenMAO, and the Microsoft Agent Governance Toolkit are the early signals of this category. None is complete. All are evolving fast. But the direction is clear: ITIL's era of human-paced governance is ending. The runtime era is beginning.
The orgs that understand this now — and start putting governance middleware between their agents and their infrastructure — will be the ones whose agents stay deployed. The rest will learn the hard way what happens when you give a stochastic system root access and ask it to be careful.
Further Reading
- DashClaw GitHub Repository — The governance runtime for AI agents. Intercepts actions, enforces policy, requires approvals, produces replayable audit trails. MIT license, self-hosted. The canonical reference for the governance runtime category.
- OpenMAO GitHub Repository — Open-source organization-of-record for AI agents. Builds a flywheel of governance, institutional memory, self-correction, and earned autonomy. Apache 2.0 license.
- Microsoft Agent Governance Toolkit — Policy enforcement, zero-trust identity, execution sandboxing, and SRE for autonomous AI agents. Covers 10/10 OWASP Agentic Top 10 risks. Ships in Python, TypeScript, .NET, Rust, Go.
- OWASP LLM01:2025 — Prompt Injection — The official OWASP classification of prompt injection vulnerabilities, including the explicit admission that "it is unclear if there are fool-proof methods of prevention." Foundational reading for anyone building agent governance.
- OWASP LLM06:2025 — Excessive Agency — The OWASP risk classification specifically about agents granted too much autonomy. Directly addresses the problem that governance runtimes solve.
- What is a Non-Human Identity? (CyberArk) — Definitive overview of NHIs: what they are, why they outnumber human identities, and the security challenges of managing machine identities at scale.
- Microsoft AI Red Teaming Agent (Azure Foundry) — Microsoft's documentation for the AI Red Teaming Agent, which formalizes Attack Success Rate (ASR) as the canonical metric for evaluating adversarial attack success against AI systems.
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks (arXiv:2404.02151) — The ICLR 2025 paper by Andriushchenko et al. demonstrating 100% attack success rates against GPT-4o, GPT-3.5, Claude models, and Llama-3-Instruct-8B using adaptive jailbreaking methods.


