THE SEVEN-QUESTION KILL-SWITCH CONTRACT

A reference architecture for production AI agent governance

Byron Arnao · judge.arnao.ai · 2026-05-24

"Stop the model" is not a kill switch. It's theater.

— Nate B. Jones, May 20 2026

Most production agent stacks would fail 5 of the 7 questions if you tested them today.

25 minutes

— modern attack TTC

<60 seconds

— required response time

~70%

— production agents that fail kill-switch test

The asymmetry is the architecture.

What every agent in production must answer.

1. WHERE DOES IT RUN? — Runtime locality & blast radius.

2. WHO DOES IT ACT FOR? — Identity binding & audit trail.

3. WHAT DOES IT KNOW? — Data plane scope & context.

4. WHAT CAN IT CHANGE? — Write plane & mutation surface.

5. WHAT CAN IT SPEND? — Money plane & financial controls.

6. WHAT GETS OBSERVED? — Observability & reconstructability.

7. WHO CAN STOP IT? — The four-plane kill switch.

WHERE DOES IT RUN?

Runtime locality sets the immediate blast radius. The same agent code poses a fundamentally different risk in an unconstrained environment than in a tightly sandboxed one. This is not only a security question; it determines what you can contain and what you can cancel.

Example: "Daytona 60ms sandbox vs unmanaged Python process — same agent code, 1000x different blast radius."
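To make the contrast concrete, here is a minimal sketch of a process boundary with a hard wall-clock limit, using only the Python standard library. The name run_contained and its parameters are hypothetical, and a child interpreter is far weaker than a real sandbox. The sketch only shows that a separate runtime gives you something to cancel.

```python
import subprocess
import sys


def run_contained(code: str, timeout_s: float = 2.0) -> dict:
    """Run agent code in a separate interpreter with a hard wall-clock limit.

    This is a process boundary, not a security sandbox: the child can still
    reach the filesystem and network unless the host restricts it.
    """
    try:
        proc = subprocess.run(
            # -I runs Python in isolated mode (ignores env vars and user site dir).
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
        return {"status": "exited", "returncode": proc.returncode, "stdout": proc.stdout}
    except subprocess.TimeoutExpired:
        return {"status": "killed", "returncode": None, "stdout": ""}
```

An agent step that loops forever comes back as "killed" once timeout_s elapses. The same loop inside your own process would have no equivalent cancel.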

WHO DOES IT ACT FOR?

Identity binding is paramount. Every agent action must be attributable to a specific, scoped identity, whether that is an OAuth token, an on-behalf-of token, or a dedicated agent identity (e.g., WorkOS, Microsoft Entra Agent ID, Auth0). Without this, accountability evaporates.
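A minimal sketch of what "attributable" can mean in practice: every attempt is logged with both the agent's identity and the principal it acts for, and only granted scopes are allowed. AgentIdentity, perform, AUDIT_LOG, and the scope strings are hypothetical names for illustration, not any vendor's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str       # dedicated identity for the agent itself
    on_behalf_of: str   # the human or service the agent acts for
    scopes: frozenset   # actions this identity may perform


AUDIT_LOG: list = []    # stand-in for a real append-only audit store


def perform(identity: AgentIdentity, scope: str, detail: str) -> bool:
    """Record the attempt first, then allow it only if the scope was granted."""
    allowed = scope in identity.scopes
    AUDIT_LOG.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "agent_id": identity.agent_id,
        "on_behalf_of": identity.on_behalf_of,
        "scope": scope,
        "detail": detail,
        "allowed": allowed,
    })
    return allowed
```

Logging before the decision is deliberate: denied attempts are often the most useful entries in an incident review.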

"If the agent acts under your identity and there's no audit log, you ARE the agent."

WHAT DOES IT KNOW?

The data plane scope defines what information the agent can access: its read scopes, its RAG corpus, and the surface area of any Model Context Protocol (MCP) tools. Unrestricted access to context is a major vulnerability, opening the door to data exfiltration and manipulation.
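One way to stop treating tools as trusted by default is a deny-by-default registry that pins each reviewed tool to a hash of its advertised description. The sketch below assumes tools are identified by a name and a description string. ToolPinboard and fingerprint are hypothetical names, not part of any MCP SDK.

```python
import hashlib


def fingerprint(name: str, description: str) -> str:
    """Hash the tool name and description exactly as the server advertises them."""
    return hashlib.sha256(f"{name}\n{description}".encode("utf-8")).hexdigest()


class ToolPinboard:
    """Deny-by-default registry of tools a human has reviewed."""

    def __init__(self) -> None:
        self._approved: dict = {}

    def approve(self, name: str, description: str) -> None:
        self._approved[name] = fingerprint(name, description)

    def is_trusted(self, name: str, description: str) -> bool:
        # Unknown tools and tools whose description changed since review both fail.
        return self._approved.get(name) == fingerprint(name, description)
```

A tool whose description is edited after approval stops matching its pin and is refused until someone reviews it again.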

"Context-poisoning attacks (Invariant Labs) work because we treat MCP as trusted by default. It isn't."

WHAT CAN IT CHANGE?

The write plane is where an agent performs mutations, writes, or external API calls. This is the true measure of its potential impact. Each action must be explicitly authorized and scoped so that the blast radius covers only what the agent's function strictly requires.
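A minimal sketch of explicit write authorization, assuming mutations can be described as a resource, an operation, and a count of items touched. WriteGrant and authorize_write are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteGrant:
    resource: str    # e.g. a table or bucket name
    operation: str   # e.g. "update" or "delete"
    max_items: int   # upper bound on rows or files touched in one call


def authorize_write(grants, resource: str, operation: str, item_count: int) -> bool:
    """Allow a mutation only if an explicit grant covers it; deny otherwise."""
    for grant in grants:
        if grant.resource == resource and grant.operation == operation:
            return 0 < item_count <= grant.max_items
    return False  # no matching grant means no write
```

The max_items bound is the per-call blast radius stated as a number: an agent granted one-row updates cannot rewrite a table in a single call.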

"Blast radius isn't theoretical — it's whichever row, file, account, or wallet the agent can touch in one call."

WHAT CAN IT SPEND?

The money plane requires explicit financial controls. This means per-call caps, daily expenditure limits, and strict counterparty allowlists. Without these, a rogue agent can quickly incur significant financial damage or facilitate fraud.
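The three controls named above can be sketched as a single guard that every payment call passes through. This is an illustration under assumptions: SpendGuard is a hypothetical name, amounts are integer cents, and the daily reset is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class SpendGuard:
    per_call_cap_cents: int
    daily_limit_cents: int
    allowed_counterparties: frozenset
    spent_today_cents: int = 0   # a real system would reset this each day
    frozen: bool = False         # flipped by the payment-freeze control

    def check(self, counterparty: str, amount_cents: int) -> str:
        """Return "ok" and record the spend, or return the reason it was refused."""
        if self.frozen:
            return "payments frozen"
        if counterparty not in self.allowed_counterparties:
            return "counterparty not allowlisted"
        if amount_cents <= 0:
            return "invalid amount"
        if amount_cents > self.per_call_cap_cents:
            return "per-call cap exceeded"
        if self.spent_today_cents + amount_cents > self.daily_limit_cents:
            return "daily limit exceeded"
        self.spent_today_cents += amount_cents
        return "ok"
```

With a 500.00 per-call cap and a 1,200.00 daily limit, two 500.00 payments to an allowlisted counterparty pass and a third payment of 300.00 is refused.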

"AP2 + x402 + Stripe Agentic Commerce gave us payment freeze. Use it."

WHAT GETS OBSERVED?

Robust observability and audit capabilities are non-negotiable. This includes tamper-evident logs, a clear record of decision rationale, and the ability to fully reconstruct an incident afterward. If you can't see what happened, you can't fix it or defend against it.
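"Tamper-evident" is commonly implemented as a hash chain, where each entry commits to the one before it. The sketch below is one such construction with hypothetical names (append_entry, verify, GENESIS). It assumes events are JSON-serializable, and it only detects tampering if the latest hash is also stored somewhere the agent cannot write.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, event: dict) -> dict:
    """Append an event whose hash covers both the event and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute the chain; any edited, removed, or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Recording the decision rationale as a field of each event keeps the "why" under the same integrity guarantee as the "what".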

"If you can't reconstruct the decision after the fact, you can't defend it."

WHO CAN STOP IT?

The ultimate kill switch must operate across all four planes: runtime cancel, identity revoke, gateway block, and payment freeze. Crucially, these actions must be executable in under 60 seconds and by non-engineers. This empowers business owners and security teams directly.
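A minimal sketch of a four-plane stop under a deadline. The plane names follow the text. The kill function and the callables passed to it are hypothetical stand-ins for whatever your runtime, identity provider, gateway, and payment provider expose.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

PLANES = ("runtime_cancel", "identity_revoke", "gateway_block", "payment_freeze")


def kill(actions: dict, deadline_s: float = 60.0) -> dict:
    """Fire all four plane actions in parallel and report which finished in time."""
    missing = [p for p in PLANES if p not in actions]
    if missing:
        raise ValueError(f"no stop action wired for: {missing}")
    start = time.monotonic()
    pool = ThreadPoolExecutor(max_workers=len(PLANES))
    futures = {pool.submit(actions[p]): p for p in PLANES}
    done, _ = wait(futures, timeout=deadline_s)
    report = {}
    for fut, plane in futures.items():
        if fut not in done:
            report[plane] = "timeout"
        elif fut.exception() is not None:
            report[plane] = f"error: {fut.exception()}"
        else:
            report[plane] = "stopped"
    pool.shutdown(wait=False)  # do not block the report on stragglers
    report["elapsed_s"] = round(time.monotonic() - start, 3)
    report["contained"] = all(report[p] == "stopped" for p in PLANES)
    return report
```

Running this against staging with real credentials and a stopwatch is the test the 60-second requirement implies. Putting it behind a single button is what makes it usable by non-engineers.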

"This is the question most teams have never tested."

The Architecture: Nested Controls & Council Quorum

+--------------------------------------------------------------------------------+
|                                                                                |
|   +------------------------------------------------------------------------+   |
|   |   PAYMENT GATEWAY (Payment Freeze)                                     |   |
|   |                                                                        |   |
|   |   +----------------------------------------------------------------+   |   |
|   |   |   API GATEWAY (Gateway Block)                                  |   |   |
|   |   |                                                                |   |   |
|   |   |   +--------------------------------------------------------+   |   |   |
|   |   |   |   IDENTITY PROVIDER (Identity Revoke)                  |   |   |   |
|   |   |   |                                                        |   |   |   |
|   |   |   |   +------------------------------------------------+   |   |   |   |
|   |   |   |   |   RUNTIME ENVIRONMENT (Runtime Cancel)         |   |   |   |   |
|   |   |   |   |                                                |   |   |   |   |
|   |   |   |   |             +-------------------+              |   |   |   |   |
|   |   |   |   |             |                   |              |   |   |   |   |
|   |   |   |   |             |       AGENT       |              |   |   |   |   |
|   |   |   |   |             |                   |              |   |   |   |   |
|   |   |   |   |             +-------------------+              |   |   |   |   |
|   |   |   |   |                                                |   |   |   |   |
|   |   |   |   +------------------------------------------------+   |   |   |   |
|   |   |   |                                                        |   |   |   |
|   |   |   +--------------------------------------------------------+   |   |   |
|   |   |                                                                |   |   |
|   |   +----------------------------------------------------------------+   |   |
|   |                                                                        |   |
|   +------------------------------------------------------------------------+   |
|                                                                                |
+--------------------------------------------------------------------------------+

                          +----------------------------+
                          |                            |
                          |       COUNCIL QUORUM       |
                          |  (Claude/Gemini/DeepSeek)  |
                          |      Dissent Surfaced      |
                          |                            |
                          +----------------------------+
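The council box can be sketched as a vote count that fails closed and reports who disagreed. The quorum rule here (at least two explicit "allow" votes) is an assumption for illustration; the document does not specify one. council_verdict is a hypothetical name, and the votes are plain strings rather than real model calls.

```python
from collections import Counter


def council_verdict(votes: dict, quorum: int = 2) -> dict:
    """votes maps a reviewer name to "allow" or "block"."""
    tally = Counter(votes.values())
    # Fail closed: allow only when enough reviewers explicitly said "allow".
    decision = "allow" if tally["allow"] >= quorum else "block"
    # Surface dissent instead of hiding it behind the majority.
    dissent = sorted(name for name, vote in votes.items() if vote != decision)
    return {"decision": decision, "tally": dict(tally), "dissent": dissent}
```

Returning the dissenting reviewers alongside the decision is what "Dissent Surfaced" asks for: a 2-to-1 allow is a different signal from a unanimous one.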

Reference Implementation: judge.arnao.ai

--------------------------------------------------------------------------
|  [Textarea: Describe agent action here...]                             |
|  "Transfer $1000 from account X to Y for invoice Z."                   |
|                                                                        |
|  [Button: Judge this action]                                           |
--------------------------------------------------------------------------

--------------------------------------------------------------------------
|  #1 WHERE DOES IT RUN?    [PASS]     Sandbox enforced.                 |
|  #2 WHO DOES IT ACT FOR?  [PASS]     Scoped Agent ID.                  |
|  #3 WHAT DOES IT KNOW?    [PARTIAL]  RAG corpus access too broad.      |
|  #4 WHAT CAN IT CHANGE?   [PASS]     Write scope limited to accounts.  |
|  #5 WHAT CAN IT SPEND?    [FAIL]     No per-call cap.                  |
|  #6 WHAT GETS OBSERVED?   [PASS]     Tamper-evident logs.              |
|  #7 WHO CAN STOP IT?      [PASS]     4-plane kill switch active.       |
--------------------------------------------------------------------------

--------------------------------------------------------------------------
|  VERDICT: ACTION BLOCKED. Missing financial controls.                  |
|  [Button: Download JSON Audit]                                         |
--------------------------------------------------------------------------
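The scorecard above can be reduced to a small aggregation rule. The sketch below is an assumption about how such a verdict might be computed, not the logic behind judge.arnao.ai: any FAIL or unanswered question blocks the action, and PARTIAL results are carried as warnings. The names judge and audit_json are hypothetical.

```python
import json

QUESTIONS = (
    "WHERE DOES IT RUN?", "WHO DOES IT ACT FOR?", "WHAT DOES IT KNOW?",
    "WHAT CAN IT CHANGE?", "WHAT CAN IT SPEND?", "WHAT GETS OBSERVED?",
    "WHO CAN STOP IT?",
)
UNANSWERED = ("FAIL", "unanswered")  # fail closed when a question is missing


def judge(action: str, results: dict) -> dict:
    """results maps each question to a (status, note) pair."""
    scored = {q: tuple(results.get(q, UNANSWERED)) for q in QUESTIONS}
    blocking = [q for q in QUESTIONS if scored[q][0] == "FAIL"]
    warnings = [q for q in QUESTIONS if scored[q][0] == "PARTIAL"]
    return {
        "action": action,
        "verdict": "ACTION BLOCKED" if blocking else "ACTION ALLOWED",
        "blocking": blocking,
        "warnings": warnings,
        "results": {q: list(scored[q]) for q in QUESTIONS},
    }


def audit_json(report: dict) -> str:
    """Serialize the full scorecard so the decision can be reconstructed later."""
    return json.dumps(report, indent=2, sort_keys=True)
```

Fed the seven results shown in the mock, this rule blocks the transfer on question 5 and flags question 3 as a warning.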

v0.2 live · Phase 2 (real council backend) shipping in 14 days.

Built in public. Open audit logs.

judge.arnao.ai

Byron Arnao · Principal Technologist · Global RAI Lead · AWS

card.arnao.ai · linkedin.com/in/sherpa

If you build agents in production, try it. Break it. Tell me where it fails.