Byron Arnao · judge.arnao.ai · 2026-05-24
"Stop the model" is not a kill switch. It's theater.
— Nate B. Jones, May 20 2026
Most production agent stacks would fail 5 of the 7 questions if you tested them today.
The asymmetry is the architecture.
Runtime locality dictates the immediate blast radius. An agent running in an unconstrained environment, even with the same code, poses a fundamentally different risk than one in a tightly sandboxed execution environment. This isn't just about security; it's about containment and control.
Example: "Daytona 60ms sandbox vs unmanaged Python process — same agent code, 1000x different blast radius."
Identity binding is paramount. Every agent action must be attributable to a specific, scoped identity, whether it's an OAuth token, an on-behalf-of token, or a dedicated agent identity (e.g., WorkOS Entra Agent ID, Auth0 Agent). Without this, accountability evaporates.
"If the agent acts under your identity and there's no audit log, you ARE the agent."
The data plane scope defines what information the agent can access. This includes read scopes, RAG corpus, and the surface area of any Multi-Capability Platform (MCP) tools. Unrestricted access to context is a major vulnerability point, leading to data exfiltration or manipulation.
"Context-poisoning attacks (Invariant Labs) work because we treat MCP as trusted by default. It isn't."
The write plane is where an agent performs mutations, writes, or external API calls. This is the true measure of its potential impact. Each action must be explicitly authorized and scoped, minimizing the blast radius to only what is absolutely necessary for its function.
"Blast radius isn't theoretical — it's whichever row, file, account, or wallet the agent can touch in one call."
The money plane requires explicit financial controls. This means per-call caps, daily expenditure limits, and strict counterparty allowlists. Without these, a rogue agent can quickly incur significant financial damage or facilitate fraud.
"AP2 + x402 + Stripe Agentic Commerce gave us payment freeze. Use it."
Robust observability and audit capabilities are non-negotiable. This includes tamper-evident logs, clear recording of decision rationale, and the ability to fully reconstruct an incident post-facto. If you can't see what happened, you can't fix it or defend against it.
"If you can't reconstruct the decision after the fact, you can't defend it."
The ultimate kill switch must operate across all four planes: runtime cancel, identity revoke, gateway block, and payment freeze. Crucially, these actions must be executable in under 60 seconds and by non-engineers. This empowers business owners and security teams directly.
"This is the question most teams have never tested."
+--------------------------------------------------------------------------------+
| |
| +------------------------------------------------------------------------+ |
| | PAYMENT GATEWAY (Payment Freeze) | |
| | | |
| | +------------------------------------------------------------------+ |
| | | API GATEWAY (Gateway Block) | |
| | | | |
| | | +------------------------------------------------------------+ |
| | | | IDENTITY PROVIDER (Identity Revoke) | |
| | | | | |
| | | | +------------------------------------------------------+ |
| | | | | RUNTIME ENVIRONMENT (Runtime Cancel) | |
| | | | | | |
| | | | | +-------------------+ | |
| | | | | | | | |
| | | | | | AGENT | | |
| | | | | | | | |
| | | | | +-------------------+ | |
| | | | | | |
| | | | +------------------------------------------------------+ |
| | | | | |
| | | +------------------------------------------------------------+ |
| | | | |
| | +------------------------------------------------------------------+ |
| | | |
| +------------------------------------------------------------------------+ |
| |
+--------------------------------------------------------------------------------+
+---------------------+
| |
| COUNCIL QUORUM |
| (Claude/Gemini/DeepSeek)
| Dissent Surfaced |
| |
+---------------------+
------------------------------------------------------------------- | [Textarea: Describe agent action here...] | | "Transfer $1000 from account X to Y for invoice Z." | | | | [Button: Judge this action] | ------------------------------------------------------------------- ------------------------------------------------------------------- | #1 WHERE DOES IT RUN? [PASS] Sandbox enforced. | | #2 WHO DOES IT ACT FOR? [PASS] Scoped Agent ID. | | #3 WHAT DOES IT KNOW? [PARTIAL] RAG corpus access too broad. | | #4 WHAT CAN IT CHANGE? [PASS] Write scope limited to accounts. | | #5 WHAT CAN IT SPEND? [FAIL] No per-call cap. | | #6 WHAT GETS OBSERVED? [PASS] Tamper-evident logs. | | #7 WHO CAN STOP IT? [PASS] 4-plane kill switch active. | ------------------------------------------------------------------- ------------------------------------------------------------------- | VERDICT: ACTION BLOCKED. Missing financial controls. | | [Button: Download JSON Audit] | ------------------------------------------------------------------- Stat box: v0.2 live · Phase 2 (real council backend) shipping in 14 days.
Byron Arnao · Principal Technologist · Global RAI Lead · AWS
If you build agents in production, try it. Break it. Tell me where it fails.