Three patterns that make AI agents production-safe
Content firewalls, evaluation harnesses, and the kill-switch — the discipline behind 99.99% AI uptime.
Production AI is different from notebook AI
A model that scores 92% on a benchmark notebook can fail spectacularly on a Friday afternoon when a customer phrases a query in a way the test set never considered. The discipline that bridges the gap is operational, not architectural.
Pattern 1 — Content firewalls
Every input passes through an input filter and every output through an output filter, each with an explicit allow-list scoped to the tenant context. Prompt injection is not a vibes problem; it is a parsing problem with parsing solutions.
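A minimal sketch of the idea, with assumed names throughout: `TENANT_ALLOWLIST`, `filter_input`, and `filter_output` are illustrative, the injection patterns are deliberately simplistic, and a real firewall would use a much richer parser and policy store.

```python
import re

# Hypothetical per-tenant allow-list: topics an agent's output may reference.
TENANT_ALLOWLIST = {
    "acme": {"billing", "orders", "shipping"},
}

# Deliberately simplistic injection patterns; a real filter would go further.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"reveal .*system prompt", re.I),
]

def filter_input(tenant: str, text: str) -> str:
    """Reject inputs that match known injection patterns (fail closed)."""
    for pat in INJECTION_PATTERNS:
        if pat.search(text):
            raise ValueError("blocked: suspected prompt injection")
    return text

def filter_output(tenant: str, text: str, referenced_topics: set) -> str:
    """Reject outputs that reference topics outside the tenant's allow-list."""
    allowed = TENANT_ALLOWLIST.get(tenant, set())
    if not referenced_topics <= allowed:
        raise ValueError("blocked: topic outside tenant allow-list")
    return text
```

The key design choice is the allow-list: the filters enumerate what is permitted for this tenant rather than trying to enumerate everything that is forbidden.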
Pattern 2 — Evaluation harnesses
Run a representative golden set against every model deployment, with both quantitative and qualitative scoring. Fail closed on any regression beyond two-sigma drift. Make this CI for prompts, the way it already is for code.
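The two-sigma gate can be sketched in a few lines. The function name `gate` and the scores are illustrative; in a real harness the baseline would come from historical golden-set runs and the candidate score from the deployment under test.

```python
from statistics import mean, stdev

def gate(baseline_scores, candidate_score):
    """Fail closed: abort the deploy if the candidate score falls more
    than two standard deviations below the baseline mean."""
    mu = mean(baseline_scores)
    sigma = stdev(baseline_scores)
    threshold = mu - 2 * sigma
    if candidate_score < threshold:
        raise SystemExit(
            f"FAIL: score {candidate_score:.3f} below gate {threshold:.3f}"
        )
    return True
```

Raising `SystemExit` on regression is what makes this behave like CI: the pipeline stops, and a human has to look before the prompt change ships.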
Pattern 3 — The kill-switch
Each agent has a kill-switch that takes it out of production traffic in under five seconds. The switch is exercised in monthly tabletop drills. We have used it twice in production; both times we caught the failure before customers did.
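One way to sketch such a switch, under stated assumptions: the `KillSwitch` class and `guard` wrapper are hypothetical names, and in production the flag would live in a shared store (a feature-flag service or similar) so a single flip removes every replica from traffic within the five-second budget. Here a thread-safe in-process flag keeps the example self-contained.

```python
import threading

class KillSwitch:
    """Hypothetical kill-switch checked on every request in the serving path."""

    def __init__(self):
        self._tripped = threading.Event()

    def trip(self):
        # Flip the flag; every guarded handler starts refusing immediately.
        self._tripped.set()

    def reset(self):
        self._tripped.clear()

    def guard(self, handler):
        """Wrap a request handler so it fails over when the switch is tripped."""
        def wrapped(request):
            if self._tripped.is_set():
                # Route to the non-AI fallback path, never to the model.
                return {"status": 503, "body": "agent disabled, fallback active"}
            return handler(request)
        return wrapped
```

Checking the flag in the request path, rather than restarting processes, is what makes sub-five-second removal and monthly drills cheap to run.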
The compounding effect
On its own, each of these patterns is obvious. The compounding effect of running all three from day one is what gets you to 99.99% AI uptime in regulated environments, and what regulators will accept as evidence of a responsible AI operating model.