Operations

Every automation should fail closed.

By Dave Wilson · 31 March 2026 · 6 min read

A friend who runs a small accounting practice rang me last quarter, mid-panic.

The automation she had bought from another vendor had spent the previous three weeks happily emailing the wrong VAT invoice to the wrong client. It did not error. It did not alert. It did not break. It just quietly applied the wrong template to the wrong customer, in correct British English, formatted properly, sent reliably, every time.

By the time it was caught, eleven clients had received eleven incorrect invoices. The practice had to send eleven apologies, eleven corrections, and explain to two of them in person why a four-figure VAT amount had been off by a factor of ten.

This is the worst kind of automation failure. Not the loud one. The quiet one. The one that confidently does the wrong thing for three weeks before anyone notices.

The principle that prevents this is called failing closed. Every automation we build at CueBot follows it. Here is what it means and why every system you commission, from us or anyone else, should be held to it.

What failing closed means

When a system encounters a condition it cannot handle confidently, it has two choices.

Fail open means: do something anyway. Take the best guess. Make the call. Send the message. Close the ticket. Hope it is right.

Fail closed means: stop. Escalate to a human. Hand off cleanly. Refuse to act unless you are sure.

Most automations, by default, fail open. They are configured to "be helpful". They prefer false-positive action to no action. This is fine for low-stakes work like sorting your inbox. It is catastrophic for anything that touches money, customers, or compliance.

CueBot's default is the opposite. Every workflow we ship has explicit failure modes that route to a human, with the relevant context, the moment confidence drops below a threshold. The threshold is set per workflow:

  • Customer-facing replies: only auto-send if intent classification is above 95% confidence. Otherwise the draft is created and routed to the inbox for a human to send.
  • Invoices and payment links: never auto-send. Always require human confirmation, even if every field is correct. The audit trail matters more than the time saved.
  • Order updates and shipping comms: auto-send only on platform-confirmed events (Shopify "fulfilled", Stripe "paid"). Never on inferred events.
  • Refunds and credits: never automated. Ever. Drafted, yes. Sent, no.

This sounds restrictive. It is. The restrictiveness is the point.
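
To make the per-workflow rules above concrete, here is a minimal sketch of how such a policy table could look in Python. The workflow names, the Policy fields, and the decide function are all hypothetical, chosen for illustration; this is not CueBot's actual configuration format. The one property worth noticing is that anything unknown or unconfirmed falls through to a human draft.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    AUTO_SEND = "auto_send"
    DRAFT_FOR_HUMAN = "draft_for_human"


@dataclass(frozen=True)
class Policy:
    auto_send_allowed: bool
    # Confidence must be strictly above this to auto-send; None means no confidence check.
    min_confidence: Optional[float] = None
    # If True, only platform-confirmed events may trigger an auto-send.
    require_confirmed_event: bool = False


# Hypothetical workflow names; the values mirror the list above.
POLICIES = {
    "customer_reply": Policy(auto_send_allowed=True, min_confidence=0.95),
    "invoice": Policy(auto_send_allowed=False),
    "order_update": Policy(auto_send_allowed=True, require_confirmed_event=True),
    "refund": Policy(auto_send_allowed=False),
}


def decide(workflow: str, confidence: float = 0.0, confirmed_event: bool = False) -> Action:
    policy = POLICIES.get(workflow)
    # Unknown workflow or a never-auto-send workflow: fail closed.
    if policy is None or not policy.auto_send_allowed:
        return Action.DRAFT_FOR_HUMAN
    # Inferred (unconfirmed) events never trigger a send.
    if policy.require_confirmed_event and not confirmed_event:
        return Action.DRAFT_FOR_HUMAN
    if policy.min_confidence is not None and not confidence > policy.min_confidence:
        return Action.DRAFT_FOR_HUMAN
    return Action.AUTO_SEND
```

With this table, decide("customer_reply", confidence=0.97) auto-sends, while decide("invoice", confidence=1.0) still produces a draft for a person to send.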

What this costs you

A system that fails closed handles a smaller percentage of cases end-to-end than a system that fails open. Concretely: an open system might resolve 85% of customer queries without human input. A closed system might resolve 70%.

That gap is the cost of safety. It is real. We do not pretend otherwise.

What you get in exchange:

  • No silent regressions. When the system stops working correctly, it stops doing anything, loudly. Slack notification, email alert, queue building up in the inbox. You know within an hour, not three weeks.
  • No expensive false positives. The eleven wrong invoices could not have happened. The system would have stopped at invoice one and asked for human review.
  • No erosion of customer trust. Customers can forgive an apology email about delayed support. They cannot forgive being addressed as someone else's account, or being charged the wrong VAT rate, or receiving a refund acknowledgement for a refund they did not request.
  • No compliance liabilities. UK GDPR, Making Tax Digital (MTD) for VAT, FCA rules, sector-specific obligations - all of these have hard edges. A system that fails closed is built to stop before it crosses them. A system that fails open might cross several.

That 15-point gap is the right price to pay.

How we build it in

There are four mechanics that make failing closed work in practice. Every CueBot build implements at least three of them.

Confidence thresholds, not boolean rules. When the AI classifies a customer query, it does not return "category X" - it returns "category X with 0.91 confidence". Below the threshold, the system stops and asks. The threshold is exposed and tunable per workflow. We document it.
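
As an illustration of the idea, a confidence gate can be very small. The sketch below assumes a classifier that returns a label and a score between 0 and 1; the Classification type, the gate function, and the "ESCALATE" marker are hypothetical names, not part of any particular library. It also treats a malformed score as "not sure", which is the fail-closed choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    label: str
    confidence: float  # expected range: 0.0 to 1.0


def gate(result: Classification, threshold: float) -> str:
    """Return the label to act on, or "ESCALATE" when the system is not sure enough."""
    # Out-of-range or NaN scores fail this check and are escalated.
    if not (0.0 <= result.confidence <= 1.0):
        return "ESCALATE"
    # Act only when confidence is strictly above the threshold.
    if result.confidence > threshold:
        return result.label
    return "ESCALATE"
```

With a threshold of 0.95, the "category X with 0.91 confidence" example stops and asks; lower the threshold to 0.90 for a less sensitive workflow and the same result goes through.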

Human-in-the-loop checkpoints on monetary actions. Anything that issues, refunds, credits, or alters financial state has a mandatory pause. The system can draft. It cannot send. The pause adds maybe 90 seconds to the workflow. It removes an entire class of expensive accident.
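
One way to enforce "can draft, cannot send" is to make sending impossible without a named reviewer on the record. The following is a sketch under that assumption; MonetaryDraft, ApprovalRequired, and the field names are hypothetical, and the real delivery step is left as a comment.

```python
from dataclasses import dataclass
from typing import Optional


class ApprovalRequired(Exception):
    """Raised when a monetary action is attempted without human sign-off."""


@dataclass
class MonetaryDraft:
    kind: str          # e.g. "invoice", "refund", "credit"
    customer_id: str
    amount_pence: int
    approved_by: Optional[str] = None
    sent: bool = False

    def approve(self, reviewer: str) -> None:
        # An empty name is not a sign-off; keep the audit trail meaningful.
        if not reviewer.strip():
            raise ValueError("reviewer must be a named person")
        self.approved_by = reviewer

    def send(self) -> None:
        # The mandatory pause: no approval, no send.
        if self.approved_by is None:
            raise ApprovalRequired(
                f"{self.kind} for {self.customer_id} needs human review"
            )
        # Real delivery (email, payment provider call) would go here.
        self.sent = True
```

Because the check lives inside send, no code path can skip it by accident, and approved_by doubles as the audit record.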

Loud failure paths, not silent ones. When the workflow refuses to act, it does not just sit there. It posts to a Slack channel, files a ticket in your support tool, or emails a designated escalation address. The whole point is to make refusal visible, so it gets resolved within minutes rather than missed for weeks.
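
A loud failure path can be sketched as a fan-out to several channels, where the escalation itself also fails loudly if nobody could be reached. In the sketch below, each notifier is a plain callable that takes a message; how it talks to Slack, a ticketing tool, or email is deliberately left out, and the names escalate and EscalationFailed are hypothetical.

```python
from typing import Callable, Iterable, List

Notifier = Callable[[str], None]


class EscalationFailed(Exception):
    """Raised when no channel accepted the alert."""


def escalate(message: str, notifiers: Iterable[Notifier]) -> int:
    """Send the message to every channel; return how many accepted it."""
    delivered = 0
    errors: List[Exception] = []
    for notify in notifiers:
        try:
            notify(message)
            delivered += 1
        except Exception as exc:
            # One broken channel must not silence the others.
            errors.append(exc)
    if delivered == 0:
        # Nobody was told, so refuse to pretend the alert went out.
        raise EscalationFailed(
            f"alert not delivered: {message!r} ({len(errors)} channel errors)"
        )
    return delivered
```

The design choice here is that an empty or fully broken channel list is an error rather than a no-op, so a misconfigured alert route shows up immediately instead of three weeks later.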

Idempotency keys on everything. If the same event fires twice, the system handles it once. This sounds technical and it is, but the consequence is that retries, replays, and webhook duplicates never cause double-actions. No customer ever gets two cart-recovery emails for the same abandonment, and no invoice ever sends twice.
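
In code, the core of this is a lookup before acting. The sketch below keeps seen keys in memory purely for illustration; a production system would need a durable store with an atomic "insert if absent" so that the check survives restarts and concurrent workers. IdempotentHandler and the "idempotency_key" field are hypothetical names.

```python
from typing import Callable, Set


class IdempotentHandler:
    def __init__(self, action: Callable[[dict], None]) -> None:
        self._action = action
        self._seen: Set[str] = set()  # in-memory only; illustrative

    def handle(self, event: dict) -> bool:
        """Run the action once per idempotency key; return True if it ran."""
        key = event.get("idempotency_key")
        if not key:
            # No key means duplicates cannot be detected, so refuse to act.
            raise ValueError("event has no idempotency key; refusing to act")
        if key in self._seen:
            return False  # duplicate delivery: do nothing
        self._action(event)
        # Recorded only after the action succeeds, so a failed attempt can be retried.
        self._seen.add(key)
        return True
```

For example, if a webhook for the same abandoned cart arrives twice with the key "cart-42:abandoned", the first call sends the recovery email and the second returns False without sending anything.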

The question to ask any vendor

Before you sign with anyone selling automation, ask this:

"When the system is unsure, what does it do?"

If the answer is "it does its best" or "it uses a fallback model" or anything that sounds like the automation will keep going regardless, run.

If the answer is "it stops, alerts a human, and resumes when a person has reviewed the case", you are talking to operators who understand that the cost of one quiet error in production is bigger than the time saved by a thousand routine successes.

We are not the only people who think this way. We are not even the strictest. But the principle is real, and worth more than any specific technology choice. Pick a vendor who lives by it.

The eleven-invoice friend, by the way, switched vendors. Her new system fails closed. She has not had an apology email to send in six months.

Want this kind of system in your business?

Fifteen-minute discovery call. No pitch deck. Honest answer about whether this fits.