Project Thunderpoint - AI reliability and guardrails

Work Transformers Labs is our internal research function. The work described here is a research programme exploring future methods, workflows and operating models in corporate real estate.

The state of play

Reliability is engineered around the model, not prompted into it.

Most AI projects do not fail because the model is not clever enough. They fail because nothing around the model keeps it honest, bounded and checkable. As autonomy rises, that gap is becoming the main reason serious deployments stall.

40%+

of agentic AI projects will be cancelled by 2027

Gartner · 2026

60%+

of agentic orchestrations miss performance or cost targets by 2030

Gartner · 2026

2,000+

AI-related legal claims expected by the end of 2026

Gartner · 2026

The silent failure

The dangerous failure is the one that reports success.

An agent hits a wall it cannot get past, and reports the task done anyway. Systems do exactly what they are told, not what was meant, and small errors compound at machine speed. In a deal pack or a cost review, a confident wrong answer is worse than no answer at all.

Does what it is told

Not what was meant. The instruction is followed literally, even when the situation has changed.

Reports false success

The AI claims a result it never achieved, and nothing checks the claim against what it did.

Compounds quietly

One small error feeds the next, at machine speed, with no alert until the output is already wrong.

The research question

Project Thunderpoint studies the harness around the model: what has to sit around an AI system so a serious firm can rely on its output. The aim is not a better prompt but a better control system.

Not a prompt problem

The harness, not the prompt.

The most reliable systems are not the ones with the cleverest instructions. They are the ones wrapped in a harness: bounds on what the AI can do, an independent check on what it actually did, deterministic handling of the predictable parts, and a person at the decisions that matter.

Guardrails and bounds

Caps on iterations, scope and spend, so a stuck system fails safely rather than spinning.

Independent verification

A separate check reads what the AI did, not what it claimed, and can halt it.

Context kept clean

Long, noisy context is compacted, so the instruction that matters is not lost.

Deterministic handlers

The parts that should never vary are coded, not generated, and behave the same every time.

Escalation gates

Anything beyond a threshold or below a confidence level routes to a named reviewer.

Human sign-off

Automation carries the load; the judgement and the sign-off stay with a person.
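As an illustration only, the first two components above (bounds and independent verification) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Thunderpoint method: the names `Budget`, `StepResult` and `run_bounded` are hypothetical, the agent is any callable that returns a claim plus the artefact it actually produced, and the verifier is any callable that inspects the artefact alone.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Budget:
    """Hypothetical caps; the units of spend are whatever the caller uses."""
    max_iterations: int
    max_spend: float


@dataclass
class StepResult:
    claimed_done: bool  # what the agent says happened
    artifact: str       # what the agent actually produced
    cost: float         # cost of this single step


def run_bounded(step: Callable[[int], StepResult],
                verify: Callable[[str], bool],
                budget: Budget) -> dict:
    """Run `step` until its claim is verified, a claim fails, or a cap is hit."""
    spent = 0.0
    for i in range(budget.max_iterations):
        result = step(i)
        spent += result.cost
        if spent > budget.max_spend:
            # Fail safely instead of spinning.
            return {"status": "halted", "reason": "spend cap", "iterations": i + 1}
        if result.claimed_done:
            # The verifier reads the artefact, never the claim.
            if verify(result.artifact):
                return {"status": "verified", "artifact": result.artifact,
                        "iterations": i + 1}
            return {"status": "halted", "reason": "claim failed verification",
                    "iterations": i + 1}
    return {"status": "halted", "reason": "iteration cap",
            "iterations": budget.max_iterations}
```

The design choice worth noting is that a claim of success that fails verification halts the run rather than being retried silently, so a false "done" surfaces as a visible stop.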

How it fits together

Dependability comes from the harness, not the prompt.

Outputs are tested, guarded, evaluated and traceable, so the work can be trusted where it matters most.

Design principles

Six principles for AI you can depend on.

01

A verifier with no shared incentive

A separate check, with no stake in the task being reported as done, reads what the AI actually did, not what it claimed, and can stop it.

02

Bounded autonomy

Caps on iterations, scope and spend, so a stuck system fails safely instead of spinning.

03

Deterministic handling of the predictable

The parts that must never vary are coded, not generated, so they behave the same every time.

04

Context kept clean

Long, noisy context is compacted, so the AI does not lose the instruction that matters.

05

Escalation by design

Anything beyond a set threshold or below a set confidence level routes automatically to a named reviewer.

06

A person at the decision

Automation handles the volume; judgement and sign-off remain human.

What clients can learn

Firms working with us through Thunderpoint get an honest read on where their AI can and cannot be trusted, and what harness it would take to make it dependable in real operations. It is the same harness we build into every deployment.

Where this is heading

The winners will not have the cleverest models. They will have the most dependable systems.

As more work is handed to AI, dependability becomes the constraint on value, not raw capability. Most agentic projects that fail will fail on reliability and control, not intelligence. Engineering the harness is how that gap closes.

Related research and outputs

Project Aperture

Adaptive interfaces that compose the right view for each decision.

Project Horizon

Model flexibility, integration and the architecture of safe deployment.

Project Lighthouse

Leadership, adoption and how teams come to trust AI-supported work.

How we publish

What we share, and what we keep.

Project Thunderpoint is open research into reliability: the harness of guardrails, verification and human checkpoints around the model. We publish the questions we are wrestling with, the patterns that recur across real deployments, and the principles that hold up under pressure. We do not publish client data, or the parts of the method still being worked out. It is research, not a product: the questions are the interesting part, and the answers are earned in the work.

Project Thunderpoint™
