DETERMINISTIC CHECKS FOR AI OUTPUT · BRING YOUR OWN MODEL

A resistance machine for your AI.

Plug it into your pipeline the way you would add a human in the loop to vet output before it ships, except this reviewer is deterministic: it returns one verdict and repeats it on every run.

A model optimizes for the next plausible token, not the verifiable one. So it pads, drifts, agrees too easily, contradicts its own source, and miscounts, without noticing. It cannot catch this by checking itself, because self-review runs the same engine with the same blind spots. The check has to come from outside the loop.

And human review cannot keep up. AI output ships at a volume no team can read, and the new slop hides inside it: adequate and invisible, clean enough that neither a busy reviewer nor the model that wrote it catches the tells. You cannot hire your way out of that volume. The check that keeps up runs on every output, deterministically, and never tires.

doloop is that check. It runs a purpose-built test for each kind of output: prose, tables, charts, conversations. For each problem it returns the location and the evidence, and the findings are the same on every run. The verdict replays byte for byte, so you can audit the checker rather than trust it.

Connect doloop → Defensive engineering →

Start with code, where the stakes are highest

AI writes code that looks right and ships broken. The code donkey is a deterministic check that reads your whole codebase, works out the conventions it already holds, and blocks a commit only where a change breaks one, with the line and the rule. The same code in, the same verdict out, every run.

That determinism is the point for a regulated team: a check whose verdict you can re-run and audit is a deterministic, rule-based process, which sits outside the revised "model" definition in SR 26-2. It puts a check in front of your agent that a model-risk reviewer can actually sign off on.

See the code donkey live → Model risk · SR 26-2 →

How it works

An AI cannot reliably check its own work, because the reviewer shares the author's blind spots. The judge has to sit outside the model: something that reads every answer and refuses to pass it until it holds up.

you bring · your agent: your AI lives here
    draft →
    ← “redo line 3”
we bring · the doloop machine
    the donkeys are our checks: same input, same answer, every time
    verdict ↓    ↑ learned rules
    your memory: your house rules are the conventions it applies every call
→ pass → ship

Your AI lives inside your agent; we never touch the raw model. Your agent hands its work to our machine, the donkeys plus the memory, and we run the output through the loop until it passes.

Have your agent talk to our donkeys.

Your AI sits behind your own agent. Your agent calls our donkeys, and we send the output back for another pass until it holds up. Unlike your AI, our side is deterministic: the same output gets the same verdict on every run.

THE CHECK

One verdict per input

doloop checks the AI's output with fixed rules. The mechanical lenses give one result for a given input and reproduce it exactly when you re-run. Where a test must read for meaning, a cheap AI check on a pinned prompt gives an advisory reading instead. Either way it points at the exact line that is wrong, and it never writes your output. We call these checks donkeys: small, single-purpose tests that run against your AI.
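To make "fixed rules, one result per input, pointing at the exact line" concrete, here is a toy sketch in Python. The two rules and the `Finding` shape are hypothetical and stand in for a real donkey; they are not doloop's rule set.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    line: int      # 1-based line number in the checked text
    rule: str      # name of the rule that fired
    evidence: str  # the exact text that matched


# Hypothetical rules for illustration only; not doloop's actual rules.
RULES = (
    ("hedge", re.compile(r"\b(?:arguably|perhaps|somewhat)\b", re.IGNORECASE)),
    ("filler", re.compile(r"\bit is worth noting that\b", re.IGNORECASE)),
)


def check(text: str) -> list[Finding]:
    """Apply every rule to every line in a fixed order.

    No randomness and no external state, so the findings depend only on the input.
    """
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES:
            for match in pattern.finditer(line):
                findings.append(Finding(number, name, match.group(0)))
    return findings


draft = "The result is clear.\nPerhaps it is worth noting that revenue rose."
for finding in check(draft):
    print(finding)
```

Because the rules and their order are fixed, calling `check` twice on the same text returns equal lists, which is the property the verdict relies on.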

THE LOOP

It makes the AI fix it

When the check fails, doloop hands the exact problem back to your AI and asks for a fix, then checks again, and again, until the answer passes. You get a result that has been worked over, not a first draft.
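The loop can be sketched in a few lines of Python. This is a minimal illustration, not doloop's loop spec: `check` and `revise` are placeholders for the donkeys and for your own model call, and the `max_rounds` cap is an assumption added so the sketch always terminates.

```python
from typing import Callable


def run_loop(
    draft: str,
    check: Callable[[str], list[str]],
    revise: Callable[[str, list[str]], str],
    max_rounds: int = 5,  # assumed safety cap, not a documented doloop setting
) -> tuple[str, int]:
    """Check the text, hand the findings back for a fix, and check again.

    Returns (final_text, checks_run). Raises RuntimeError if the text
    still fails after max_rounds checks.
    """
    text = draft
    findings = check(text)
    rounds = 1
    while findings:
        if rounds == max_rounds:
            raise RuntimeError(f"still failing after {max_rounds} rounds")
        text = revise(text, findings)  # your AI does the fixing
        findings = check(text)         # the outside check decides
        rounds += 1
    return text, rounds


# Toy stand-ins so the sketch runs without a model or a network call.
def toy_check(text: str) -> list[str]:
    return ["contains TODO"] if "TODO" in text else []


def toy_revise(text: str, findings: list[str]) -> str:
    return text.replace("TODO", "done", 1)


print(run_loop("TODO and TODO", toy_check, toy_revise))
```

The check runs outside `revise`, so the model never grades its own fix.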

YOUR AI

You bring the intelligence

Your AI does the smart part. We add the one thing it cannot do for itself: stand outside its own work and tell it when it is wrong. You bring the model; we bring the adversary.

And it compounds

Where your work recurs, doloop reuses what it learned. The Extraction machine does this today: a vendor's table layout becomes a saved template, so the next document is faster, cheaper, and more certain. That memory is yours, owned and exportable, and it is coming to the other machines.

Use doloop when…

Each one is a single call to POST /v1/check, and the verdict is the same every time.

Your agent loops

A model in a loop can't see its own loop. doloop catches the repetition from outside.

Output drifts or slides back

The model regresses to a bug it fixed a moment ago, or wanders off task. doloop flags the drift.

The model pads with slop

Hedges, jargon, dead cadence, self-management tics. Each tell flagged with a line.

A number won't tie out

A figure that isn't in the source, a total that won't reconcile. The documents donkey catches it.

You need it reproducible

Same input, same hash, same verdict, every run. An audit artifact, not an opinion.

You want human triage

Gate on the verdict and route only the failures to a person, not every line.
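Gating for human triage might look like the sketch below. It assumes each record has the `verdict` field shown in the example response; the `"fail"` value and the `id` key are illustrative assumptions, so the code treats anything other than an explicit `"pass"` as needing a person.

```python
def triage(results: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split verdict records into (ship, review).

    Only an explicit "pass" ships; a missing or unknown verdict goes to review.
    """
    ship = [r for r in results if r.get("verdict") == "pass"]
    review = [r for r in results if r.get("verdict") != "pass"]
    return ship, review


batch = [
    {"id": "a", "verdict": "pass", "findings": []},
    {"id": "b", "verdict": "fail", "findings": ["line 3: figure not in source"]},
    {"id": "c"},  # no verdict recorded, so it is not trusted
]
ship, review = triage(batch)
print([r["id"] for r in ship], [r["id"] for r in review])
```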

Connect the doloop machine to your pipeline

The machine is doloop's loop and memory wrapped around the checks. Send it any answer your AI produced: the donkeys run against the text and a verdict comes back, identical on every call. We store nothing and never touch your model.

  # send any AI output to the machine, get a deterministic verdict back
  curl https://api.doloop.io/v1/check \
    -H 'content-type: application/json' \
    -d '{"text": "the answer your model just produced"}'

  # -> {"verdict": "pass", "findings": [...], "input_sha256": "..."}
  # same text in, same verdict out, every time. live now.
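The same call from Python, as a sketch using only the standard library. The URL, header, and `text` field come from the curl example above; the function names and the timeout are illustrative, and the response is assumed to be the JSON object shown in the comment.

```python
import json
import urllib.request

CHECK_URL = "https://api.doloop.io/v1/check"  # endpoint from the curl example


def build_check_request(text: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        CHECK_URL,
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def check(text: str, timeout: float = 10.0) -> dict:
    """Send the text and return the parsed verdict (makes a network call)."""
    with urllib.request.urlopen(build_check_request(text), timeout=timeout) as response:
        return json.load(response)


request = build_check_request("the answer your model just produced")
print(request.get_method(), request.full_url)
```

Building the request is kept separate from sending it, so the payload can be inspected or logged before anything leaves your pipeline.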

Or from your terminal, as a gate in any pipeline:

  # install once
  pip install doloopio

  # exit 0 = pass, 2 = fail, so you can gate a publish on a clean verdict
  doloop check -f draft.md && publish draft.md
  doloop design https://your-site.com    # deterministic design review of a live page
  doloop loops                           # loops remaining on your key

ONE LINE

Wire it in, not around

Send your AI's output to the machine, or point your client at it as a drop-in. The donkeys run, the loop runs, and clean output comes back. No rip-and-replace, no new model to learn.

INTERPOSED, NOT INVOKED

A gate, not a tool

You don't hope your AI decides to check itself; it won't. doloop is wired into the loop as a gate the output has to pass, run against the model, not pulled in by it.
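One way to wire that gate into a Python pipeline is to run the CLI and branch on its exit code. The sketch relies on the documented codes for `doloop check` (0 = pass, 2 = fail); treating every other code as an error that also blocks is an assumption.

```python
import subprocess

PASS, FAIL = 0, 2  # exit codes documented for `doloop check`


def classify(exit_code: int) -> str:
    """Map an exit code to an outcome; unknown codes are assumed to be errors."""
    if exit_code == PASS:
        return "pass"
    if exit_code == FAIL:
        return "fail"
    return "error"


def gate(path: str) -> str:
    """Run the check on a file and return its outcome (requires the CLI installed)."""
    completed = subprocess.run(["doloop", "check", "-f", path])
    return classify(completed.returncode)


def may_publish(outcome: str) -> bool:
    """Only a clean pass gets through; fail and error both block."""
    return outcome == "pass"


print(classify(0), classify(2), classify(127))
```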

NEVER LOCKED IN

Your account, your memory

Keep your own account with Anthropic, OpenAI, or Google, and switch models whenever you like. doloop never swaps your model. The house rules you build up stay yours, and leave with you if you go.

The check endpoint is live at api.doloop.io/v1/check. Run the call above right now. Want the full loop with your own tenant memory and an audit record? Talk to us.

“You can surely work with your own LLM, but you are captive to its whims and costs. Here is a way to get better product by design. Just connect to our MCPs or call our APIs, and we will build you an adversarial intelligence machine that helps your output get better.”

One machine, every kind of output

doloop is one machine: the loop and the memory. Point it at a kind of output and it runs the donkeys for that job. Same machine, same loop, different donkeys. Switch on the modes you use.

• LIVE · EXTRACTION

Extract this for me

Reconcile every figure against its source, and catch the numbers a model invents when it gets bored of reading. Its lead check, WYSIWYD, is live today: deterministic PDF table extraction, 100% reproducible across 90 extractions, 0 errors on 2,332 cells.
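A toy version of "reconcile every figure against its source", to show the shape of the check. This is not WYSIWYD: it only tests whether each number in the output also appears somewhere in the source text, and the regex and function names are illustrative assumptions.

```python
import re

# A digit, then digits or thousands separators, then an optional decimal part.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def numbers_in(text: str) -> list[str]:
    """Return the numeric tokens in order, with commas stripped."""
    return [match.group(0).replace(",", "") for match in NUMBER.finditer(text)]


def unsupported_figures(output: str, source: str) -> list[str]:
    """Return figures in the output that never appear in the source."""
    known = set(numbers_in(source))
    return [number for number in numbers_in(output) if number not in known]


source = "Revenue was 1,200 in Q1 and 1,350 in Q2."
output = "Revenue was 1,200 in Q1 and 1,530 in Q2."
print(unsupported_figures(output, source))
```

The transposed figure is flagged because no matching number exists in the source; a real check would also have to handle units, rounding, and derived totals.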

Open WYSIWYD →

• LIVE · WRITING

Write me this

Strip the slop. The Writing machine cuts hedging, filler, and the tells that mark text as machine written, and holds your house style as a rule the next draft has to pass. Checks: Deworm and Pebble.

Try the writing donkey →

• LIVE · CONVERSATIONS

Chat with this user

Stop the flattery and the repetition. The Conversations machine flags a model agreeing with everything, repeating itself, or crossing a safety line. Checks: Phaedrus plus the conversation and safety diagnostics.

Try the conversations donkey →

• LIVE · PRESENTATIONS

Show me this

Land the finding. The Presentations machine reviews a chart for substance first, then style: does it carry a real finding, and does it land it without chartjunk? Lead donkey: Inkwell, vision-based and live; a slide-level check is next.

Try the presentations donkey →

Machine-readable catalog: /api/v1/machines/ · /api/v1/tools/ · openapi.json

Why an adversary, why now

Two documented trends point the same way. Models are getting more capable, and capability comes with more shortcutting, not less. Tokens are getting cheaper, so teams generate more output, which means more places for a shortcut to hide. The need for an external, deterministic check widens on both axes at once.

THE WEDGE

Bigger models shortcut more

Per the research, larger models are more likely to exploit shortcuts during inference. A better model will not fix this, because it cannot grade its own work. Frontier progress expands the need for a deterministic adversary rather than shrinking it.

THE MOAT

Determinism where it can be had

Where determinism can be had, in the mechanical lenses, the verdict is byte-identical for byte-identical input. Where a check has to read for meaning, the linguistic layer is a pinned, advisory reader, and the slide check reads the image with a vision model, so those are bounded rather than byte-deterministic. The tokenizer, the baseline rules, and the loop spec are open and versioned, so anyone can run them and check the result. The only private things are your memory and your data.
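A replay audit for the deterministic lenses could be as small as the sketch below. Two things are assumptions: that `input_sha256` is the SHA-256 of the UTF-8 bytes of the submitted text, and that comparing a canonical JSON serialization is an acceptable stand-in for comparing the raw response bytes.

```python
import hashlib
import json


def input_sha256(text: str) -> str:
    # ASSUMPTION: the hash is taken over the UTF-8 bytes of the text.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def canonical(verdict: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so equal verdicts give equal bytes."""
    return json.dumps(verdict, sort_keys=True, separators=(",", ":")).encode("utf-8")


def replays_identically(recorded: dict, rerun: dict) -> bool:
    """True when a re-run reproduces the recorded verdict exactly."""
    return canonical(recorded) == canonical(rerun)


recorded = {"verdict": "pass", "findings": [], "input_sha256": input_sha256("hello")}
rerun = {"input_sha256": input_sha256("hello"), "findings": [], "verdict": "pass"}
print(replays_identically(recorded, rerun))
```

For the advisory, model-backed checks this comparison would be too strict, since those are bounded rather than byte-deterministic.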

THE BILL

The bill is an audit artifact

Because the verdict is deterministic, so is the price: the amount is a function of the work, not a black box you have to trust. For a regulated buyer, that makes the invoice itself something you can audit.

Outside the model definition

On April 17, 2026, the Federal Reserve, OCC, and FDIC issued SR 26-2, replacing SR 11-7 after 15 years. The new framework explicitly excludes deterministic rule-based processes and software from the definition of a "model," and so from the full model-validation burden.

For CFOs and model-risk teams: a deterministic rule-based check, with no statistical model underneath, is not a "model" by this definition, so it does not carry the same model-validation expectations. SR 26-2 is non-binding guidance, not a safe harbor, and the model you bring stays in scope, so your counsel makes the final call. That is why the adversary has to be deterministic, not another model.

The full framing, with primary sources →

How you pay

Honest and predictable, and entirely separate from what you spend on AI. We charge for the verification, nothing else.

BRING YOUR OWN LLM

We never touch your AI bill

You pay your model provider directly; we never sit in your token margin. As AI prices fall, your bill with us doesn't move.

PAY FOR VERIFIED OUTPUT

Not for the retries

You are billed for a clean, verified result, not for how many rounds it took to get there. Fewer rounds are cheaper for you and for us, so our incentives line up.

A BILL YOU CAN CHECK

Self-verifiable by design

The amount follows deterministically from how much you ran, computed by an open, versioned method you can run yourself. Reproduce your own invoice. No surprises, no trust required.
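Reproducing an invoice could look something like this. The actual pricing method is not given here, so everything in the sketch is hypothetical: the event shape, the `unit_price` argument, and the rule of one unit per passing result, which is taken from "not for the retries" above.

```python
from decimal import Decimal


def recompute_invoice(events: list[dict], unit_price: Decimal) -> Decimal:
    """Bill one unit per verified (passing) result; retry rounds are ignored.

    HYPOTHETICAL method and event shape, for illustration only.
    """
    verified = sum(1 for event in events if event["verdict"] == "pass")
    return unit_price * verified


usage_log = [
    {"verdict": "pass", "rounds": 3},  # three rounds, billed once
    {"verdict": "pass", "rounds": 1},
    {"verdict": "fail", "rounds": 5},  # never verified, not billed
]
print(recompute_invoice(usage_log, Decimal("0.10")))
```

`Decimal` keeps the arithmetic exact, so recomputing from the same log gives the same amount to the cent.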

Provisioning, free trial, and tenant onboarding are rolling out now. Talk to us to stand up a loop.