Plug it into your pipeline the way you would add a human in the loop to vet output before it ships, except this reviewer is deterministic: it returns one verdict and repeats it on every run.
A model optimizes for the next plausible token, not the verifiable one. So it pads, drifts, agrees too easily, contradicts its own source, and miscounts, without noticing. It cannot catch this by checking itself, because self-review runs the same engine with the same blind spots. The check has to come from outside the loop.
And human review cannot keep up. AI output ships at a volume no team can read, and the new slop hides inside it: adequate and invisible, clean enough that neither a busy reviewer nor the model that wrote it catches the tells. You cannot hire your way out of that volume. The check that keeps up runs on every output, deterministically, and never tires.
doloop is that check. It runs a purpose-built test for each kind of output: prose, tables, charts, conversations. It returns the specific problems, where each one is and the evidence for it, the same findings every run. The verdict replays byte for byte, so you can audit the checker rather than trust it.
AI writes code that looks right and ships broken. The code donkey is a deterministic check that reads your whole codebase, works out the conventions it already holds, and blocks a commit only where a change breaks one, with the line and the rule. The same code in, the same verdict out, every run.
That determinism is the point for a regulated team: a verdict you can re-run and audit comes from a deterministic, rule-based process, which falls outside the revised "model" definition in SR 26-2. It puts a check in front of your agent that a model-risk reviewer can actually sign off on.
An AI cannot reliably check its own work, because the check shares the blind spots that produced the work. The judge has to sit outside the model: something that reads every answer and refuses to pass it until it holds up.
Your AI lives inside your agent; we never touch the raw model. Your agent hands its work to our machine, the donkeys plus the memory, and we run the output through the loop until it passes.
Have your agent talk to our donkeys.
Your AI sits behind your own agent; your agent calls our donkeys, and we work your AI's output over until it passes. The difference: ours is deterministic, and it returns the same verdict however your AI phrases the output.
doloop checks the AI's output with fixed rules. The mechanical lenses give one result for a given input and reproduce it exactly when you re-run. Where a test must read for meaning, a cheap AI check on a pinned prompt gives an advisory reading instead. Either way it points at the exact line that is wrong, and it never writes your output. We call these checks donkeys: small, single-purpose tests that run against your AI.
When the check fails, doloop hands the exact problem back to your AI and asks for a fix, then checks again, and again, until the answer passes. You get a result that has been worked over, not a first draft.
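The check-and-fix loop described above can be sketched in a few lines. This is a minimal illustration, not doloop's implementation: `check` and `revise` are hypothetical stand-ins for the checker and your model, the `max_rounds` cap is an assumption, and the verdict shape borrows the `verdict` and `findings` fields from the check endpoint's example response. The finding fields (`line`, `problem`) are made up for the toy data.

```python
from typing import Callable

Verdict = dict  # e.g. {"verdict": "pass" | "fail", "findings": [...]}


def run_loop(
    draft: str,
    check: Callable[[str], Verdict],
    revise: Callable[[str, list], str],
    max_rounds: int = 5,
) -> tuple[str, Verdict, int]:
    """Check a draft, hand findings back for a fix, and re-check until it passes."""
    verdict = check(draft)
    rounds = 1
    while verdict["verdict"] != "pass" and rounds < max_rounds:
        draft = revise(draft, verdict["findings"])
        verdict = check(draft)
        rounds += 1
    return draft, verdict, rounds


# Toy stand-ins so the sketch runs on its own (both hypothetical).
def toy_check(text: str) -> Verdict:
    findings = [
        {"line": i + 1, "problem": "hedge: 'arguably'"}
        for i, line in enumerate(text.splitlines())
        if "arguably" in line
    ]
    return {"verdict": "fail" if findings else "pass", "findings": findings}


def toy_revise(text: str, findings: list) -> str:
    # A real model would rewrite the flagged lines; here we just drop the hedge word.
    return text.replace("arguably ", "")


final_text, final_verdict, rounds_used = run_loop(
    "This is arguably the best result.", toy_check, toy_revise
)
```

The cap on rounds is a design choice for the sketch: without it, a draft the reviser cannot fix would loop forever, so the function returns the last failing verdict instead.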
Your AI does the smart part. We add the one thing it cannot do for itself: stand outside its own work and tell it when it is wrong. You bring the model; we bring the adversary.
Where your work recurs, doloop reuses what it learned. The Extraction machine does this today: a vendor's table layout becomes a saved template, so the next document is faster, cheaper, and more certain. That memory is yours, owned and exportable, and it is coming to the other machines.
Each one is a single call to POST /v1/check, and the verdict is the same every time.
A model in a loop can't see its own loop. doloop catches the repetition from outside.
It regresses to a bug it fixed a moment ago, or wanders off task. doloop flags the drift.
Hedges, jargon, dead cadence, self-management tics. Each tell flagged with a line.
A figure that isn't in the source, a total that won't reconcile. The documents donkey catches it.
Same input, same hash, same verdict, every run. An audit artifact, not an opinion.
Gate on the verdict and route only the failures to a person, not every line.
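Routing on the verdict might look like the sketch below. It assumes each result carries the `verdict` and `findings` fields that the check endpoint's example response shows; the `route` function, the file names, and the finding fields (`line`, `problem`) are hypothetical.

```python
def route(results: dict[str, dict]) -> tuple[list[str], dict[str, list]]:
    """Split checked outputs into ones that ship and ones a person should read."""
    ship, review = [], {}
    for name, verdict in results.items():
        if verdict["verdict"] == "pass":
            ship.append(name)
        else:
            # Only the findings go to the reviewer, not the whole document.
            review[name] = verdict["findings"]
    return ship, review


results = {
    "summary.md": {"verdict": "pass", "findings": []},
    "report.md": {
        "verdict": "fail",
        "findings": [{"line": 12, "problem": "total does not reconcile"}],
    },
}
ship, review = route(results)
```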
The machine is doloop's loop and memory wrapped around the checks. Send it any answer your AI produced: the donkeys run against the text, and a verdict comes back, identical on every call. We store nothing and never touch your model.
# send any AI output to the machine, get a deterministic verdict back
curl https://api.doloop.io/v1/check \
-H 'content-type: application/json' \
-d '{"text": "the answer your model just produced"}'
# -> {"verdict": "pass", "findings": [...], "input_sha256": "..."}
# same text in, same verdict out, every time. live now.
Or from your terminal, as a gate in any pipeline:
# install once
pip install doloopio
# exit 0 = pass, 2 = fail, so you can gate a publish on a clean verdict
doloop check -f draft.md && publish draft.md
doloop design https://your-site.com  # deterministic design review of a live page
doloop loops  # loops remaining on your key
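The same gate can be driven from a script. The sketch below relies only on what is stated above: `doloop check -f <file>` exits 0 on pass and 2 on fail. Treating any other exit code as an error rather than a failed check is an assumption, and `decide` and `run_check` are hypothetical helper names.

```python
import subprocess


def decide(exit_code: int) -> str:
    """Map a `doloop check` exit code to a pipeline action."""
    if exit_code == 0:
        return "publish"
    if exit_code == 2:
        return "hold"
    # Assumption: any other code means the check itself did not complete.
    return "error"


def run_check(path: str) -> str:
    """Run the CLI on one file and return the action to take."""
    completed = subprocess.run(["doloop", "check", "-f", path])
    return decide(completed.returncode)


# Usage, with the CLI installed and a key configured:
#   action = run_check("draft.md")
```

Keeping the exit-code mapping separate from the subprocess call lets the pipeline logic be tested without the CLI present.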
Send your AI's output to the machine, or point your client at it as a drop-in. The donkeys run, the loop runs, and clean output comes back. No rip-and-replace, no new model to learn.
You don't hope your AI decides to check itself; it won't. doloop is wired into the loop as a gate the output has to pass, run against the model, not pulled in by it.
Keep your own account with Anthropic, OpenAI, or Google, and switch models whenever you like. doloop never swaps your model. The house rules you build up stay yours, and leave with you if you go.
The check endpoint is live at api.doloop.io/v1/check. Run the call above right now. Want the full loop with your own tenant memory and an audit record? Talk to us.
“You can surely work with your own LLM, but you are captive to its whims and costs. Here is a way to get better product by design. Just connect to our MCPs or call our APIs, and we will build you an adversarial intelligence machine that helps your output get better.”
doloop is one machine: the loop and the memory. Point it at a kind of output and it runs the donkeys for that job. Same machine, same loop, different donkeys. Switch on the modes you use.
Reconcile every figure against its source, and catch the numbers a model invents when it gets bored of reading. Its lead check, WYSIWYD, is live today: deterministic PDF table extraction, 100% reproducible across 90 extractions, 0 errors on 2,332 cells.
Open WYSIWYD →
Strip the slop. The Writing machine cuts hedging, filler, and the tells that mark text as machine-written, and holds your house style as a rule the next draft has to pass. Checks: Deworm and Pebble.
Try the writing donkey →
Stop the flattery and the repetition. The Conversations machine flags a model agreeing with everything, repeating itself, or crossing a safety line. Checks: Phaedrus plus the conversation and safety diagnostics.
Try the conversations donkey →
Land the finding. The Presentations machine reviews a chart for substance first, then style: does it carry a real finding, and does it land it without chartjunk. Lead donkey: Inkwell, vision-based and live; a slide-level check is next.
Try the presentations donkey →
Machine-readable catalog: /api/v1/machines/ · /api/v1/tools/ · openapi.json
Two documented trends point the same way. Models are getting more capable, and capability comes with more shortcutting, not less. Tokens are getting cheaper, so teams generate more output, which means more places for a shortcut to hide. The need for an external, deterministic check widens on both axes at once.
Per the research, larger models are more likely to exploit shortcuts during inference. A better model will not fix this, because it cannot grade its own work. Frontier progress expands the need for a deterministic adversary rather than shrinking it.
The verdict is byte-identical for byte-identical input where determinism can be had: the mechanical lenses. Where a check has to read for meaning, the linguistic layer is a pinned, advisory reader, and the slide check reads the image with a vision model, so those are bounded rather than byte-deterministic. The tokenizer, the baseline rules, the loop spec: open and versioned, so anyone can run them and check the result. The only private things are your memory and your data.
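One way to audit that claim for the mechanical lenses is to keep the raw verdict bytes from a first run and compare them with a replay. The sketch below is an illustration under assumptions: it treats `input_sha256` as the SHA-256 of the UTF-8 bytes of the submitted text, which this page does not specify, and `audit_record` and `replays_identically` are hypothetical helpers. The sample verdict bytes are made up.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def audit_record(text: str, verdict_bytes: bytes) -> dict:
    """Pin one run: the input hash plus a hash of the raw verdict bytes."""
    return {
        "input_sha256": sha256_hex(text.encode("utf-8")),
        "verdict_sha256": sha256_hex(verdict_bytes),
    }


def replays_identically(record: dict, text: str, verdict_bytes: bytes) -> bool:
    """True only if the same input produced byte-identical verdict bytes."""
    return record == audit_record(text, verdict_bytes)


text = "the answer your model just produced"
first_run = b'{"verdict": "pass", "findings": []}'
record = audit_record(text, first_run)

# A replay with identical bytes matches; any change in the verdict does not.
same = replays_identically(record, text, b'{"verdict": "pass", "findings": []}')
drifted = replays_identically(record, text, b'{"verdict": "fail", "findings": []}')
```

Comparing raw bytes rather than parsed JSON is deliberate here: a byte-for-byte claim is stricter than "the same fields", so the audit should not normalize the response before hashing it.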
Because the verdict is deterministic, so is the price: the amount is a function of the work, not a black box you have to trust. For a regulated buyer, that makes the invoice itself something you can audit.
On April 17, 2026, the Federal Reserve, OCC, and FDIC issued SR 26-2, replacing SR 11-7 after 15 years. The new framework explicitly excludes deterministic rule-based processes and software from the definition of a "model," and so from the full model-validation burden.
For CFOs and model-risk teams: a deterministic rule-based check, with no statistical model underneath, is not a "model" by this definition, so it does not carry the same model-validation expectations. SR 26-2 is non-binding guidance, not a safe harbor, and the model you bring stays in scope, so your counsel makes the final call. That is why the adversary has to be deterministic, not another model.
The full framing, with primary sources →
Honest and predictable, and entirely separate from what you spend on AI. We charge for the verification, nothing else.
You pay your model provider directly; we never sit in your token margin. As AI prices fall, your bill with us doesn't move.
You are billed for a clean, verified result, not for how many rounds it took to get there. Fewer rounds are cheaper for you and for us, so our incentives line up.
The amount follows deterministically from how much you ran, computed by an open, versioned method you can run yourself. Reproduce your own invoice. No surprises, no trust required.
Provisioning, free trial, and tenant onboarding are rolling out now. Talk to us to stand up a loop.