The doloop surface for pulling data out of documents. It runs a donkey (a deterministic check that returns the same result for the same input) that extracts the tables from a PDF and ties every cell back to the spot on the page it came from. Nothing is guessed, and you can prove the source of each number.
When an AI reads a document and gets bored, it fills the gaps. The donkey does not.
Ask a model to read a statement and it will confidently return figures that do not appear in the document. It fabricates to finish rather than admit it could not find them.
A total lands in the wrong row, a column shifts, a footnote merges into a value. The number is real but attached to the wrong thing, which is just as wrong.
Even a correct extraction is useless to an audit team if you cannot point at where it came from. No provenance means no sign-off.
WYSIWYD. Deterministic: the same PDF gives the same numbers, every time. Live today.
Why: a model will invent a number to finish the job; a script cannot.
How: it detects the table structure on the page and extracts each cell mechanically, then ties every value back to its place on the page. 100% reproducible across 90 extractions, 0 errors on 2,332 cells. Try it on your own PDF.
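To make "ties every value back to its place on the page" concrete, here is a minimal sketch of what a sourced cell could look like. It is an illustration, not the donkey's actual data model: the `Cell` class, its fields, and the `verdict` function are hypothetical names, and the values are illustrative invoice figures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    # Hypothetical record: one extracted value plus where it was found.
    text: str            # the value exactly as printed on the page
    page: Optional[int]  # 1-based page number, None if it could not be located
    box: Optional[int]   # id of the layout box on that page, None if not located

    @property
    def sourced(self) -> bool:
        # A cell counts as sourced only when both page and box are known.
        return self.page is not None and self.box is not None


def verdict(cells):
    """Return (status, sourced_count, total); PASS only if every cell is sourced."""
    sourced = sum(1 for c in cells if c.sourced)
    status = "PASS" if cells and sourced == len(cells) else "FAIL"
    return status, sourced, len(cells)


cells = [
    Cell("Subtotal 1,240.00", page=1, box=84),
    Cell("Tax 99.20", page=1, box=91),
    Cell("Total 1,339.20", page=1, box=97),
]
print(verdict(cells))  # ('PASS', 3, 3)
```

The point of the design is that a value with no page and box cannot slip through: one unsourced cell is enough to turn the verdict to FAIL.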
An invoice in, a sourced table out. Each value carries its source.
$ check invoice.pdf
verdict: PASS   cells: 162   sourced: 162 / 162
every value traced to a box on the page:
  "Subtotal 1,240.00"   page 1, box 84   ✓
  "Tax 99.20"           page 1, box 91   ✓
  "Total 1,339.20"      page 1, box 97   ✓  (= subtotal + tax)
run it again on the same PDF: byte-identical output.
Run it twice and the output matches byte for byte, every value traced to a box on the page.
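The two checks in the sample, the total equalling subtotal plus tax and the byte-identical rerun, can be sketched in a few lines. This is an assumption about how such checks might be written, not the donkey's code: `parse_amount`, `totals_agree`, and `digest` are hypothetical helpers, and the row layout is made up for illustration.

```python
import hashlib
import json
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Turn a printed amount such as '1,240.00' into an exact Decimal."""
    return Decimal(text.replace(",", ""))


def totals_agree(subtotal: str, tax: str, total: str) -> bool:
    # Decimal arithmetic avoids the rounding surprises of binary floats.
    return parse_amount(subtotal) + parse_amount(tax) == parse_amount(total)


def digest(table) -> str:
    """Hash a canonical serialization so two runs can be compared byte for byte."""
    payload = json.dumps(table, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Illustrative rows; the dictionary layout is hypothetical.
table = [
    {"label": "Subtotal", "value": "1,240.00", "page": 1, "box": 84},
    {"label": "Tax", "value": "99.20", "page": 1, "box": 91},
    {"label": "Total", "value": "1,339.20", "page": 1, "box": 97},
]

first_run = digest(table)
second_run = digest([dict(row) for row in table])  # stands in for a second extraction

print(totals_agree("1,240.00", "99.20", "1,339.20"))  # True
print(first_run == second_run)  # True
```

Comparing digests rather than eyeballing the output is what makes "byte-identical" a claim you can test on every run.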
Call the donkey on a file, or run the surface inside the doloop machine: the service that wraps your own AI, runs the check on every output, and keeps state across runs. The difference is state.
Send a PDF, get the sourced table back. Stateless and simple: one file in, one sourced table out. Try it free right now, no sign-up. Nothing remembered.
Connect your own AI to the doloop machine in document mode and the donkey runs on every extraction. Your AI reads, the donkey ties every number to the page and rejects anything it cannot source, and only verified data ships. The machine learns your document templates.
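The "rejects anything it cannot source" step can be pictured as a simple gate between what the AI returned and what was found on the page. The sketch below is hypothetical: `gate` is not a doloop function, and matching on exact strings is a simplifying assumption made here for illustration.

```python
def gate(model_values, sourced_cells):
    """Split model output into values found on the page and values that are not."""
    # Index the sourced cells by their printed value (exact match, by assumption).
    index = {cell["value"]: cell for cell in sourced_cells}
    verified, rejected = [], []
    for value in model_values:
        if value in index:
            cell = index[value]
            verified.append({"value": value, "page": cell["page"], "box": cell["box"]})
        else:
            rejected.append(value)
    return verified, rejected


# Illustrative data: the model transposed two digits in the total.
sourced_cells = [
    {"value": "1,240.00", "page": 1, "box": 84},
    {"value": "99.20", "page": 1, "box": 91},
    {"value": "1,339.20", "page": 1, "box": 97},
]
model_values = ["1,240.00", "99.20", "1,399.20"]

verified, rejected = gate(model_values, sourced_cells)
print(rejected)  # ['1,399.20']
```

Only the `verified` list, with a page and box on every entry, would move on; the transposed total stays behind because no box on the page contains it.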
Want this on your document pipeline? Talk to us, or see the other surfaces.
Routes you to a real page, asks when ambiguous, or refuses. No model on the answer path, so it never invents.