Everything you need to know about Crystl's three extraction modes — process one document at a time, run 50 files overnight in a batch, or drop a bundled compliance PDF and let Auto-Split handle the rest.
Every document that enters Crystl passes through an extraction engine: a vision-language model that reads the file as a human would and pulls out the exact fields you need. There are three ways to trigger that process, each designed for a different situation. This guide walks through all of them: Extract (one document at a time), Batch (up to 50 files in one go), and Auto-Split (one bundled PDF that contains several documents). By the end you will know which mode to reach for, how the upload process works in each, and what every option actually does.
The Extract tab is the starting point. You upload a single file, configure a handful of options, click the button, and within seconds your document's data appears as structured fields — ready to copy, export, or act on.
Here is the full process from upload to result:
Drag your document onto the upload area or click to browse. Crystl accepts PDF, PNG, JPG, TIFF, and BMP files. Multi-page PDFs are fully supported — all pages are read and extracted together as one document.
The document type tells Crystl which fields to look for. When you pick Invoice, Crystl extracts invoice number, vendor, line items, totals, and due date. When you pick ID Document, it extracts name, ID number, date of birth, expiry, and nationality.
Crystl ships with a library of system document profiles — Invoice, Contract, ID Document, Bank Statement, Receipt, Medical Document, Form, and more. If your organisation has created custom document profiles, those appear at the top of the list.
Leaving the field blank triggers Auto-detect. Crystl runs a classification pass first to identify what kind of document you have, then extracts accordingly. Auto-detect is convenient but it costs one additional page from your monthly quota per file, and classification is slightly less precise than telling Crystl the type upfront.
Rule of thumb: if you know the document type, pick it. Reserve Auto-detect for ad-hoc one-offs where you genuinely are not sure.
The instructions field lets you guide the extraction in plain English. Examples of what you can write here:
Instructions are passed directly to the engine along with your document. They do not change which document profile is used — they give the model additional context on top of the document profile's standard field list.
Click Extract Document. The result panel shows every extracted field alongside a confidence score, a percentage that tells you how certain the model is about that value. Fields at 90 % or above are shown in green, from 60 % up to 90 % in amber, and below 60 % in red as a signal for manual review.
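The colour bands can be mirrored in your own tooling if you post-process exported results. The following is a minimal sketch, not Crystl's implementation: the function name is hypothetical, and it assumes that a score of exactly 90 % counts as green and exactly 60 % as amber.

```python
def confidence_band(confidence_pct: float) -> str:
    """Map a confidence percentage to a colour band.

    Assumption: boundaries are inclusive at the lower edge
    (90 -> green, 60 -> amber). The guide does not spell this out.
    """
    if not 0 <= confidence_pct <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence_pct >= 90:
        return "green"
    if confidence_pct >= 60:
        return "amber"
    return "red"  # flag for manual review
```

For example, `confidence_band(62)` returns `"amber"` and `confidence_band(45)` returns `"red"`.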
Underneath the fields you will also see the total processing time (typically 2–10 seconds) and how many pages were consumed from your quota.
Export the results as Excel, Word, or Markdown using the buttons at the top of the results panel. If something looks wrong, click Report an Issue — your feedback goes directly to the Crystl support team.
Crystl offers two AI engines. Your organisation admin sets a default; if the Allow provider override setting is enabled, you can switch per-extraction.
In practice: use Fast for anything that follows a predictable structure — invoices, receipts, standard forms, ID documents. Switch to Moderate when you are dealing with multi-page contracts, complex data tables, dense financial statements, or documents with handwritten annotations.
Batch is Extract at scale. Instead of uploading one file and waiting, you drop up to 50 files at once, configure them as a group (with per-file overrides if needed), submit the job, and watch a live progress feed as Crystl works through the queue in the background.
Switch to the Batch tab and drop your files onto the upload area. You can mix PDFs and images in any supported format, up to 50 files per job. Each document is also subject to a page limit that depends on your plan.
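If you assemble batches with a script before uploading, a quick pre-flight check can catch problems early. This is a sketch under stated assumptions: the function and constant names are hypothetical, the extension list is inferred from the accepted formats (PDF, PNG, JPG, TIFF, BMP), and the plan-specific page limit is not checked because its value is not given here.

```python
from pathlib import Path

MAX_FILES_PER_JOB = 50  # limit stated in this guide
# Assumption: common extensions for the accepted formats.
ACCEPTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def preflight_batch(filenames: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the batch looks submittable."""
    problems = []
    if not filenames:
        problems.append("no files selected")
    if len(filenames) > MAX_FILES_PER_JOB:
        problems.append(
            f"{len(filenames)} files exceeds the {MAX_FILES_PER_JOB}-file limit"
        )
    for name in filenames:
        # Compare case-insensitively so "SCAN.PDF" is accepted.
        if Path(name).suffix.lower() not in ACCEPTED_EXTENSIONS:
            problems.append(f"unsupported format: {name}")
    return problems
```

Calling `preflight_batch(["invoice.pdf", "notes.docx"])` would report the `.docx` file as unsupported and leave the PDF alone.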
After upload, each file appears in the list with its own document-type selector. You have three options:
Once you click Extract N files, the job is submitted and processing starts immediately. The progress panel updates in real time over a WebSocket connection: each file ticks from Queued to Processing to Success (or Failed, with an error message). The overall progress bar shows completed versus total files and is colour-coded green when every file succeeds and amber if some files failed.
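The progress-bar logic described above can be sketched as a small summary function. This is illustrative only: the function name and return shape are hypothetical, it assumes "completed" counts both Success and Failed files, and it returns no colour while a job is still running without failures, since the guide does not say what the bar shows in that state.

```python
from collections import Counter
from typing import Optional


def summarise_batch(statuses: list[str]) -> dict:
    """Summarise per-file statuses (Queued, Processing, Success, Failed)."""
    counts = Counter(statuses)
    total = len(statuses)
    # Assumption: a failed file still counts as completed.
    completed = counts["Success"] + counts["Failed"]

    colour: Optional[str]
    if counts["Failed"] > 0:
        colour = "amber"   # some files failed
    elif total > 0 and completed == total:
        colour = "green"   # every file succeeded
    else:
        colour = None      # still running; colour not specified in the guide

    return {
        "completed": completed,
        "total": total,
        "failed": counts["Failed"],
        "colour": colour,
    }
```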
Each completed file offers three actions:
Auto-Split solves a problem that comes up constantly in KYC and compliance workflows: a client scans several different documents into a single PDF file. One PDF, several document types. Crystl detects where each one begins and ends, extracts each segment with the right document profile, and returns them all in a single structured result.
If you know the PDF contains only one document type, use Extract or Batch instead — Auto-Split is overkill and costs more pages (see below).
Auto-Split runs two AI passes on your document. Understanding this is important because it directly affects your page quota:
Pass 1 — Classification. Crystl converts the PDF to page images and sends each page to the Fast engine in sequence. Each page is classified in context of the previous one so the model can detect "this is a continuation" versus "this is a new document starting." The result is a list of boundary decisions and a grouping of pages into segments.
Pass 2 — Extraction. For each identified segment, Crystl extracts structured fields using the auto-detected document profile and the engine you selected (Fast or Moderate). Pages within a segment run concurrently to keep total time down.
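To make the hand-off between the two passes concrete, here is a minimal sketch of how per-page boundary decisions could be grouped into segments. It is not Crystl's code: the `PageDecision` and `Segment` types and the `group_into_segments` function are hypothetical, and the sketch assumes each page arrives with a predicted type and a "new document starts here" flag.

```python
from dataclasses import dataclass


@dataclass
class PageDecision:
    page: int               # 1-based page number
    doc_type: str           # type predicted for this page
    is_new_document: bool   # True if a new document starts on this page


@dataclass
class Segment:
    doc_type: str
    first_page: int
    last_page: int


def group_into_segments(decisions: list[PageDecision]) -> list[Segment]:
    """Group consecutive pages into segments based on boundary decisions."""
    segments: list[Segment] = []
    for d in decisions:
        # The first page always opens a segment, whatever its flag says.
        if not segments or d.is_new_document:
            segments.append(Segment(d.doc_type, d.page, d.page))
        else:
            # Continuation: extend the current segment to include this page.
            segments[-1].last_page = d.page
    return segments
```

Each resulting segment would then be passed to the extraction pass with its own document profile.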
A 10-page PDF costs 20 pages of quota: 10 for the classification pass and 10 for the extraction pass. This is expected and intentional, because every page is read twice, once to locate document boundaries and once to extract fields. Keep it in mind when processing large bundles.
The result panel lists each detected segment as a collapsible card showing the document type, page range, confidence score, and all extracted fields. You can:
Every plan includes a monthly page quota. One page of a document = one page consumed. Here is how each mode counts:
| Mode | Pages consumed | Auto-detect surcharge |
|---|---|---|
| Extract | 1 page per document page | +1 page per file |
| Batch | 1 page per document page, per file | +1 page per auto-detect file |
| Auto-Split | 2 pages per document page (2 passes) | Included — all pages are classified |
Org admins receive email alerts at 80 % and 100 % of the monthly limit. You can check current usage at any time from the organisation settings page.
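The counting rules in the table, together with the 80 % and 100 % alert thresholds, can be expressed as a short estimator for planning purposes. This is a sketch: the function names and mode strings are hypothetical, and actual billing is whatever Crystl reports in the results panel and organisation settings.

```python
from typing import Optional


def file_cost(pages: int, mode: str, auto_detect: bool = False) -> int:
    """Estimate quota pages consumed by one file.

    mode is one of "extract", "batch", "auto_split" (hypothetical labels).
    """
    if pages < 1:
        raise ValueError("a document has at least one page")
    if mode == "auto_split":
        # Two passes per page; classification is already included.
        return 2 * pages
    if mode in ("extract", "batch"):
        # Auto-detect adds one page per file.
        return pages + (1 if auto_detect else 0)
    raise ValueError(f"unknown mode: {mode}")


def alert_level(used: int, monthly_limit: int) -> Optional[str]:
    """Return which admin alert threshold has been reached, if any."""
    if used >= monthly_limit:
        return "100%"
    if used * 100 >= monthly_limit * 80:
        return "80%"
    return None
```

For instance, a 10-page bundle in Auto-Split gives `file_cost(10, "auto_split") == 20`, while 50 single-page invoices in Batch cost 50 pages with an explicit type and 100 with Auto-detect.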
Auto-detect is a convenience, not a replacement for an explicit type. When you process a batch of 50 invoices, set them all to Invoice upfront — you save 50 quota pages and the extraction is faster and more accurate.
A 2-second extraction is fine for a standard invoice. A dense 30-page contract with complex clauses and nested tables is worth the extra 10 seconds that Moderate takes. Mixing is fine — Batch lets you set different engines per file if needed (contact support if you need per-file engine override enabled for your org).
An extracted value can look right and still be wrong. A field showing 62 % confidence on a critical financial amount deserves a second look. Build a review step into your workflow for any field below 80 %.
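A review step like this can be as simple as a filter over the extracted fields. The sketch below assumes results are available as a mapping of field name to a (value, confidence) pair, which is a hypothetical shape chosen for illustration, and the sample values are made up.

```python
REVIEW_THRESHOLD_PCT = 80  # threshold suggested in this guide


def fields_needing_review(
    fields: dict[str, tuple[str, float]],
    threshold: float = REVIEW_THRESHOLD_PCT,
) -> list[str]:
    """Return the names of fields whose confidence is below the threshold."""
    return sorted(
        name for name, (_value, confidence) in fields.items()
        if confidence < threshold
    )


# Made-up sample result for illustration only.
sample = {
    "vendor": ("Acme Ltd", 97.0),
    "total": ("1,240.00", 62.0),
    "due_date": ("2024-05-01", 79.5),
}
to_review = fields_needing_review(sample)
```

Here `to_review` contains `due_date` and `total`, the two fields that fall below 80 %.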
If the wrong document type was applied to a file — in a batch or in Auto-Split — use the Re-extract button. It fetches the original file from storage and re-runs extraction with your corrected document profile. You do not pay for the pages again.
A high-resolution scan at 300 DPI will extract better than a blurry phone photo, regardless of file size. If you are getting low confidence scores consistently on a particular source, the scan quality is usually the first thing to investigate.
| Symptom | Likely cause | What to try |
|---|---|---|
| Fields come back empty | Wrong document type selected | Re-extract with the correct type or Auto-detect |
| Low confidence across all fields | Poor scan quality | Re-scan at higher resolution; try Moderate engine |
| Auto-Split merges two documents | Similar document types on adjacent pages | Re-extract that segment with the correct type manually |
| Batch file shows "Failed" | Corrupted file, encrypted PDF, or engine timeout | Check the error message; re-upload a clean copy |
| Quota exceeded mid-batch | Monthly page limit reached | Wait for the reset date or upgrade your plan |
| Numbers are slightly off | Ambiguous layout or blurred digits | Switch to Moderate engine; report the issue |
Extract, Batch, and Auto-Split are three tools that cover the full range of document workloads — from one-off lookups to nightly processing runs. Pick the mode that fits your workflow, set the right document type, and let the engines do the rest.
Ready to try it?
Process your first 100 pages free. No credit card required.