How Crystl Document Profiles replace brittle OCR templates with AI-driven extraction that handles any layout. Covers system vs custom profiles, AI generation, field types, the description formula that drives accuracy, barcode data capture, and a KYC onboarding recipe.
Operations teams drowning in PDFs — invoices, ID documents, customs declarations, contracts — know the cost of brittle document automation: a supplier tweaks their layout and the extraction system silently returns wrong data, hours are lost on manual correction, and downstream systems carry bad records. Crystl document profiles are the fix. Each profile is a reusable AI extraction blueprint that tells Crystl which fields to find, what format to expect, and how to recognise them — on any layout variant, with no code. This guide covers everything: system vs custom profiles, AI generation, manual creation, field types, barcode data capture, and how to fix a field the AI missed.
The complete Document Profile workflow: from the OCR template problem through AI field detection, description quality, and live extraction.
Traditional document automation relies on OCR templates — configuration files that map each field to exact page coordinates, bounding boxes, or regex patterns. They work when every document comes from the same printer, in the same font, on the same paper size. The moment anything shifts, they break silently.
Here is a common real-world failure: a logistics team uses coordinate-based OCR to read supplier invoices into their ERP. One supplier redesigns their invoice — moves the total value field two columns to the right. The OCR still reads a number from the old coordinates, but it is now capturing the unit price instead. The ERP books the wrong amount. The discrepancy surfaces three weeks later in a reconciliation and costs a day of manual unwinding. No alert. No error. Just a quiet, expensive mistake.
Document Profiles work differently. Instead of coordinates, a profile describes the field in plain English: "the declared customs value — a ZAR amount printed next to the label 'Total FOB Value' or 'Customs Value', usually in the bottom third of the form". The AI reads that description and finds the correct field on any invoice variant — no coordinates, no regex, no developer required.
| Before: OCR Template | Now: Document Profile |
|---|---|
| Rigid coordinate grids: breaks when layout shifts even 1 mm | Semantic field understanding: AI finds the field regardless of layout |
| Regex pattern matching: fails on fonts, scans, or handwriting | Plain-language descriptions: write once in English, AI does the rest |
| Developer-only changes: field edits need a code deployment | No-code by anyone: operations team edits live in the browser |
| 70–85 % accuracy ceiling: constant manual correction required | 95 %+ accuracy out of the box: improves as descriptions get sharper |
| One config per exact format: new supplier = new template from scratch | One profile, infinite variants: handles all invoice layouts with one profile |
The core difference: OCR templates tell the machine where to look. Document Profiles tell the AI what to find — and the AI figures out the where on every single document, regardless of layout variation.
Figure 1 — OCR template vs Document Profile: accuracy, flexibility, and who can maintain it
What this means in practice: teams switching from coordinate-based OCR to Crystl document profiles typically eliminate 80–90 % of their manual correction workload within the first month. Fewer wrong values reaching downstream systems means fewer payment disputes, faster customs clearance, and smoother client onboarding — because the data your staff act on is correct the first time.
Crystl maintains two libraries that merge together whenever you pick a document type.
Figure 2 — System profiles (read-only, Crystl-maintained) vs Custom profiles (your organisation, fully editable)
The Document Profile list view shows system profiles (locked) alongside your organisation's custom profiles — both are available as document types across all extraction modes.
System document profiles are built and maintained by the Crystl team. They cover the most common document types — passports, ID cards, invoices, bank statements, contracts, payslips, receipts, and more. You cannot edit or delete them; they work out of the box and are updated as document formats evolve.
Custom document profiles are created by your organisation for document types specific to your industry or workflow: a customs declaration form, a non-standard purchase order, a franchise inspection checklist. They live only inside your account and are fully editable at any time.
The fastest way to build a custom document profile is to upload a real example and let the AI do the initial work. Navigate to Document Profiles → Create with AI.
Figure 3 — "Create with AI" flow: upload sample → AI scans and proposes fields → review and save
The "Create with AI" screen shows your document on the left and the AI-detected fields on the right — each with a suggested name, type, and description ready to review.
Drop a filled-in form, a real supplier invoice, or a scanned ID — PDFs up to 10 pages or a single image. Give the profile a name that describes the document type (e.g. "Customs Declaration Form DA 65").
Within seconds, Crystl presents a side-by-side view: document on the left, detected fields on the right. This review step largely determines your long-term extraction accuracy. Fix unclear names, sharpen vague descriptions (see the descriptions section below), add any fields the AI missed, and remove ones you do not need. Five minutes here prevents weeks of manual corrections downstream.
Click Save. The profile is instantly available as a document type in Extract, Batch, and Auto-Split, and as a linkable slot type in Cases & Packages.
Prefer to define every field from scratch — or don't have a sample document yet? Use Document Profiles → New Document Profile for the full three-step wizard.
Figure 4 — Manual wizard: Identity → Fields (name, type, description, validation) → Review & prompt
Set the profile's name and an internal ID (snake_case, e.g. customs_declaration). Write a short description explaining what this profile is for — this is also used by the AI classifier to recognise documents of this type automatically. Barcode scanning is also configured here (see Barcode Data Capture below).
Add each field the profile should extract. For every field you define a name (e.g. declared_value), a type, a description, and, where relevant, a format (e.g. DD MMM YYYY) and validation.
The wizard auto-generates an extraction prompt from your field definitions. Edit it directly to add specific instructions if needed (e.g. "always return amounts in ZAR even if the document shows a foreign currency"), then save.
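
To make the wizard's inputs concrete, here is a minimal sketch that models a profile as plain Python data. Crystl profiles are configured in the browser, so this is not Crystl's schema or API: the class names, the `build_prompt` helper, and the prompt wording are all hypothetical. Only the snake_case ID rule, the field attributes, and the idea of a prompt generated from field definitions come from the text above.

```python
import re
from dataclasses import dataclass, field

# Hypothetical check for the snake_case internal ID (e.g. customs_declaration).
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


@dataclass
class FieldDef:
    name: str          # e.g. declared_value
    type: str          # Text, Number, Date, Yes/No, List, or Group
    description: str   # the instruction the AI reads for this field
    format: str = ""   # optional, e.g. "DD MMM YYYY"


@dataclass
class ProfileDef:
    name: str
    profile_id: str
    description: str
    fields: list = field(default_factory=list)

    def __post_init__(self):
        if not SNAKE_CASE.match(self.profile_id):
            raise ValueError(f"profile_id must be snake_case: {self.profile_id!r}")


def build_prompt(profile: ProfileDef) -> str:
    """Assemble an illustrative extraction prompt from the field definitions."""
    lines = [f"Extract the following fields from a {profile.name}:"]
    for f in profile.fields:
        suffix = f" (format: {f.format})" if f.format else ""
        lines.append(f"- {f.name} [{f.type}]{suffix}: {f.description}")
    return "\n".join(lines)


profile = ProfileDef(
    name="Customs Declaration Form",
    profile_id="customs_declaration",
    description="Customs declaration forms submitted by importers.",
    fields=[
        FieldDef(
            name="declared_value",
            type="Number",
            description="The declared customs value, printed next to the "
                        "label 'Total FOB Value' or 'Customs Value'.",
            format="ZAR",
        )
    ],
)
print(build_prompt(profile))
```

Keeping the description on the field definition mirrors the point made throughout this guide: the description is the instruction, so it travels with the field into the prompt.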
The description you write for a field is the single most important factor in AI document extraction accuracy. It is not a label for you — it is the instruction Crystl reads every time it processes that document type. Vague descriptions produce vague, inconsistent results.
Figure 5 — Vague vs specific field description: 41 % vs 97 % extraction confidence on the same field
Use this pattern for every description you write:
Label + Appearance + Location + Exceptions
The good description in Figure 5 follows this pattern exactly: it names the printed label, states the 6-digit format, notes the table location, and flags the 8-digit exception. That specificity is why it returns 97 % confidence versus 41 % for the vague version. Apply this pattern equally to AI-generated profiles (edit the AI's suggestions during the review step) and manually created ones.
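
One way to keep descriptions consistent across a team is to assemble them from the four parts of the pattern. The helper below is only an illustrative sketch: `compose_description` is a hypothetical function, not a Crystl feature, and the sentence templates are an assumption. The example values come from the customs value description quoted earlier in this guide.

```python
def compose_description(label, appearance, location, exceptions=""):
    """Build a field description from Label + Appearance + Location + Exceptions.

    The first three parts are required; exceptions are optional because
    not every field has a known variant.
    """
    required = {"label": label, "appearance": appearance, "location": location}
    for part, text in required.items():
        if not text.strip():
            raise ValueError(f"missing description part: {part}")

    parts = [
        f"Label: {label.strip()}.",
        f"Appearance: {appearance.strip()}.",
        f"Location: {location.strip()}.",
    ]
    if exceptions.strip():
        parts.append(f"Exceptions: {exceptions.strip()}.")
    return " ".join(parts)


description = compose_description(
    label="printed next to 'Total FOB Value' or 'Customs Value'",
    appearance="a ZAR amount",
    location="usually in the bottom third of the form",
)
print(description)
```

Forcing the first three parts to be non-empty makes a vague one-word description impossible to save by accident, which is the failure mode the pattern is meant to prevent.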
Every field has a type that tells both the AI and the output validator what format to expect. Crystl supports six types:
Figure 6 — The six field types: Text, Number, Date, Yes/No, List, and Group
| Type | When to use | Tips |
|---|---|---|
| Text | Names, IDs, addresses, reference codes, free text | The default for anything that doesn't fit another type |
| Number | Monetary amounts, quantities, page counts | Use the Format field to specify currency or decimal precision |
| Date | Issue date, expiry date, date of birth, transaction dates | Always set a Format (e.g. YYYY-MM-DD) for consistent output |
| Yes / No | Signed / unsigned, certified / not certified, presence flags | Returns true or false in the output |
| List | Invoice line items, transaction rows, multiple signatories | Describe the structure of each item in the field description |
| Group | Billing address, registered office, structured contact block | List the sub-fields in the description (street, city, postal code) |
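
If you post-process extraction output in your own systems, the six types map naturally onto a small normalisation step. The sketch below is an assumption about how a downstream consumer might do that; it does not describe Crystl's output validator. The `coerce` function, the accepted yes/no words, and the choice to emit dates as YYYY-MM-DD are all hypothetical.

```python
from datetime import datetime

TRUE_WORDS = {"yes", "y", "true"}
FALSE_WORDS = {"no", "n", "false"}


def coerce(value, field_type, input_date_format="%Y-%m-%d"):
    """Normalise one extracted value according to its field type."""
    if field_type == "Text":
        return str(value).strip()
    if field_type == "Number":
        # Drop thousands separators and spaces before parsing.
        return float(str(value).replace(",", "").replace(" ", ""))
    if field_type == "Date":
        # Parse with the given input format, emit YYYY-MM-DD.
        parsed = datetime.strptime(str(value).strip(), input_date_format)
        return parsed.date().isoformat()
    if field_type == "Yes/No":
        if isinstance(value, bool):
            return value
        word = str(value).strip().lower()
        if word in TRUE_WORDS:
            return True
        if word in FALSE_WORDS:
            return False
        raise ValueError(f"not a yes/no value: {value!r}")
    if field_type == "List":
        if not isinstance(value, list):
            raise ValueError("List fields must be a list of items")
        return value
    if field_type == "Group":
        if not isinstance(value, dict):
            raise ValueError("Group fields must be a mapping of sub-fields")
        return value
    raise ValueError(f"unknown field type: {field_type!r}")


print(coerce("1,250.50", "Number"))
print(coerce("05/03/2024", "Date", "%d/%m/%Y"))
print(coerce("Yes", "Yes/No"))
```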
A field that is empty or absent from results usually means one of three things: it is not in the profile at all, the description is too vague for the AI to find it reliably, or it genuinely does not appear on that document. For the first two cases the fix takes under two minutes: open the custom profile's Edit page, then add the missing field or sharpen its description.
You cannot edit system document profiles. If a Crystl-provided profile is missing a field, create a custom document profile for that document type — it will take precedence over the system profile for your organisation.
Many documents encode authoritative data in a machine-readable barcode that is faster and more accurate to read than printed text. South African Smart ID cards carry a PDF417 barcode on the back with the identity number, date of birth, sex, and nationality. Warehouse goods labels use Code 39 or Data Matrix barcodes for SKU and batch tracking. Laboratory sample tubes carry QR codes with patient or specimen IDs. Clinic wristbands store visit numbers in Code 39.
When barcode scanning is enabled on a document profile, Crystl runs the barcode scanner and the vision AI in parallel, then merges the results according to the priority you set — making ID verification automation and barcode data capture part of the same extraction workflow, with no separate pipeline.
Figure 7 — Barcode data capture: parallel scanner + vision AI, merged by priority setting
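
The merge step can be pictured as a priority-based dictionary merge. The following is a minimal sketch of that idea, not Crystl's implementation: the `merge_results` function, the `"barcode"` and `"vision"` priority names, and the rule that empty values never overwrite non-empty ones are all assumptions, and the sample values are fictitious.

```python
def merge_results(barcode, vision, priority="barcode"):
    """Merge two field dictionaries, letting the prioritised source win.

    A value from the prioritised source replaces the other source's value
    only when it is non-empty, so a failed barcode read does not erase
    what the vision model found.
    """
    if priority not in ("barcode", "vision"):
        raise ValueError(f"unknown priority: {priority!r}")

    if priority == "barcode":
        primary, secondary = barcode, vision
    else:
        primary, secondary = vision, barcode

    merged = dict(secondary)
    for key, value in primary.items():
        if value not in (None, ""):
            merged[key] = value
    return merged


# Fictitious sample data: the vision result misreads a 0 as the letter O.
barcode_fields = {"id_number": "8001015009087", "surname": ""}
vision_fields = {"id_number": "8001015009O87", "surname": "DOE"}

print(merge_results(barcode_fields, vision_fields, priority="barcode"))
```

With barcode priority, the identity number comes from the barcode while the surname, which the barcode read left empty, falls back to the vision result.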
The barcode configuration panel shows symbology selection and merge priority options, both accessible from the document profile's Identity step or the Edit page.
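Barcode settings live on the document profile's Identity step (or the Edit page for existing profiles). There you select which symbologies to scan for and set the merge priority between barcode and vision results.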
Passports carry no standard barcode, but they include a Machine Readable Zone (MRZ) — two rows of machine-formatted characters at the bottom of the biographical page. When MRZ parsing is enabled on the passport document profile, the vision model reads those lines verbatim and the backend automatically parses them into structured fields: document number, nationality, date of birth, expiry date, and sex. No barcode scanner needed.
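
For readers curious what MRZ parsing involves, the sketch below parses the second line of a passport MRZ using the fixed positions and check-digit scheme (weights 7, 3, 1) defined in ICAO Doc 9303 for TD3 documents. It illustrates the standard only and is not Crystl's backend code; the function names are hypothetical, and the sample line is the fictitious specimen used in the ICAO documentation. Dates are returned as raw YYMMDD strings to avoid guessing the century.

```python
def check_digit(data: str) -> int:
    """ICAO 9303 check digit: weights 7, 3, 1; A-Z map to 10-35; '<' is 0."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(data):
        if "0" <= ch <= "9":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch == "<":
            value = 0
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10


def parse_td3_line2(line: str) -> dict:
    """Parse line 2 of a TD3 (passport) MRZ into structured fields."""
    if len(line) != 44:
        raise ValueError("TD3 MRZ lines are 44 characters long")

    # Each checked field is followed immediately by its check digit.
    checked = {
        "document_number": (line[0:9], line[9]),
        "date_of_birth": (line[13:19], line[19]),
        "expiry_date": (line[21:27], line[27]),
    }
    for name, (value, digit) in checked.items():
        if not digit.isdigit() or check_digit(value) != int(digit):
            raise ValueError(f"check digit mismatch in {name}")

    return {
        "document_number": line[0:9].rstrip("<"),
        "nationality": line[10:13].rstrip("<"),
        "date_of_birth": line[13:19],   # YYMMDD
        "sex": line[20],
        "expiry_date": line[21:27],     # YYMMDD
    }


specimen = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
print(parse_td3_line2(specimen))
```

The check digits are what make the MRZ more trustworthy than free-text reading: a single misread character in the document number or a date is detected rather than silently passed downstream.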
Document Profiles are the connective tissue of Crystl's document intelligence platform: every extraction mode and every workflow step references them, from Extract, Batch, and Auto-Split to Cases & Packages.
For teams running KYC or client onboarding workflows, a proven pattern is to create a Package with three slots: Passport (linked to the system Passport profile with MRZ parsing on), Proof of Address (linked to a custom Bank Statement or Utility Bill profile), and Company Registration (a custom profile with director names as a List field and registration number as a validated Text field). When the client uploads all three, Crystl extracts and structures the data automatically — so your team can spot a name discrepancy between the passport and the bank statement before a human even opens the case.
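
The name-discrepancy check in this recipe can be approximated with a simple normalised comparison. The sketch below is one possible approach under stated assumptions, not how Crystl compares documents: `normalise_name` and `names_match` are hypothetical helpers, and treating names as order-insensitive sets of accent-stripped, upper-cased tokens is a deliberate simplification.

```python
import unicodedata


def normalise_name(name: str) -> frozenset:
    """Reduce a name to a set of upper-case tokens without accents or punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in stripped)
    return frozenset(cleaned.upper().split())


def names_match(a: str, b: str) -> bool:
    """True when both names contain the same tokens, in any order."""
    return normalise_name(a) == normalise_name(b)


# Fictitious extracted values from two slots of the same package.
passport_name = "ERIKSSON, Anna Maria"
statement_name = "Anna Maria Eriksson"

if names_match(passport_name, statement_name):
    print("Names agree")
else:
    print("Name discrepancy: flag the case for review")
```

An exact token match is strict on purpose: a near miss such as "Anna" versus "Anne" is reported as a discrepancy so that a person decides whether it matters.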
Build your first Crystl document profile from a real invoice or ID in under 5 minutes: go to Document Profiles → Create with AI, upload the document, review the detected fields, and save. From that point forward, every document of that type is processed automatically. Ready to go further? See how Auto-Split handles multi-document PDFs, or explore how Cases & Packages turns your profiles into a complete client onboarding workflow.
Ready to try it?
Process your first 100 pages free. No credit card required.