Tutorial · March 18, 2026

Document Profiles: How Crystl Learns to Read Your Documents

How Crystl Document Profiles replace brittle OCR templates with AI-driven extraction that handles any layout. Covers system vs custom profiles, AI generation, field types, the description formula that drives accuracy, barcode data capture, and a KYC onboarding recipe.

Crystl Team
9 min read

Operations teams drowning in PDFs — invoices, ID documents, customs declarations, contracts — know the cost of brittle document automation: a supplier tweaks their layout and the extraction system silently returns wrong data, hours are lost on manual correction, and downstream systems carry bad records. Crystl document profiles are the fix. Each profile is a reusable AI extraction blueprint that tells Crystl which fields to find, what format to expect, and how to recognise them — on any layout variant, with no code. This guide covers everything: system vs custom profiles, AI generation, manual creation, field types, barcode data capture, and how to fix a field the AI missed.

Example profile shown in the walkthrough: Invoice (custom, id: invoice_custom, 6 fields), matched to Invoice_2024_03.pdf (2 pages, 148 KB) with 96 % accuracy.
  • invoice_number (Text): Invoice reference number printed at top right, e.g. "INV-2024-0891". Extracted: INV-2024-0891
  • vendor_name (Text): Full legal name of the issuing vendor, as printed in the header block. Extracted: Acme Imports Ltd
  • issue_date (Date): Document date in the top section, format YYYY-MM-DD. Extracted: 2024-03-15
  • total_amount (Number): Grand total incl. tax, next to "Total Due" label, numeric only. Extracted: 12,450.00
  • currency (Text): 3-letter ISO currency code, e.g. ZAR, USD, beside the total. Extracted: ZAR
  • line_items (List): Each row in the items table: description, qty, unit price, line total. Extracted: 4 rows
The structured output (confidence 96 %) then feeds Extract, Batch, Auto-Split, Cases, and Exports.

The complete Document Profile workflow, from the OCR template problem through AI field detection, description quality, and live extraction.

Why Not Just Use an OCR Template?

Traditional document automation relies on OCR templates — configuration files that map each field to exact page coordinates, bounding boxes, or regex patterns. They work when every document comes from the same printer, in the same font, on the same paper size. The moment anything shifts, they break silently.

Here is a common real-world failure: a logistics team uses coordinate-based OCR to read supplier invoices into their ERP. One supplier redesigns their invoice — moves the total value field two columns to the right. The OCR still reads a number from the old coordinates, but it is now capturing the unit price instead. The ERP books the wrong amount. The discrepancy surfaces three weeks later in a reconciliation and costs a day of manual unwinding. No alert. No error. Just a quiet, expensive mistake.

Document Profiles work differently. Instead of coordinates, a profile describes the field in plain English: "the declared customs value — a ZAR amount printed next to the label 'Total FOB Value' or 'Customs Value', usually in the bottom third of the form". The AI reads that description and finds the correct field on any invoice variant — no coordinates, no regex, no developer required.
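The failure in the logistics story above can be reproduced with a toy model. Here a "page" is just a map from grid cells to printed text. This is not how Crystl represents documents. `coordinate_template` reads a fixed cell, the way a coordinate-based OCR template does. `find_by_label` is a hypothetical stand-in for "find the field by what it is". It only matches an exact label, which is far cruder than a vision model reading a plain-English description and would itself break if the label were reworded. It is still enough to show why a fixed coordinate fails silently.

```python
from __future__ import annotations

# Toy page: (column, row) -> printed text. Illustrative only.
Page = dict[tuple[int, int], str]

OLD_LAYOUT: Page = {
    (1, 20): "Unit Price", (2, 20): "1,245.00",
    (3, 20): "Total Due", (4, 20): "12,450.00",
}
# After the redesign a Qty column is added and everything shifts two columns
# right, so the unit price now sits where the total used to be.
NEW_LAYOUT: Page = {
    (1, 20): "Qty", (2, 20): "10",
    (3, 20): "Unit Price", (4, 20): "1,245.00",
    (5, 20): "Total Due", (6, 20): "12,450.00",
}


def coordinate_template(page: Page) -> str | None:
    """Legacy approach: read whatever is printed at the configured cell."""
    return page.get((4, 20))


def find_by_label(page: Page, label: str = "Total Due") -> str | None:
    """Locate the label first, then take the nearest value to its right on the same row."""
    for (col, row), text in page.items():
        if text == label:
            right = sorted(c for (c, r) in page if r == row and c > col)
            return page[(right[0], row)] if right else None
    return None
```

On the redesigned page, `coordinate_template(NEW_LAYOUT)` returns "1,245.00", the unit price, with no error. `find_by_label(NEW_LAYOUT)` still returns "12,450.00".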

Before: OCR Template (legacy)
  • Rigid coordinate grids: breaks when layout shifts even 1 mm
  • Regex pattern matching: fails on fonts, scans, or handwriting
  • Developer-only changes: field edits need a code deployment
  • 70–85 % accuracy ceiling: constant manual correction required
  • One config per exact format: a new supplier means a new template from scratch
  • Average accuracy: ~77 %

Now: Document Profile (Crystl)
  • Semantic field understanding: AI finds the field regardless of layout
  • Plain-language descriptions: write once in English and the AI does the rest
  • No-code by anyone: the operations team edits live in the browser
  • 95 %+ accuracy out of the box: improves as descriptions get sharper
  • One profile, infinite variants: handles all invoice layouts with one profile
  • Average accuracy: 96 %+

The core difference: OCR templates tell the machine where to look. Document Profiles tell the AI what to find — and the AI figures out the where on every single document, regardless of layout variation.

Figure 1 — OCR template vs Document Profile: accuracy, flexibility, and who can maintain it

What this means in practice: teams switching from coordinate-based OCR to Crystl document profiles typically eliminate 80–90 % of their manual correction workload within the first month. Fewer wrong values reaching downstream systems means fewer payment disputes, faster customs clearance, and smoother client onboarding — because the data your staff act on is correct the first time.


System Document Profiles vs Custom Document Profiles

Crystl maintains two profile libraries and combines them into a single list whenever you pick a document type.

System Document Profiles (provided by Crystl, available to every organisation, read-only)
  • Passport, Smart ID Card, Invoice, Bank Statement, Contract, Medical Document, Receipt, Payslip
  • Maintained by Crystl and always up to date

Custom Document Profiles (created by your team, visible only to your organisation; available on Professional plan and above)
  • Create with AI: upload a sample and the AI detects all fields
  • Build manually: define every field yourself
  • Edit anytime: add fields, fix descriptions, update prompts
  • Delete when done: full control over your library

Figure 2 — System profiles (read-only, Crystl-maintained) vs Custom profiles (your organisation, fully editable)

The Document Profile list view shows system profiles (locked) alongside your organisation's custom profiles — both are available as document types across all extraction modes.

System document profiles are built and maintained by the Crystl team. They cover the most common document types — passports, ID cards, invoices, bank statements, contracts, payslips, receipts, and more. You cannot edit or delete them; they work out of the box and are updated as document formats evolve.

Custom document profiles are created by your organisation for document types specific to your industry or workflow: a customs declaration form, a non-standard purchase order, a franchise inspection checklist. They live only inside your account and are fully editable at any time.


Create a Document Profile with AI (fastest path)

The fastest way to build a custom document profile is to upload a real example and let the AI do the initial work. Navigate to Document Profiles → Create with AI.

01 Upload sample: drop a real document (PDF or image) and name the profile, e.g. "Customs Declaration"
02 AI detects fields automatically: Importer Name (Text), HS Code (Text), Declared Value (Number), Country of Origin (Text), Declaration Date (Date)
03 Review & save: edit before saving, e.g. refine the HS Code description to "6-digit Harmonised System code printed above the commodity description", add fields with "+ Add field", then Save

Figure 3 — "Create with AI" flow: upload sample → AI scans and proposes fields → review and save

The "Create with AI" screen shows your document on the left and the AI-detected fields on the right — each with a suggested name, type, and description ready to review.

Step 1 — Upload a sample document

Drop a filled-in form, a real supplier invoice, or a scanned ID — PDFs up to 10 pages or a single image. Give the profile a name that describes the document type (e.g. "Customs Declaration Form DA 65").

Step 2 — Review and sharpen detected fields

Within seconds, Crystl presents a side-by-side view: document on the left, detected fields on the right. This review step largely determines your long-term extraction accuracy. Fix unclear names, sharpen vague descriptions (see the descriptions section below), add any fields the AI missed, and remove ones you do not need. Five minutes here prevents weeks of manual corrections downstream.

Step 3 — Save and deploy

Click Save. The profile is instantly available as a document type in Extract, Batch, and Auto-Split, and as a linkable slot type in Cases & Packages.


Create a Document Profile Manually (maximum control)

Prefer to define every field from scratch — or don't have a sample document yet? Use Document Profiles → New Document Profile for the full three-step wizard.

Wizard steps: 1 Identity, 2 Fields, 3 Review. In this example, the Fields step (Document Fields) lists:
  • importer_name (Text, Required): "Full legal name of the importing entity"
  • hs_code (Text, Required): "6-digit Harmonised System tariff code"
  • declared_value (Number, Required): "Declared customs value in local currency"
  • country_of_origin (Text, Optional): "Country where goods were manufactured"

Figure 4 — Manual wizard: Identity → Fields (name, type, description, validation) → Review & prompt

Step 1 — Identity

Set the profile's name and an internal ID (snake_case, e.g. customs_declaration). Write a short description explaining what this profile is for — this is also used by the AI classifier to recognise documents of this type automatically. Barcode scanning is also configured here (see Barcode Data Capture below).
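The wizard asks for an internal ID in snake_case. If you want to derive IDs consistently from display names, for example when applying one naming convention across many profiles, here is a minimal sketch. `profile_id_from_name` is a hypothetical helper of our own, not part of Crystl. Crystl may accept or suggest IDs differently.

```python
import re
import unicodedata


def profile_id_from_name(name: str) -> str:
    """Derive a snake_case ID such as 'customs_declaration' from a display name."""
    # Fold accented characters to plain ASCII ("é" -> "e") and drop anything else.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    # Split camelCase boundaries ("PurchaseOrder" -> "Purchase Order").
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", ascii_name)
    # Collapse every run of non-alphanumerics into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", spaced.lower()).strip("_")
    if not slug:
        raise ValueError(f"cannot derive an ID from {name!r}")
    return slug
```

For example, "Customs Declaration Form DA 65" becomes `customs_declaration_form_da_65`.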

Step 2 — Fields

Add each field the profile should extract. For every field you define:

  • Name — variable name in the output (e.g. declared_value)
  • Type — data format: Text, Number, Date, Yes/No, List, or Group
  • Required / Optional — whether missing values are flagged in results
  • Description — plain-language guidance for the AI (the most important setting — see below)
  • Example — a sample value showing expected format
  • Format — for dates and numbers, a format hint (e.g. DD MMM YYYY)
  • Validation — optional pattern the extracted value must match (e.g. 13-digit ID number)
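To make these settings concrete, here is one way to model a field in code, with a check that mirrors the Required/Optional flag and the Validation pattern. This is a sketch under assumptions. `FieldDefinition` and `check_value` are hypothetical names, the Validation setting is treated as a regular expression, and none of this is Crystl's internal schema.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Any

# The six types named in this guide (display names; internal spellings may differ).
FIELD_TYPES = {"Text", "Number", "Date", "Yes/No", "List", "Group"}


@dataclass
class FieldDefinition:
    name: str                       # output variable, e.g. "declared_value"
    type: str                       # one of FIELD_TYPES
    required: bool                  # flag the field when it comes back empty
    description: str                # plain-language guidance for the AI
    example: str | None = None      # sample value showing expected format
    format: str | None = None       # e.g. "DD MMM YYYY" for dates
    validation: str | None = None   # assumed here to be a regular expression

    def __post_init__(self) -> None:
        if self.type not in FIELD_TYPES:
            raise ValueError(f"unknown field type {self.type!r}")


def check_value(field: FieldDefinition, value: Any) -> list[str]:
    """Return a list of issues for one extracted value (an empty list means OK)."""
    if value is None or value == "":
        return [f"{field.name}: missing"] if field.required else []
    if field.validation and not re.fullmatch(field.validation, str(value)):
        return [f"{field.name}: {value!r} does not match {field.validation}"]
    return []
```

With `validation=r"\d{13}"` on a required ID field, a 13-digit value passes. A 6-digit value produces one issue, and `None` produces `"<name>: missing"`.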

Step 3 — Review & prompt

The wizard auto-generates an extraction prompt from your field definitions. Edit it directly to add specific instructions if needed (e.g. "always return amounts in ZAR even if the document shows a foreign currency"), then save.
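This guide does not document the exact prompt Crystl generates. Still, the idea of turning field definitions into instructions can be sketched. `build_extraction_prompt` and its wording are illustrative assumptions. The usage below reuses the ZAR instruction from the step above as the extra, user-edited rule.

```python
from __future__ import annotations


def build_extraction_prompt(profile_name: str, fields: list[dict],
                            extra_instructions: str = "") -> str:
    """Turn field definitions into a plain-text extraction prompt (illustrative wording)."""
    lines = [
        f"Extract the following fields from this {profile_name} document.",
        "Return JSON with exactly these keys; use null for any field that is absent.",
        "",
    ]
    for field in fields:
        flag = "required" if field["required"] else "optional"
        lines.append(f"- {field['name']} ({field['type']}, {flag}): {field['description']}")
    if extra_instructions:
        # Free-text rules typed into the Review & prompt step.
        lines += ["", f"Additional instructions: {extra_instructions}"]
    return "\n".join(lines)
```

Keeping the user's extra rules in a separate, clearly labelled section is a design choice. It means regenerating the field list never overwrites hand-written instructions.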


Why Field Descriptions Drive Accuracy

The description you write for a field is the single most important factor in AI document extraction accuracy. It is not a label for you — it is the instruction Crystl reads every time it processes that document type. Vague descriptions produce vague, inconsistent results.

  • Vague: hs_code description "The code". The AI returns hs_code = null at 41 % confidence.
  • Specific: hs_code description "6-digit Harmonised System tariff code printed above the commodity description line". The AI returns hs_code = "8471.30" at 97 % confidence.

Figure 5 — Vague vs specific field description: 41 % vs 97 % extraction confidence on the same field

Use this pattern for every description you write:

Label + Appearance + Location + Exceptions
  • Label: what the field is called on the document ("printed as 'HS Tariff Code' or 'Commodity Code'")
  • Appearance: format, length, characters ("a 6-digit numeric code, e.g. 847130")
  • Location: where on the page to find it ("in the goods description table, rightmost column")
  • Exceptions: edge cases ("may be 8 digits if a national tariff subdivision is appended")

The specific description in Figure 5 covers two parts of this pattern: Appearance (a 6-digit Harmonised System tariff code) and Location (printed above the commodity description line). That added specificity is why it returns 97 % confidence versus 41 % for the vague version. Naming the printed Label and flagging known Exceptions, as in the bullets above, sharpens it further. Apply this pattern equally to AI-generated profiles (edit the AI's suggestions during the review step) and manually created ones.
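The four parts are independent, so it can help to draft them separately and then join them. This is useful, for example, when preparing many fields in a spreadsheet before pasting the descriptions into Crystl. The helper below is a hypothetical convenience, not a Crystl feature. It rejects a description that lacks a label, appearance, or location, and treats exceptions as optional.

```python
def build_description(label: str, appearance: str, location: str,
                      exceptions: str = "") -> str:
    """Join Label + Appearance + Location (+ optional Exceptions) into one description."""
    required = {"label": label, "appearance": appearance, "location": location}
    missing = [part for part, text in required.items() if not text.strip()]
    if missing:
        raise ValueError(f"description is missing: {', '.join(missing)}")
    parts = [label, appearance, location, exceptions]
    # Drop empty parts and trailing full stops so the pieces join cleanly.
    cleaned = [p.strip().rstrip(".") for p in parts if p.strip()]
    return "; ".join(cleaned) + "."


hs_code_description = build_description(
    "printed as 'HS Tariff Code' or 'Commodity Code'",
    "a 6-digit numeric code, e.g. 847130",
    "in the goods description table, rightmost column",
    "may be 8 digits if a national tariff subdivision is appended",
)
```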


Field Types

Every field has a type that tells both the AI and the output validator what format to expect. Crystl supports six types:

  • Text (string): names, IDs, addresses, codes, e.g. "INV-2025-001"
  • Number (number): amounts, quantities, totals, e.g. 4250.00
  • Date (date): issue dates, expiry, birth, e.g. "2025-01-15"
  • Yes / No (boolean): signed, certified, present, e.g. true
  • List (array): line items, tags, entries, e.g. [{…},{…}]
  • Group (object): addresses, nested records, e.g. {street, city}

Figure 6 — The six field types: Text, Number, Date, Yes/No, List, and Group

  • Text: names, IDs, addresses, reference codes, free text. Tip: the default for anything that doesn't fit another type.
  • Number: monetary amounts, quantities, page counts. Tip: use the Format field to specify currency or decimal precision.
  • Date: issue date, expiry date, date of birth, transaction dates. Tip: always set a Format (e.g. YYYY-MM-DD) for consistent output.
  • Yes / No: signed / unsigned, certified / not certified, presence flags. Tip: returns true or false in the output.
  • List: invoice line items, transaction rows, multiple signatories. Tip: describe the structure of each item in the field description.
  • Group: billing address, registered office, structured contact block. Tip: list the sub-fields in the description (street, city, postal code).
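If you consume extraction results downstream, a quick type check catches values that don't match their declared field type. The sketch below makes three assumptions: results arrive as JSON-like Python values, Date fields use the YYYY-MM-DD format recommended above, and the type keys are the display names used in this guide. `matches_type` is a hypothetical helper, not a Crystl API.

```python
from datetime import date
from typing import Any


def matches_type(field_type: str, value: Any) -> bool:
    """Check one extracted value against its declared field type."""
    if value is None:
        return True  # absence is the Required/Optional flag's job, not the type's
    if field_type == "Text":
        return isinstance(value, str)
    if field_type == "Number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if field_type == "Date":
        # Assumes the profile's Format is YYYY-MM-DD, delivered as a string.
        try:
            date.fromisoformat(value)
            return True
        except (TypeError, ValueError):
            return False
    if field_type == "Yes/No":
        return isinstance(value, bool)
    if field_type == "List":
        return isinstance(value, list)
    if field_type == "Group":
        return isinstance(value, dict)
    raise ValueError(f"unknown field type {field_type!r}")
```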

Adding a Field the AI Missed

A field that is empty or absent from results usually means one of three things: it is not in the profile at all, the description is too vague for the AI to find it reliably, or it genuinely does not appear on that document. For the first two cases the fix takes under two minutes:

  1. Go to Document Profiles and open your custom profile
  2. Click Edit
  3. Add a missing field — click "+ Add field", set the name and type, and write a specific description using the Label + Appearance + Location + Exceptions pattern
  4. Improve a vague description — find the field and rewrite it to be precise about what to look for and where
  5. Save the document profile
  6. Re-extract — return to your result and click Re-extract. Crystl fetches the original file and re-runs at no extra quota cost
You cannot edit system document profiles. If a Crystl-provided profile is missing a field, create a custom document profile for that document type — it will take precedence over the system profile for your organisation.
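When a result has gaps, you can separate the first cause from the other two mechanically: compare the fields you expected against the profile's field list. Here is a minimal sketch. It assumes you have the profile's field names and the result as a dictionary. `triage_missing` and the `po_number` field in the test are hypothetical. A field that is defined but empty still needs a human look, to decide whether the description is too vague or the value genuinely isn't on the document.

```python
from __future__ import annotations

EMPTY_VALUES = (None, "", [], {})


def triage_missing(expected: list[str], profile_fields: list[str],
                   result: dict[str, object]) -> dict[str, str]:
    """For each expected field that came back empty, suggest which cause applies."""
    report: dict[str, str] = {}
    for name in expected:
        if name not in profile_fields:
            # Cause 1: the profile never asks for this field.
            report[name] = "not in profile: add it with '+ Add field'"
        elif result.get(name) in EMPTY_VALUES:
            # Cause 2 or 3: defined but empty. Either the description is too
            # vague, or the value really is absent from this document.
            report[name] = "in profile but empty: sharpen the description or check the document"
    return report
```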

Barcode Data Capture

Many documents encode authoritative data in a machine-readable barcode that is faster and more accurate to read than printed text. South African Smart ID cards carry a PDF417 barcode on the back with the identity number, date of birth, sex, and nationality. Warehouse goods labels use Code 39 or Data Matrix barcodes for SKU and batch tracking. Laboratory sample tubes carry QR codes with patient or specimen IDs. Clinic wristbands store visit numbers in Code 39. In all these cases, reading the barcode directly is both faster and more reliable than vision-only extraction.

When barcode scanning is enabled on a document profile, Crystl runs the barcode scanner and the vision AI in parallel, then merges the results according to the priority you set — making ID verification automation and barcode data capture part of the same extraction workflow, with no separate pipeline.

Example: a Smart ID Card (front and back, PDF417 barcode present).
  • Barcode scanner reads: ID Number, Date of Birth, Nationality, Sex
  • Vision AI reads: Surname, Names, ID Number, Country of Birth
  • With merge_priority: "barcode", barcode fields win on conflict. Merged result: Surname JOHNSON (vlm), ID Number 8501015… (barcode), Date of Birth 1985-01-01 (barcode), Country South Africa (vlm)

Figure 7 — Barcode data capture: parallel scanner + vision AI, merged by priority setting

The barcode configuration panel shows symbology selection and merge priority options, both accessible from the document profile's Identity step or the Edit page.

Configuring barcode scanning

Barcode settings live on the document profile's Identity step (or the Edit page for existing profiles):

  • Enable / disable — toggle scanning on or off for this document profile
  • Symbologies — leave blank to auto-detect, or specify one or more: PDF417 (government IDs, boarding passes), QR Code, Code 39 (warehouse labels, clinic wristbands), Data Matrix (lab samples, logistics). Specifying the exact type is faster.
  • Merge priority — when a field appears in both the barcode and the AI result:
    • Barcode priority — use the barcode value (recommended for government IDs and any document where the barcode is the authoritative source)
    • AI priority — use the vision model value (useful when the AI needs to reformat or validate a raw barcode payload)
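The merge rule is simple to express in code. The sketch below assumes each source returns a flat dictionary of field values, and that the two priority settings are spelled `"barcode"` and `"ai"`. Those spellings and `merge_results` are illustrative, not Crystl's API. The `"vlm"` source tag is borrowed from the merged result shown in Figure 7.

```python
from __future__ import annotations

from typing import Any, Literal


def merge_results(barcode: dict[str, Any], vision: dict[str, Any],
                  priority: Literal["barcode", "ai"] = "barcode") -> dict[str, dict[str, Any]]:
    """Combine barcode and vision-model fields, tagging each value with its source."""
    merged: dict[str, dict[str, Any]] = {}
    for name in barcode.keys() | vision.keys():
        has_barcode = barcode.get(name) is not None
        has_vision = vision.get(name) is not None
        # Barcode wins if it is the only source, or on conflict under barcode priority.
        if has_barcode and (not has_vision or priority == "barcode"):
            merged[name] = {"value": barcode[name], "source": "barcode"}
        elif has_vision:
            merged[name] = {"value": vision[name], "source": "vlm"}
    return merged
```

Fields found by only one source pass through unchanged. The priority setting matters only when both sources return a value for the same field.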

MRZ parsing for passports and travel documents

Passports carry no standard barcode, but they include a Machine Readable Zone (MRZ) — two rows of machine-formatted characters at the bottom of the biographical page. When MRZ parsing is enabled on the passport document profile, the vision model reads those lines verbatim and the backend automatically parses them into structured fields: document number, nationality, date of birth, expiry date, and sex. No barcode scanner needed.
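For readers curious what "parsing the MRZ" involves, the sketch below reads the second line of a passport MRZ using the public ICAO Doc 9303 TD3 layout and verifies its check digits. It illustrates the general technique only; it is not Crystl's backend code. Dates are returned as raw YYMMDD because the century has to be inferred: a birth date lies in the past, and an expiry date is usually in the future.

```python
from __future__ import annotations

WEIGHTS = (7, 3, 1)


def mrz_check_digit(data: str) -> int:
    """ICAO 9303 check digit: 0-9 as-is, A-Z = 10-35, '<' = 0, weights 7-3-1 repeating."""
    total = 0
    for i, ch in enumerate(data):
        if "0" <= ch <= "9":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch == "<":
            value = 0
        else:
            raise ValueError(f"invalid MRZ character {ch!r}")
        total += value * WEIGHTS[i % 3]
    return total % 10


def parse_td3_line2(line: str) -> dict[str, str]:
    """Parse the second MRZ line of a TD3 (passport) document, 44 characters."""
    if len(line) != 44:
        raise ValueError("a TD3 MRZ line has 44 characters")
    # field -> (start, end, index of its check digit), 0-based
    checked = {
        "document_number": (0, 9, 9),
        "date_of_birth": (13, 19, 19),   # YYMMDD
        "expiry_date": (21, 27, 27),     # YYMMDD
    }
    fields: dict[str, str] = {}
    for name, (start, end, cd) in checked.items():
        raw = line[start:end]
        if str(mrz_check_digit(raw)) != line[cd]:
            raise ValueError(f"check digit mismatch in {name}")
        fields[name] = raw.rstrip("<")   # '<' is the MRZ filler character
    fields["nationality"] = line[10:13].rstrip("<")
    fields["sex"] = line[20]             # 'M', 'F' or '<' (unspecified)
    return fields
```

ICAO's fictional "Utopia" specimen line, `L898902C36UTO7408122F1204159ZE184226B<<<<<10`, parses to document number L898902C3, nationality UTO, date of birth 740812, sex F, and expiry 120415. Changing any character of the document number makes its check digit fail.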


How Crystl Document Profiles Power the Whole Platform

Document Profiles are the connective tissue of Crystl's document intelligence platform — every extraction mode and every workflow step references them:

  • Extract / Batch / Auto-Split — select a profile by name, or use Auto-detect and Crystl matches the document to the closest profile using its description and field keywords.
  • Cases & Packages — link a specific document profile to each slot in a Package. When a client uploads to that slot, Crystl automatically extracts the data and surfaces the structured fields in the case detail view.
  • Exports — field names in your profile become column headers in Excel exports and section headings in Word and Markdown exports.
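Here is a small sketch of how field names become column headers, using Python's standard `csv` module. Crystl's own exports are Excel, Word, and Markdown. CSV stands in here only because it needs no third-party library, and `results_to_csv` is a hypothetical helper. Two behaviours are our own simplifications: List fields are summarised as a row count, and Group fields are kept as JSON text.

```python
from __future__ import annotations

import csv
import io
import json


def results_to_csv(field_names: list[str], results: list[dict]) -> str:
    """Write one row per document and one column per profile field, in profile order."""
    buf = io.StringIO()
    # Missing fields become empty cells (restval); unknown keys are ignored.
    writer = csv.DictWriter(buf, fieldnames=field_names, restval="", extrasaction="ignore")
    writer.writeheader()
    for result in results:
        row = {}
        for key, value in result.items():
            if isinstance(value, list):      # List field: summarise as a count
                row[key] = f"[{len(value)} rows]"
            elif isinstance(value, dict):    # Group field: keep sub-fields as JSON
                row[key] = json.dumps(value)
            else:
                row[key] = value
        writer.writerow(row)
    return buf.getvalue()
```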

Recommended setup: KYC & client onboarding

For teams running KYC or client onboarding workflows, a proven pattern is to create a Package with three slots: Passport (linked to the system Passport profile with MRZ parsing on), Proof of Address (linked to a custom Bank Statement or Utility Bill profile), and Company Registration (a custom profile with director names as a List field and registration number as a validated Text field). When the client uploads all three, Crystl extracts and structures the data automatically. A name discrepancy between the passport and the bank statement is then visible in the structured fields before anyone on your team has to read the original documents.
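Once both documents are extracted, the name check in that scenario is easy to approximate. The sketch below assumes you have the name strings from the passport and the bank statement. `normalise_name` and `names_match` are hypothetical helpers, not a Crystl feature. The comparison ignores case, accents, punctuation, and word order. It deliberately treats an initial ("T. Johnson") as a mismatch so that a person reviews it.

```python
from __future__ import annotations

import unicodedata


def normalise_name(name: str) -> tuple[str, ...]:
    """Case-fold, strip accents and punctuation, and sort the name's tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.casefold())
    return tuple(sorted(cleaned.split()))


def names_match(passport_name: str, statement_name: str) -> bool:
    """True when both names reduce to the same set of tokens."""
    return normalise_name(passport_name) == normalise_name(statement_name)
```

`names_match("JOHNSON, Thabo", "Thabo Johnson")` is true. `names_match("Thabo Johnson", "T. Johnson")` is false and should go to a reviewer.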

Build your first Crystl document profile from a real invoice or ID in under 5 minutes: go to Document Profiles → Create with AI, upload the document, review the detected fields, and save. From that point forward, every document of that type is processed automatically. Ready to go further? See how Auto-Split handles multi-document PDFs, or explore how Cases & Packages turns your profiles into a complete client onboarding workflow.

Ready to try it?

Start extracting data from documents today

Process your first 100 pages free. No credit card required.
