Parze Documentation Audit
The docs have the bones of a clean developer API — an OpenAPI spec, an llms.txt, separate Limits/Errors/Pricing pages, and explicit notes about legacy aliases — but several pages contradict each other or themselves in ways that will burn both humans and coding agents.
1. Suggest-Schema and Text-to-Schema contradict themselves on whether auth is required (critical)
Location: https://docs.parze.ai/api-reference/suggest-schema and https://docs.parze.ai/api-reference/text-to-schema
Problem: The body of both pages says "Authentication is optional for this endpoint." The Authentication page reinforces this: "The schema suggestion endpoints (/api/suggest-schema, /api/text-to-schema) accept anonymous requests." But the "Authorizations" block at the bottom of each reference page says:
Authorization string header required API key using the format Authorization: Bearer pk_live_...
So the prose says optional, the structured reference says required. The OpenAPI excerpt corroborates the optional behavior ("security": [ {}, { "bearerAuth": [] } ]), which means the rendered "required" badge is wrong.
Consequence: A developer (or a coding agent parsing the structured "header required" block, which is the machine-friendlier signal) concludes they must obtain an API key before they can call the schema-suggestion endpoints. That defeats the purpose of offering anonymous endpoints and reintroduces the onboarding friction Parze explicitly tried to remove.
The fix: Mark the Authorization header required: false on both endpoint pages so the rendered badge says "optional," matching the prose and the OpenAPI security array.
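The mismatch is mechanically checkable. A minimal Python sketch, assuming the operation object has been loaded from the published OpenAPI document as a plain dict. The function name is illustrative; the only rule relied on is the OpenAPI convention that an empty security requirement object ({}) in the security array makes authentication optional:

```python
def auth_is_optional(operation: dict, global_security=None) -> bool:
    """Return True when an OpenAPI operation can be called anonymously."""
    # An operation-level `security` array overrides the document-level default.
    security = operation.get("security", global_security)
    if security is None:
        return True  # no security declared anywhere
    # An empty array removes all requirements; an empty object {} inside the
    # array is the "anonymous is allowed" alternative.
    return len(security) == 0 or any(requirement == {} for requirement in security)


# The excerpt quoted in this finding.
suggest_schema_op = {"security": [{}, {"bearerAuth": []}]}
print(auth_is_optional(suggest_schema_op))  # True
```

A docs build could run a check like this per operation and fail when the rendered badge says "required" while the function returns True.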
2. The "File too large" error has the wrong cause description (critical)
Location: https://docs.parze.ai/api-reference/errors — "Common detail values" table
Problem: The detail string File too large is mapped to the cause Uploaded file has no content. The next row, Empty file not allowed, also says Uploaded file has no content. One of these is obviously a copy-paste error — File too large should describe an oversized upload (the 25 MB ceiling enforced elsewhere via HTTP 413), not an empty one.
Consequence: A developer (or an agent doing error-string-based remediation) sees "detail": "File too large", looks it up in the canonical error table, and concludes their file is empty. They re-upload the same file or chase a zero-byte bug instead of splitting/compressing the document. This is exactly the kind of contradiction agents fail silently on — they don't second-guess the table.
The fix: Replace the File too large row's cause with something like "Uploaded file exceeds the 25 MB per-file limit. Split or compress the document." Keep Empty file not allowed mapped to the empty-content cause.
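Copy-paste errors of this kind are easy to catch automatically. A small sketch, assuming the error table can be exported as (detail, cause) pairs; the helper name and the table representation are hypothetical, and the two rows are the ones quoted above:

```python
from collections import defaultdict


def duplicated_causes(rows):
    """Group detail strings by cause text and keep only causes used more than once."""
    by_cause = defaultdict(list)
    for detail, cause in rows:
        # Normalise whitespace and case so near-identical cells still collide.
        by_cause[cause.strip().lower()].append(detail)
    return {cause: details for cause, details in by_cause.items() if len(details) > 1}


rows = [
    ("File too large", "Uploaded file has no content"),
    ("Empty file not allowed", "Uploaded file has no content"),
]
for cause, details in duplicated_causes(rows).items():
    print(f"{cause!r} is shared by: {', '.join(details)}")
```

Not every shared cause is a bug, so a check like this is better used as a review prompt than as a hard failure.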
3. Marketing site advertises "CONFIDENCE 99.9%" while the API reference explicitly disclaims the confidence field (critical)
Location: https://parze.ai (hero "LIVE EXTRACTION PREVIEW" widget) vs. https://docs.parze.ai/api-reference/parse — confidence response field, with additional exposure on https://docs.parze.ai/quickstart
Problem: The homepage's product preview surfaces "CONFIDENCE 99.9%" as a quality signal next to extracted vendor/invoice data. The Parse reference says the opposite about the same field:
confidence — number — Overall confidence score. This is currently a placeholder value and should not be used as a calibrated quality score.
The disclaimer only lives on the Parse reference page. The Quickstart — the first thing most developers read — shows per-field extraction_metadata confidences of 0.95 and 0.88 next to a name and email with no caveat at all. So a developer reading top-to-bottom encounters the numbers as authoritative twice (marketing hero, Quickstart response) before they ever see the disclaimer.
Consequence: Developers evaluating Parze for production document workflows (finance, insurance, legal, healthcare — all named on the marketing page) will route automated approvals, exception queues, or human-in-the-loop thresholds off confidence and the per-field metadata values. The docs then tell them, in a tucked-away response-field note on the Parse page only, that the number is decorative. Either the homepage and Quickstart are misleading, or the Parse reference is underselling a real signal — and a buyer can't tell which.
The fix: Pick a position. If confidence is a placeholder, remove it from the homepage extraction preview (or label it as illustrative) and add the same disclaimer to every page that shows a confidence number — Quickstart response, Extract reference, anywhere extraction_metadata.*.confidence is rendered. If it is calibrated, document the calibration method and remove the disclaimer.
4. The Authentication page's error table is incomplete relative to the canonical Errors page (significant)
Location: https://docs.parze.ai/authentication — "Error Responses" table
Problem: The Authentication page lists four status codes: 401, 429, 400, 500. The dedicated Errors page lists six: 400, 401, 402, 413, 429, 500. The Auth page is missing 402 (Insufficient credits) and 413 (Payload Too Large) — both of which are real, distinct failure modes a developer will hit early (free tier is only 100 credits, file cap is 25 MB).
Consequence: A developer who reads Authentication first (the natural starting point after Quickstart) builds a 401/429/400/500 retry harness that breaks the first time a user uploads a 30 MB PDF (413) or runs out of free credits (402, with its distinct "Insufficient credits." body). Agents synthesizing retry logic from this page will produce the same gap.
The fix: Either add 402 and 413 rows to the Authentication page table, or replace the table with a one-line link to the canonical /api-reference/errors page so there's one source of truth.
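If the table stays on the Authentication page, a drift check keeps it honest. A minimal sketch, with the two code sets copied from the pages as described above; the function name is illustrative:

```python
# Status codes listed on each page, as quoted in this finding.
AUTH_PAGE_CODES = {401, 429, 400, 500}
ERRORS_PAGE_CODES = {400, 401, 402, 413, 429, 500}


def missing_codes(page_codes, canonical_codes):
    """Return the canonical status codes a secondary page fails to mention."""
    return sorted(canonical_codes - page_codes)


print(missing_codes(AUTH_PAGE_CODES, ERRORS_PAGE_CODES))  # [402, 413]
```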
5. The Validate request envelope is essentially undocumented (significant)
Location: https://docs.parze.ai/api-reference/validate vs. https://docs.parze.ai/api-reference/errors
Problem: The Validate reference shows the post-validation payload like this:
{
  "extracted_data": {
    "vendor_id": "VND-001",
    "amount": 1200.5
  }
}
with no surrounding key. The Errors page, however, references the path as validation_rules.extracted_data ("validation_rules.extracted_data (object) is required for post-validation"). So the actual request shape is presumably { "validation_rules": { "extracted_data": {...} }, ... }, not the bare { "extracted_data": {...} } shown in the reference.
Worse, the Validate page never documents the top-level validation_type field at all, even though the Errors page enforces validation_type must be 'pre' or 'post' and the response example on the same Validate page returns "validation_type": "pre". The field appears in responses and errors but is undefined as a request parameter.
Consequence: A developer copy-pastes the Validate example, calls the endpoint, and gets back either validation_type must be 'pre' or 'post' or validation_rules.extracted_data (object) is required for post-validation — for a payload that, per the example, looks correct. The example is unrunnable as written, and the only way to discover the right envelope is to read the Errors page and reverse-engineer it.
The fix: Show the full request envelope on the Validate page: validation_type (with enum values pre and post), the validation_rules wrapper, and extraction_schema where applicable. Provide a copy-pasteable end-to-end multipart or JSON example for both pre- and post-validation.
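For illustration only, the envelope implied by the two error strings can be sketched as below. This shape is inferred from the Errors page, not confirmed by the Validate reference, and the real endpoint may require further fields (extraction_schema, for instance); the builder function is hypothetical:

```python
import json


def build_post_validation_request(extracted_data: dict) -> dict:
    """Assemble the post-validation body implied by the Errors page wording."""
    if not isinstance(extracted_data, dict):
        raise TypeError("extracted_data must be an object")
    return {
        # The Errors page says validation_type must be 'pre' or 'post'.
        "validation_type": "post",
        # The Errors page refers to the path validation_rules.extracted_data,
        # so the wrapper key is an inference, not a documented fact.
        "validation_rules": {"extracted_data": extracted_data},
    }


payload = build_post_validation_request({"vendor_id": "VND-001", "amount": 1200.5})
print(json.dumps(payload, indent=2))
```

If the real envelope differs, that difference is exactly what the Validate page needs to state.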
6. Quickstart Option B leaves the parse step elided (significant)
Location: https://docs.parze.ai/quickstart — "Option B: Parse + Extract" / "Step 1: Parse the document"
Problem: The two-step flow Parze actively recommends ("recommended to avoid double billing") shows the Step 2 extract command in full but Step 1 collapses to .... The only complete worked example in the Quickstart is the one-step flow — which the same page warns will double-bill you. Compounding this: text-based Extract requires a job_id from Parse, so a developer without a worked Step 1 example can't even fabricate a synthetic request to test Step 2.
Consequence: The cheaper, recommended path is the one without a complete copy-paste example. New users follow the example that is fully written out (Option A), pay 2× credits, and only discover the better path after reading Limits or Pricing. Coding agents that grep for the first complete cURL block will land on the more expensive call.
The fix: Include a full cURL snippet for POST /api/parse showing the file=@… upload and the resulting job_id, then thread that job_id literally into the Step 2 snippet so the two commands are runnable back-to-back.
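The same threading can be shown in code. The sketch below is an assumption-heavy illustration, not the documented API: the /api/extract path, the request field names, and the response keys are guesses based on names that appear elsewhere in the docs (job_id, text, extraction_schema), and the transport is a stub so the example runs without network access:

```python
def parse_then_extract(post, file_path, schema):
    """Run the two-step flow, passing the Parse job_id into Extract."""
    # Step 1: parse the document once and keep the job_id it returns.
    parsed = post("/api/parse", files={"file": file_path})
    job_id = parsed["job_id"]
    # Step 2: extract from the already-parsed text, reusing the same job_id.
    body = {"text": parsed["text"], "extraction_schema": schema, "job_id": job_id}
    return post("/api/extract", json=body)  # hypothetical path and body shape


def fake_post(path, **kwargs):
    """Stand-in transport that returns canned responses for the sketch."""
    if path == "/api/parse":
        return {"job_id": "job_123", "text": "placeholder document text"}
    return {"job_id": kwargs["json"]["job_id"], "received_text": kwargs["json"]["text"]}


result = parse_then_extract(fake_post, "./document.pdf", {"name": "string"})
print(result["job_id"])  # job_123
```

Swapping fake_post for a real HTTP client is where the documented request shapes would be needed, which is the gap this finding describes.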
7. extraction_mode enum exposes the deprecated llm_only value as a first-class option (significant)
Location: https://docs.parze.ai/api-reference/parse (and the same shape on https://docs.parze.ai/api-reference/extract)
Problem: The prose for extraction_mode says the modes are "ai_only, auto, ocr_only, or identity_doc" and adds "llm_only is still accepted as a legacy alias." But the rendered "Available options" enum below lists:
Available options: ai_only, auto, ocr_only, identity_doc, llm_only
llm_only is presented alongside the current modes with no deprecated flag and no visual difference. Contrast this with how the schema alias is handled in the OpenAPI spec, where it is correctly marked "deprecated": true.
Consequence: Agents and IDE auto-complete tools that index the enum will happily emit extraction_mode: "llm_only" in newly written code, and developers reading the enum without the surrounding prose will treat it as a current option. Parze has explicitly versioned the SDKs around the move to ai_only ("ai_only support requires parze version 0.2.6 or newer"), but the API reference still surfaces llm_only as a peer of the current modes.
The fix: Either mark llm_only deprecated in the rendered options list (and in the OpenAPI schema for the field) or remove it from the enum entirely and keep the legacy-alias acceptance as a runtime-only behavior documented in prose.
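Until the reference is fixed, callers can guard against the alias themselves. A minimal client-side sketch, assuming the four current modes from the prose and the llm_only to ai_only mapping from the migration notes; the function is illustrative and is not part of the Parze SDK:

```python
import warnings

CURRENT_MODES = {"ai_only", "auto", "ocr_only", "identity_doc"}
LEGACY_ALIASES = {"llm_only": "ai_only"}


def normalize_extraction_mode(mode: str) -> str:
    """Map a legacy alias to its current name, warning the caller when it does."""
    if mode in LEGACY_ALIASES:
        replacement = LEGACY_ALIASES[mode]
        warnings.warn(
            f"extraction_mode={mode!r} is a legacy alias; use {replacement!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        return replacement
    if mode not in CURRENT_MODES:
        raise ValueError(f"unknown extraction_mode: {mode!r}")
    return mode


if __name__ == "__main__":
    print(normalize_extraction_mode("auto"))
```

Calling normalize_extraction_mode("llm_only") returns "ai_only" and emits a DeprecationWarning, which is the behavior the rendered enum should be signalling on its own.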
8. preserve_tables is used in the Quickstart but not defined on the Parse or Extract reference (minor)
Location: https://docs.parze.ai/quickstart (Option A cURL)
Problem: The Quickstart shows -F "preserve_tables=true" as part of the canonical first-call example, but the Parse and Extract reference pages — which list extraction_mode, visual_labels, etc. — have no entry for preserve_tables.
Consequence: Developers learn the parameter exists from the Quickstart but can't discover its accepted values, default, or interaction with extraction_mode. Agents auto-completing requests won't suggest it because it isn't in the reference.
The fix: Add preserve_tables to the Parse and/or Extract parameter list with type, default, and what happens when it's true vs false. If it's deprecated or internal, remove it from the Quickstart.
9. output_format is referenced in the Parse response but never defined as a request parameter (minor)
Location: https://docs.parze.ai/api-reference/parse
Problem: The Parse response documents blocks as "Structured blocks (present when output_format=structured)" — but output_format itself does not appear in the request parameter list on the same page, and there is no enumeration of its accepted values. The reader can infer one value (structured) from the response note, but can't discover the default, the other values, or how to request it.
Consequence: Same shape as the preserve_tables gap: a parameter is referenced as if canonical but isn't defined where developers and agents look for it. An agent generating a request will either omit output_format and silently get the default, or guess at values like markdown/json that may or may not exist.
The fix: Add output_format to the Parse request parameter list with its full enum, default, and the relationship to the text vs blocks response fields.
10. JavaScript and Python SDK error-handling examples use an undefined schema variable (minor)
Location: https://docs.parze.ai/sdks/javascript and https://docs.parze.ai/sdks/python — "Error Handling"
Problem: The JS example reads:
const result = await client.parse('./document.pdf');
const extraction = await client.extract(result.text, schema, result.job_id);
The identifier schema is never declared. The Python example has the same shape (client.extract(result["text"], schema, result["job_id"])).
Consequence: Copy-pasting either snippet raises a ReferenceError (JavaScript) or a NameError (Python). This is exactly the kind of completeness gap agents trip on: they extract the block, drop it into a file, and the runtime fails on an undefined symbol.
The fix: Either define const schema = { … } / schema = { … } above the try block (with a realistic field or two, matching the Quickstart) or annotate the placeholder so it's obviously substitution-required.
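Gaps like this can be caught before publication by linting the snippets. A rough sketch for the Python side, using the standard ast module to list names that are read but never bound. It ignores scoping and ordering, so treat it as a heuristic; the snippet under test is a reconstruction of the shape described above, with a stand-in client assignment added so that only schema is reported:

```python
import ast
import builtins


def unbound_names(source: str) -> set:
    """Return names that a snippet reads but never assigns, imports, or defines."""
    bound = set(dir(builtins))
    loaded = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            # Load context means the name is read; Store/Del means it is bound.
            (loaded if isinstance(node.ctx, ast.Load) else bound).add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.ExceptHandler) and node.name:
            bound.add(node.name)
    return loaded - bound


SNIPPET = '''
client = None  # stand-in; the published snippet constructs an SDK client
try:
    result = client.parse("./document.pdf")
    extraction = client.extract(result["text"], schema, result["job_id"])
except Exception as error:
    print(error)
'''
print(unbound_names(SNIPPET))  # {'schema'}
```

The snippet is only parsed, never executed, so the check works without the SDK installed.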
What they do well
- llms.txt is real and linked from the docs landing page, with explicit instructions ("Use this file to discover all available pages") — that's better than most APIs at this stage.
- A published OpenAPI 3.1 spec at /openapi.json with the deprecated schema alias correctly flagged, a pattern the rendered reference pages should adopt for llm_only.
- Legacy/migration notes are surfaced inline (llm_only → ai_only, schema → extraction_schema, and the SDK version where each shift happened) instead of being buried in a changelog.
Top 3 recommendations
- Resolve the auth-required vs auth-optional contradiction on /api/suggest-schema and /api/text-to-schema; the structured "required" badge is the version agents will believe.
- Either remove confidence from the marketing hero and the Quickstart response, or remove the "placeholder, not calibrated" disclaimer from the Parse reference. The two cannot coexist for a product sold into finance and insurance.
- Make every multi-step example runnable end-to-end without filling in undefined names or guessing envelopes: flesh out Quickstart Step 1, document the full Validate envelope (validation_type + validation_rules wrapper), and define schema, preserve_tables, and output_format where they're first used.