Extend Documentation Audit
Extend's docs are unusually agent-first — .md mirrors, per-section llms.txt, an MCP server, a hand-authored agents.md — and the reference material is broad. But the agent-facing files contradict each other and the OpenAPI spec on the exact details an automated client copies verbatim (ID prefixes, SDK versions, region codes, helper availability), so the more an AI agent trusts these docs, the more likely it is to ship a broken call.
1. agents.md documents the wrong splitter ID prefix (sp_) vs the canonical spl_ (critical)
Location: https://docs.extend.ai/agents.md vs https://docs.extend.ai/api-reference/endpoints/split/get-splitter.md (OpenAPI) and https://docs.extend.ai/cli.md
Problem: agents.md lists the new ID prefixes as "splitters use sp_." But the OpenAPI 3.1 spec for get-splitter gives the canonical example "spl_Xj8mK2pL9nR4vT7qY5wZ" (and versions as splv_), and the CLI uses extend split combined.pdf --using spl_abc. The prefix is spl_, not sp_.
Consequence: agents.md is explicitly the file Extend tells you to save as AGENTS.md/CLAUDE.md. An agent that learns sp_ from it will construct or validate splitter IDs against the wrong prefix, producing invalid IDs or failed lookups with no obvious cause.
The fix: Change sp_ to spl_ in agents.md, and add a quick automated check that the prefix table in agents.md is generated from (or diffed against) the OpenAPI spec so it can't drift again.
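A drift check along these lines could be run in CI. This is a minimal sketch, not Extend's tooling: the two dictionaries are hypothetical stand-ins for values parsed out of agents.md and the OpenAPI examples, and the splv_ example ID is invented for illustration (only the spl_ example and the sp_/spl_/splv_ prefixes come from the findings above).

```python
import re

# Hypothetical inputs. In practice these would be parsed from agents.md and
# from the example values in the OpenAPI spec.
DOCUMENTED_PREFIXES = {"splitter": "sp_", "splitter_version": "splv_"}
SPEC_EXAMPLE_IDS = {
    "splitter": "spl_Xj8mK2pL9nR4vT7qY5wZ",   # example quoted in the audit
    "splitter_version": "splv_exampleOnly",   # invented ID, illustration only
}


def prefix_of(example_id: str) -> str:
    """Return the leading lowercase prefix, up to and including the underscore."""
    match = re.match(r"[a-z]+_", example_id)
    if match is None:
        raise ValueError(f"no prefix found in {example_id!r}")
    return match.group(0)


def find_prefix_drift(documented: dict, spec_examples: dict) -> dict:
    """Map each resource to (documented, spec) wherever the two disagree."""
    drift = {}
    for resource, example_id in spec_examples.items():
        spec_prefix = prefix_of(example_id)
        if documented.get(resource) != spec_prefix:
            drift[resource] = (documented.get(resource), spec_prefix)
    return drift
```

With the inputs above, find_prefix_drift(DOCUMENTED_PREFIXES, SPEC_EXAMPLE_IDS) reports only the splitter entry, pairing the documented sp_ with the spec's spl_.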
2. changelog.md returns no entries, breaking the docs' own "append .md" contract on a page agents poll (critical)
Location: https://docs.extend.ai/changelog.md vs https://docs.extend.ai/changelog
Problem: The agent instructions promise: "Append .md to any docs URL for clean Markdown of that page." For the changelog this fails — changelog.md returns only the title and the one-line tagline ("Stay up to date on what's shipping in the Extend platform.") with zero entries, while the rendered HTML at /changelog is full of dated entries (e.g. "June 10, 2026 — Processor cost preview", "June 9, 2026 — Cancel in-flight parse runs", "May 26, 2026 — OCR word confidence").
Consequence: The changelog is exactly what an AI client fetches to learn "what changed recently." An agent obeying the documented .md contract sees an empty changelog and concludes nothing has shipped, missing real, potentially breaking updates such as run cancellation and new validation rules. This is a silent failure, the worst kind for an automated consumer.
The fix: Make changelog.md render the actual entries (the .md page generation is evidently skipping the changelog's dynamic content). If certain pages can't be .md-rendered, document the exception explicitly rather than serving an empty stub.
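One way to catch this class of regression is to compare the number of dated entries in the rendered page against the .md mirror. The sketch below is an assumption-laden illustration: it works on text that has already been fetched, and it recognizes only the "Month D, YYYY" date style quoted above.

```python
import re

# Matches dates such as "June 10, 2026", the style quoted from /changelog.
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December) \d{1,2}, \d{4}\b"
)


def count_dated_entries(text: str) -> int:
    """Count date strings in the given page text."""
    return len(DATE_PATTERN.findall(text))


def mirror_is_stub(markdown_text: str, html_text: str) -> bool:
    """True when the rendered page has dated entries but the mirror has none."""
    return (
        count_dated_entries(html_text) > 0
        and count_dated_entries(markdown_text) == 0
    )
```

Feeding it the two bodies of /changelog and /changelog.md would flag the situation described here, where the mirror carries only a title and tagline.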
3. The detailed agent index claims the Go SDK has polling and webhook helpers; two other pages say it has neither (critical)
Location: https://docs.extend.ai/llms-full.txt (the detailed index served at that URL) vs https://docs.extend.ai/agents.md, https://docs.extend.ai/general/async-processing.md
Problem: The detailed agent index (served when you request llms-full.txt) advertises, in its SDKs entry: "Official Python, TypeScript, Java, and Go SDKs with polling and webhook helpers." But agents.md states: "The Python, TypeScript, and Java SDKs include polling helpers... The Go SDK ships neither -- call Create and poll the run yourself (or use webhooks), and verify webhook signatures manually (HMAC-SHA256)." And async-processing.md confirms: "The Go SDK does not include a polling helper." (Note: sdks.md itself only says SDKs exist "in Python, TypeScript, Java, and Go to help you integrate faster" — it makes no Go-helper claim, so the bad claim originates in the index, not the SDK page.)
Consequence: A Go developer (or an agent generating Go code) who believes the index will look for CreateAndPoll/webhook-verification helpers that don't exist, then waste time before discovering they must hand-roll polling and HMAC-SHA256 verification. The capability claim is wrong in the very file that exists to brief automated clients.
The fix: Correct the SDKs entry in the agent index to scope the helper claim to Python/TypeScript/Java, and add a Go-specific note pointing to the manual polling + signature-verification pattern already documented in agents.md.
4. The root llms.txt tells bots to fetch a file the detailed index says does not exist (significant)
Location: https://docs.extend.ai/llms.txt and https://docs.extend.ai/llms-full.txt
Problem: The short root llms.txt instructs agents: "For full documentation content in one file, see https://docs.extend.ai/llms-full.txt." But requesting llms-full.txt redirects to /llms.txt and serves a different, detailed index that states the opposite: "There is no combined llms-full.txt; use this index plus per-page .md instead." Two files at the documented entry points give contradictory instructions about whether a combined file exists.
Consequence: An agent following the root llms.txt will request llms-full.txt, silently land on a redirect to a different document than the one it asked for, and either waste a fetch or index the wrong file. The agent still reaches a valid index and won't ship a broken API call from this alone — but the single most important contract for "teach an AI agent how to consume these docs" is internally inconsistent.
The fix: Pick one contract. Either publish a real llms-full.txt and keep the root pointer, or remove the llms-full.txt line from the root llms.txt and serve the detailed index there directly. Don't redirect a named file to a document that denies its own existence.
5. Java SDK version and Maven artifact are inconsistent across at least three pages (significant)
Location: https://docs.extend.ai/api-quickstart.md, https://docs.extend.ai/sdks.md, https://docs.extend.ai/api-reference/migrations/2026-02-09/overview.md
Problem: The Java install instructions disagree everywhere: api-quickstart.md pins version 1.12.0; sdks.md shows no version in the Gradle snippet and 0.0.1-beta in the Maven snippet; the migration overview references Maven artifact extend-java with version LATEST, while other pages use extend-java-sdk. So the same SDK appears as 1.12.0, 0.0.1-beta, unversioned, and LATEST, under two different artifact IDs.
Consequence: A developer can't tell which version is real. Copying 0.0.1-beta pulls a pre-release; copying extend-java/LATEST may resolve to a nonexistent or wrong artifact; the 1.12.0/0.0.1-beta gap (1.x vs 0.0.1) suggests at least one page is badly stale. Build failures or silent version drift result.
The fix: Standardize on one artifact ID (extend-java-sdk) and one current version string across every Java snippet, and avoid LATEST in published examples. Generate the version from a single source so quickstart, SDK page, and migration guide stay in lockstep.
6. The same SDK client classes and import paths change from page to page (significant)
Location: https://docs.extend.ai/api-quickstart.md, https://docs.extend.ai/api-reference/authentication.md, https://docs.extend.ai/api-reference/error-handling.md, https://docs.extend.ai/parsing/error-handling.md, https://docs.extend.ai/api-reference/migrations/2026-02-09/overview.md
Problem: For Java, api-quickstart.md uses ExtendClientWrapper (from ai.extend.wrapper), authentication.md uses ai.extend.ExtendClient, error-handling.md uses ExtractRequestExtractor / ExtractRequestFile.of(...) / ExtendClientApiException, and the migration overview uses ExtractorInput / FileInput builders: different class names for the same operations. For Python, api-reference/error-handling.md imports ApiError from extend_ai.core, while parsing/error-handling.md imports it from extend_ai.core.api_error. Unless extend_ai.core re-exports the class, only one of these paths can be correct, and the docs never say which is canonical.
Consequence: Copy-paste fails. A Python developer who copies the wrong ApiError import gets an ImportError; a Java developer can't reconcile ExtendClientWrapper vs ExtendClient vs ExtractorInput-style builders and won't know which is current. Agents, which can't apply judgment across pages, will emit non-compiling code.
The fix: Pin canonical class names and import paths per language and use them in every snippet (these should come from generated SDK output). At minimum, fix the ApiError import so both error-handling pages agree, and reconcile the Java client class naming.
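Which import path actually works can be settled empirically against the installed package. The helper below is a generic sketch using only the standard library; locate_symbol is a hypothetical name, and nothing here assumes how the extend_ai package is laid out.

```python
import importlib


def locate_symbol(name: str, module_paths: list) -> list:
    """Return the module paths, in order, from which `name` can be imported."""
    found = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            continue  # module path does not exist in this environment
        if hasattr(module, name):
            found.append(path)
    return found
```

With the SDK installed, locate_symbol("ApiError", ["extend_ai.core", "extend_ai.core.api_error"]) would show whether one or both documented paths resolve. If both do, the remaining work is picking one as canonical for every snippet.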
7. Three pages give three different counts of supported API versions (significant)
Location: root https://docs.extend.ai/llms.txt, https://docs.extend.ai/agents.md, https://docs.extend.ai/api-reference/api-versioning.md
Problem: The root llms.txt lists three versions (2026-02-09, 2025-04-21, 2024-12-23). agents.md lists four (adds 2024-07-30). api-versioning.md lists six (2026-02-09, 2025-04-21, 2024-12-23, 2024-11-14, 2024-07-30, 2024-02-01).
Consequence: A developer on an older key (e.g. one created before April 21, 2025, which api-versioning.md says defaults to legacy 2024-12-23) may be pinned to a version the root index doesn't even acknowledge. Anyone trying to enumerate valid x-extend-api-version values from llms.txt or agents.md will get an incomplete set and can't reliably plan a migration.
The fix: Treat api-versioning.md's changelog table as the single source of truth and generate the version lists in llms.txt and agents.md from it, so all three always show the same set.
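The gap between the three pages can be made explicit with a set difference. The lists below are copied from the finding above; the function name is illustrative only.

```python
# Version lists as reported in the finding above.
CANONICAL = [
    "2026-02-09", "2025-04-21", "2024-12-23",
    "2024-11-14", "2024-07-30", "2024-02-01",
]  # api-versioning.md
LLMS_TXT = ["2026-02-09", "2025-04-21", "2024-12-23"]
AGENTS_MD = ["2026-02-09", "2025-04-21", "2024-12-23", "2024-07-30"]


def missing_versions(canonical: list, listed: list) -> list:
    """Versions in the canonical list that a page omits, newest first."""
    # ISO dates sort chronologically when compared as strings.
    return sorted(set(canonical) - set(listed), reverse=True)
```

Here missing_versions(CANONICAL, LLMS_TXT) yields 2024-11-14, 2024-07-30, and 2024-02-01, while agents.md is missing 2024-11-14 and 2024-02-01. A generated list would make both results empty.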
8. Region/deployment identifiers are inconsistent across four pages, including a .app vs .ai host split (significant)
Location: https://docs.extend.ai/api-reference/deployments.md, https://docs.extend.ai/agents.md, https://docs.extend.ai/cli.md, compliance.md (per the deployments evidence note), and the OpenAPI servers block
Problem: The same three deployments are named four different ways. deployments.md labels them "Production (Default)", "US2", "EU1"; agents.md labels the default "US1 (default)" alongside "US2"/"EU1"; the CLI uses lowercase codes us | us2 | eu (note eu, not eu1); and compliance.md reportedly uses us1/us2/eu1. On top of the labeling sprawl, the host TLDs differ — US2 lives on api.us2.extend.app while Production and EU1 are on .ai (api.extend.ai, api.eu1.extend.ai). The .app/.ai split is confirmed in the OpenAPI servers block, so it's real, not a typo.
Consequence: A developer configuring the CLI for EU sets EXTEND_REGION=eu, but reading deployments.md/agents.md/compliance.md would reasonably guess eu1 or EU1 — a value the CLI doesn't list. The default region is alternately "Production" and "US1" depending on the page. And anyone hardcoding the US2 base URL from muscle memory (.ai) will hit the wrong domain because US2 is uniquely on .app. All are easy-to-miss, hard-to-debug misconfigurations.
The fix: Document one canonical mapping table (region label → CLI code → base URL) and reuse it everywhere — including settling whether the default is "Production" or "US1." Call out explicitly that US2 uses the .app TLD, since it's the lone exception.
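The canonical table could be as small as the sketch below. The hosts and CLI codes are the ones quoted above. The https:// scheme, the choice of "US1" as the default's label, the assumption that CLI code us maps to the default deployment, and the alias table are all illustrative assumptions that Extend would need to confirm.

```python
from typing import NamedTuple


class Deployment(NamedTuple):
    label: str
    cli_code: str
    base_url: str


# Hosts and CLI codes as quoted in the audit; labels and scheme are assumed.
DEPLOYMENTS = [
    Deployment("US1", "us", "https://api.extend.ai"),
    Deployment("US2", "us2", "https://api.us2.extend.app"),  # lone .app host
    Deployment("EU1", "eu", "https://api.eu1.extend.ai"),
]

# Hypothetical aliases covering the other spellings seen across the docs.
ALIASES = {"production": "us", "us1": "us", "eu1": "eu"}


def resolve(name: str) -> Deployment:
    """Look up a deployment by any of its documented spellings."""
    key = name.strip().lower()
    key = ALIASES.get(key, key)
    for deployment in DEPLOYMENTS:
        if deployment.cli_code == key:
            return deployment
    raise ValueError(f"unknown deployment: {name!r}")
```

A lookup such as resolve("EU1") then returns the entry whose CLI code is eu, which is exactly the translation a reader currently has to guess.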
9. The rate-limits page is too vague to implement against and omits any 429 handling (significant)
Location: https://docs.extend.ai/general/rate-limits.md
Problem: The page titled "Rate Limits" (a) frames limits per plan tier (PAYG/Scale/Enterprise), (b) lists non-numeric limits — Scale is "25+" QPS / "120+" runs, Enterprise "75+" / "300+", and (c) contains no Retry-After header documentation, no 429 response example, and no retry/backoff code anywhere on the page.
Consequence: A developer cannot build correct client-side throttling. "25+" gives no value to rate-limit against, and with no documented Retry-After or 429 body, there's no way to implement compliant backoff. Production clients will either over-throttle or get rejected unpredictably.
The fix: Provide concrete numbers (or a documented way to read your current limit), show a real 429 response including any Retry-After header, and add a backoff/retry snippet in each SDK language.
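Until the page documents real behavior, a client can only fall back on generic HTTP conventions. The following sketch honors a numeric Retry-After header if one is present and otherwise uses capped exponential backoff with jitter. Whether Extend sends Retry-After at all is not documented, so that part is an assumption, and send is a hypothetical callable standing in for whatever HTTP client is in use.

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=0.5, cap=30.0):
    """Seconds to wait before the next try; `attempt` is 0-based.

    Prefers a numeric Retry-After value when one is given. Otherwise uses
    capped exponential backoff with full jitter.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # e.g. an HTTP-date value; fall through to backoff
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status_code, headers, body),
    where headers is a dict.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    return status, body
```

The sleep parameter exists so the loop can be tested without real waiting. In production the default time.sleep applies.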
10. The credits example invokes a "race pipeline strategy" surcharge that's defined nowhere (significant)
Location: https://docs.extend.ai/general/how-credits-work.md
Problem: The worked example computes parsing cost as "10 * 2 * 2 for race + 5 * 1 for text correction + 3 * 1 for table correction" — applying a 2× "race pipeline strategy" multiplier. But the surcharge tables on the same page only define Priority Parsing (2×); "race" appears nowhere else, with no definition of what it is, when it applies, or how to enable/avoid it.
Consequence: Credit cost is unpredictable. A developer estimating spend can't reproduce the example's math or know whether their own parse runs will silently incur the 2× "race" surcharge, since the term is never explained. Cost-modeling and budgeting break.
The fix: Add a "race pipeline strategy" row to the surcharge table with a definition and trigger conditions, or correct the example to use a multiplier that's actually documented.
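To see how much the undefined multiplier matters, the quoted arithmetic can be reproduced directly. Reading each term as quantity × credits per unit × surcharge multiplier is an assumption, since the page does not label the factors.

```python
# Terms from the quoted example: (label, quantity, credits_per_unit, multiplier).
# The meaning assigned to each factor is an assumption.
LINE_ITEMS = [
    ("parsing with race", 10, 2, 2),
    ("text correction", 5, 1, 1),
    ("table correction", 3, 1, 1),
]


def total_credits(items, ignore_multipliers=False):
    """Sum quantity * unit credits * multiplier across all line items."""
    total = 0
    for _label, quantity, unit_credits, multiplier in items:
        factor = 1 if ignore_multipliers else multiplier
        total += quantity * unit_credits * factor
    return total
```

The example totals 48 credits as written and 28 if the "race" multiplier does not apply, so the unexplained term accounts for 20 of the 48 credits.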
11. logprobsConfidence is deprecated but still shown in examples and offered as a routing field that resolves to null (significant)
Location: https://docs.extend.ai/extraction/confidence-scores.md
Problem: The page states logprobsConfidence is being phased out: "The extraction_light processor has never returned it, and extraction_performance version 4.6.0 and later return null." Yet examples on the same page still display logprobsConfidence values, and the workflow-routing section still offers {{extractionStepName.output.metadata.field_name.logprobsConfidence}} as a field accessor — which would now evaluate to null on current models.
Consequence: A developer who builds a conditional/routing rule on logprobsConfidence (as the page's own examples encourage) will get null on extraction_performance >= 4.6.0 and on all extraction_light runs, so the rule silently mis-routes documents. The deprecation warning and the recommended usage on the same page point in opposite directions.
The fix: Replace logprobsConfidence in the live examples and routing accessors with ocrConfidence / Review Agent scoring, and clearly mark any remaining logprobsConfidence references as legacy-only with the version it goes null at.
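A routing rule that survives the phase-out has to treat a missing score as its own case rather than comparing against null. The sketch below uses the ocrConfidence field named above. The 0-to-1 scale, the 0.9 threshold, and the flat metadata shape are hypothetical.

```python
def needs_review(field_metadata: dict, threshold: float = 0.9) -> bool:
    """Route a field to human review when its score is missing or too low.

    Reads ocrConfidence only; logprobsConfidence is ignored because it is
    null on current models. The threshold and 0-1 scale are assumptions.
    """
    score = field_metadata.get("ocrConfidence")
    if score is None:
        return True  # no usable score: fail safe toward review
    return score < threshold
```

The design choice is to fail toward review: a field with no usable score is escalated instead of silently passing a comparison against null.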
12. The dashboard URL differs across pages (app.extend.ai vs dashboard.extend.ai) (significant)
Location: https://docs.extend.ai/agents.md vs https://docs.extend.ai/api-quickstart.md
Problem: agents.md says "Get your API key from the Extend dashboard (https://app.extend.ai) under Developer Settings," while api-quickstart.md and other pages send users to dashboard.extend.ai/developers. Two different hostnames for the place you obtain credentials.
Consequence: The very first onboarding step — find your API key — sends developers to two different domains. If only one is canonical, the other is a dead or misleading link at the most friction-sensitive moment; an agent has no way to know which to surface.
The fix: Use one canonical dashboard hostname everywhere (and redirect the other if both must resolve). Standardize the deep link to the Developer Settings / Developers page.
13. The OpenAPI spec defines a 402 response that the error-handling docs never mention (significant)
Location: https://docs.extend.ai/api-reference/endpoints/split/get-splitter.md (OpenAPI) vs https://docs.extend.ai/api-reference/error-handling.md and https://docs.extend.ai/parsing/error-handling.md
Problem: The OpenAPI spec lists a 402 response for the endpoint, but the error-handling pages document no 402 Payment Required case — they cover the custom parsing error codes (e.g. FILE_SIZE_TOO_LARGE, OCR_ERROR) and the general error structure, with no entry explaining when a 402 occurs or how to handle it.
Consequence: A client that hits a 402 (e.g. credits exhausted / billing issue) has no documented meaning or recovery path, so error-handling code won't account for it. This is the kind of status that maps directly to "your processing just stopped in production," and it's undocumented.
The fix: Add 402 to the HTTP status section of the error-handling docs with its trigger (e.g. insufficient credits / billing) and recommended handling, so it matches the OpenAPI contract.
14. Migration-guide links use three different path forms across pages (minor)
Location: https://docs.extend.ai/agents.md, https://docs.extend.ai/api-reference/api-versioning.md, https://docs.extend.ai/api-reference/migrations/2026-02-09/overview.md
Problem: The migration guide is linked three different ways. agents.md links it as /api-reference/migrations/2026-02-09/overview (no leading version prefix). api-versioning.md links the 2026 guide as /2026-02-09/api-reference/migrations/2026-02-09/overview (with a leading /2026-02-09/ prefix) and the 2025 guide under an entirely different shape, /developers/migrating-to-2025-04-21-api-version. The overview.md page itself is reachable without any version prefix.
Consequence: An agent building a link graph or a developer following cross-references can't tell which migration-link form is canonical, and the /developers/... form looks like a different page entirely. If any of these redirect chains is ever removed, the inconsistent links break unpredictably.
The fix: Pick one canonical migration-link form (with or without the version prefix) and use it in every page, and reconcile the /developers/migrating-to-2025-04-21-api-version form into the same scheme.
15. The password-protected files page is undiscoverable from the index and linked under two different paths (minor)
Location: https://docs.extend.ai/parsing/error-handling.md, https://docs.extend.ai/general/supported-file-types.md
Problem: parsing/error-handling.md links the feature as /password-protected-files, while supported-file-types.md links it as /product/password-protected-files. Both currently resolve (via redirect), but the page is absent from the llms.txt index, so the canonical path is ambiguous and the page can't be discovered from the documented index.
Consequence: An agent building a link graph from llms.txt never finds the password page, and the two human-facing link forms make it unclear which path is canonical — fragile if one redirect is ever removed. This matters because PASSWORD_PROTECTED_FILE is a non-retryable error whose only fix is reading this page.
The fix: Pick one canonical path, use it in both links, and add the page to the llms.txt/section index so it's discoverable.
16. The page-range end parameter silently defaults to 750, surfaced only in the OpenAPI spec (minor)
Location: https://docs.extend.ai/api-reference/endpoints/split/get-splitter.md (OpenAPI PageRangesItems)
Problem: In the OpenAPI spec, PageRangesItems.end carries default: 750. A caller who specifies a page range with only a start will silently get an end of 750. This default appears only in the machine-readable spec; it is not surfaced in the prose parsing/splitting docs provided.
Consequence: A developer who sets a start and omits end, expecting "to the end of the document," may instead get a range capped or extended to page 750 with no warning — over- or under-processing documents and skewing both results and credit cost, with nothing in the prose docs to explain the behavior.
The fix: Document the 750-page default for end in the parsing/splitting prose, including what happens for documents shorter or longer than 750 pages, rather than leaving it discoverable only by reading the raw spec.
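Until the default is documented, a caller can avoid depending on it by always sending an explicit end. This is a defensive sketch: the start/end keys mirror the PageRangesItems fields named above, while the helper name and the use of a known page count are assumptions.

```python
SPEC_DEFAULT_END = 750  # default for PageRangesItems.end per the OpenAPI spec


def explicit_page_range(start: int, end=None, page_count=None) -> dict:
    """Build a page range that never relies on the server-side default."""
    if end is None:
        if page_count is None:
            raise ValueError(
                "pass end or page_count so the default of "
                f"{SPEC_DEFAULT_END} is never implied"
            )
        end = page_count  # "to the end of the document", stated explicitly
    if end < start:
        raise ValueError("end must not be smaller than start")
    return {"start": start, "end": end}
```

For a 12-page document, explicit_page_range(3, page_count=12) produces a range ending at 12 instead of leaving the server to substitute 750.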
17. The "Light" pricing table omits Edit rows that the "Performance" table includes (minor)
Location: https://docs.extend.ai/general/how-credits-work.md
Problem: The Performance base-pricing table includes Edit Filling (1 credit) and Edit Schema Generation (2 credits) rows, but the Light pricing table omits them entirely.
Consequence: A developer using Light mode can't tell whether Edit operations are unsupported in Light or just unpriced — leaving a gap in cost estimates for any workflow that uses form filling or schema generation on Light.
The fix: Either add the Edit rows to the Light table (with their credit costs) or add a note stating Edit operations are Performance-only.
18. The Community SDK is described without a language or link on the page developers actually read (minor)
Location: https://docs.extend.ai/sdks.md vs https://docs.extend.ai/agents.md
Problem: sdks.md describes the community SDK only as a "Custom client maintained by the Mercury engineering team" — no language named and no link. agents.md reveals it's Haskell, maintained by Mercury Technologies, at https://github.com/MercuryTechnologies/extend.
Consequence: A Haskell developer browsing the SDK page can't tell the community client is for them or where to find it, because the one detail that matters (the language) and the repo URL only exist in the agent file, not the human-facing SDK page.
The fix: Add the language (Haskell) and the GitHub URL to the Community SDK entry in sdks.md, matching agents.md.
What they do well
- Genuinely agent-first infrastructure: per-page .md, per-section llms.txt, a hosted MCP server, and a hand-authored agents.md/CLAUDE.md file, with more thought put into machine consumption than most platforms.
- Machine-readable API surface: a published OpenAPI 3.1 spec backs the reference, so endpoints, ID prefixes, and the .app/.ai server hosts are programmatically discoverable (and serve as the tie-breaker when prose pages disagree).
- Honest about edge behavior: async failures surfacing as status: "FAILED" rather than raising, the deprecated tables.enabled flag, and the logprobsConfidence phase-out are all documented rather than hidden.
Top 3 recommendations
- Make the agent-facing files a single generated source of truth. The llms.txt ↔ llms-full.txt contradiction, the sp_/spl_ prefix error, the 3-vs-4-vs-6 version counts, and the empty changelog.md are all drift between hand-maintained files and the real spec. Generate llms.txt/agents.md/version lists from the OpenAPI spec so they can't disagree.
- Fix every copy-paste code path. Reconcile the Java version/artifact/class names and the Python ApiError import so a snippet runs unmodified in any language an agent picks.
- Make the operational pages implementable. Give rate-limits real numbers plus a documented Retry-After/429 example, define the "race" credit surcharge, document the 402 response, and surface the 750-page end default. These are the pages developers hit in production, and right now they can't be coded against.