
Report/May 15

Reducto

docs.reducto.ai

Manicule Score

0100

Pages read: 30

Critical: 4

Significant: 7

Minor: 4

Surface: docs.reducto.ai

Verdict

“five pages, three retention windows, two ZDR scopes — the compliance story changes depending on which tab you opened first”


Reducto Documentation Audit

Overall: the docs surface is broad (80+ pages, OpenAPI, MCP server, llms.txt) and the agent-guide is a genuine asset, but multiple load-bearing facts contradict themselves across pages — retention windows, SDK kwarg names, ZDR scope — in ways that will silently break integrations and compliance reviews.

1. Data retention policy contradicts itself across at least six pages (critical)

Location: /overview, /agent-guide, /parse/overview, /upload/overview, /mcp-server vs. /reference/faq, /reference/glossary, /workflows/async-overview, /workflows/direct-webhooks vs. /security/policies vs. /security/eu-data-residency

Problem: The same retention number is given four different ways:

/overview security card: "Documents deleted within 24h."
/agent-guide and /parse/overview: results "expire after 24h" (persist_results description).
/upload/overview: "Files expire 24 hours after upload."
/mcp-server: the reducto:// scheme is described as "A file in Reducto's temporary storage (24-hour TTL)."
/reference/faq, /reference/glossary, /workflows/async-overview, /workflows/direct-webhooks: "job results are deleted after 12 hours" / "Jobs are deleted after 12 hours."
/security/policies: "data submitted via API is set to expire within 24 hours" — and scopes ZDR to "Growth tier and above."
/security/eu-data-residency: "strict 24-hour maximum retention window … Jobs are purged automatically every 12 hours" (internally inconsistent on its own page).

Consequence: A developer wiring up async polling, webhook retry logic, or a compliance review cannot tell whether jobs disappear at 12h or 24h, whether ZDR is the default or a paid tier feature, or what to put in their own privacy notice. The "Job ID not found" failure mode the FAQ warns about will hit users who trusted the 24h numbers in the overview/agent-guide/MCP docs.

The fix: Pick one canonical retention window (the eu-data-residency page suggests the real answer is "purged every 12h with a 24h ceiling") and replace every other occurrence with that exact phrasing. Separately, decide whether ZDR is the default for everyone or a Growth+ feature, and rewrite /security/policies and /reference/glossary to agree.
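Once a canonical window is chosen, a small lint over the docs source can keep it from drifting again. The sketch below is a hypothetical check, not part of Reducto's tooling. It assumes each page is available as plain text keyed by its path, and it is deliberately crude: it flags every hour count that differs from the canonical set, so unrelated mentions of hours would need an allowlist.

```python
import re

# Matches phrases such as "24h", "24 hours", "12-hour" (case-insensitive).
HOURS_PATTERN = re.compile(r"\b(\d+)\s*-?\s*(?:h|hours?)\b", re.IGNORECASE)


def find_retention_hours(text: str) -> set[int]:
    """Return every distinct hour count mentioned in the text."""
    return {int(match.group(1)) for match in HOURS_PATTERN.finditer(text)}


def pages_off_canon(pages: dict[str, str], canonical_hours: set[int]) -> dict[str, set[int]]:
    """Map each page path to the hour values it states that are not canonical."""
    report = {}
    for path, text in pages.items():
        extra = find_retention_hours(text) - canonical_hours
        if extra:
            report[path] = extra
    return report


# Sample sentences quoted from the finding above; the dict layout is illustrative.
sample_pages = {
    "/upload/overview": "Files expire 24 hours after upload.",
    "/reference/faq": "job results are deleted after 12 hours",
    "/mcp-server": "A file in Reducto's temporary storage (24-hour TTL).",
}
print(pages_off_canon(sample_pages, canonical_hours={12}))
```

With 12 hours assumed as canonical purely for the demo, the two pages that say 24 hours are reported. If the real answer turns out to be "purged every 12h with a 24h ceiling", passing `{12, 24}` would accept both numbers.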

2. Three different names for the same async/priority option across SDK and cURL examples (critical)

Location: /workflows/async-overview, /parse/best-practices, /workflows/svix-webhooks, /workflows/direct-webhooks

Problem: The Python examples disagree about the kwarg name for async/priority/webhook config on client.parse.run_job(...):

/workflows/async-overview: async_={"priority": True}
/workflows/svix-webhooks: async_={"webhook": {...}, "metadata": {...}}
/parse/best-practices: async_config={"priority": True} (and asyncConfig in JS)
The cURL example on best-practices uses a top-level "options": {"priority": true} body, which matches neither Python form.

Consequence: Only one of these can be the real SDK signature. Developers who copy the best-practices snippet (async_config=) will get a TypeError: unexpected keyword argument or have their priority flag silently ignored — exactly the kind of "agent fails silently" pattern that bites both humans and coding agents copying snippets verbatim.

The fix: Standardize on the real SDK kwarg. The runtime checks in this report confirm that it is async_, which also matches three of the four pages and the webhook config examples. Then grep all run_job( and runJob( examples across the docs and align them. Add a single canonical "async options" reference page and link the rest there instead of restating the shape.
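A plain grep finds the call sites; a syntax-aware pass can also say which keyword each call uses. The following is a minimal sketch using Python's standard `ast` module. It only parses the snippet text and never imports the SDK, and the `SUSPECT_KWARGS` set is an assumption based on the variants this report found, to be extended as needed.

```python
import ast

# Assumption: `async_` is the accepted keyword; this set holds the
# non-canonical Python variant found in the docs.
SUSPECT_KWARGS = {"async_config"}


def suspect_run_job_kwargs(source: str) -> list[tuple[int, str]]:
    """Return (line, kwarg) for each suspect keyword passed to a run_job(...) call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name != "run_job":
            continue
        for kw in node.keywords:
            if kw.arg in SUSPECT_KWARGS:
                hits.append((node.lineno, kw.arg))
    return hits


# Two call shapes taken from the pages listed above; `upload` is a placeholder name.
snippet = '''
client.parse.run_job(input=upload, async_config={"priority": True})
client.parse.run_job(input=upload, async_={"priority": True})
'''
print(suspect_run_job_kwargs(snippet))
```

Run over every fenced Python block in the docs, a non-empty result would fail the build before a reader ever copies the wrong keyword.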

3. Parse-overview chunk modes don't match the linked "full chunking options" reference (significant)

Location: /parse/overview vs. /configs/parse/chunking-methods

Problem: /parse/overview lists chunk modes as disabled, variable, page, section (4 values) in an inline table and then says "Full chunking options →". The presence of a "full options" link implies the four-mode table is a subset, but the overview gives no indication of what's missing or how to pick — and the OpenAPI enum that backs the parameter is never quoted on the page.

Consequence: A developer or coding agent picking a chunk_mode value from the overview page will not know whether they've seen all valid values. If they pass a mode that exists only on the "full options" page, they'll get a 422; if they pass one missing from both, the failure mode depends on the server. Inline parameter tables that are knowingly incomplete are exactly the pattern that produces "I followed the docs and still got a validation error" tickets.

The fix: Either inline the complete enum on /parse/overview (generated from the OpenAPI spec), or replace the four-row table with a one-line "see [Chunking Methods] for the full enum" and stop showing a partial list. Whichever direction you pick, regenerate the enum from openapi.json so it can't drift.
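Generating the list from the spec could look roughly like the sketch below. It searches a parsed OpenAPI document for a property by name and returns its `enum`. The schema name `ChunkingConfig` and the fixture are invented for illustration; the four values are the ones quoted from /parse/overview, not a claim about the real enum.

```python
import json


def find_enum(node, property_name):
    """Depth-first search of a parsed OpenAPI document for `property_name`'s enum."""
    if isinstance(node, dict):
        props = node.get("properties")
        if isinstance(props, dict):
            prop = props.get(property_name)
            if isinstance(prop, dict) and "enum" in prop:
                return prop["enum"]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_enum(child, property_name)
        if found is not None:
            return found
    return None


# Hypothetical fixture standing in for openapi.json.
sample_spec = json.loads("""
{"components": {"schemas": {"ChunkingConfig": {"type": "object", "properties": {
  "chunk_mode": {"type": "string", "enum": ["disabled", "variable", "page", "section"]}
}}}}}
""")
print(find_enum(sample_spec, "chunk_mode"))
```

The returned list can then be rendered into the page at build time, so the inline table and the reference page are both derived from the same source.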

4. ZDR scope contradicts itself: default policy vs. Growth-tier feature (critical)

Location: /reference/glossary, /overview vs. /security/policies, /enterprise/enterprise-readiness, /onprem/enterprise_deployment_options

Problem: The glossary defines ZDR as "Reducto's default data policy. Uploaded documents and job results are deleted within 12 hours." The marketing overview shows a "Zero Data Retention" security card with no tier caveat. But /security/policies says ZDR applies only to "users on our 'Growth' tier and above," and the enterprise-readiness tier matrix shows ZDR as "—" for the Standard tier and only "Yes (Ephemeral)" for Growth/Enterprise.

Consequence: A prospect reading the overview will believe ZDR is universal; a customer reading the security/compliance page will discover it's gated. For HIPAA/BAA conversations, this is the single most consequential claim on the site, and it disagrees with itself.

The fix: Decide whether deletion-within-window is the platform default (in which case fix the policies/enterprise pages) or a tier-gated commitment (in which case fix the overview, glossary, and marketing surfaces). If only the contractual ZDR commitment is tier-gated but the technical behavior applies to everyone, say that explicitly.

5. HIPAA complaints page lists three contact addresses, including a mismatched mailto link (critical)

Location: /security/filing-complaints

Problem: The page directs complaints to support@reducto.ai, then says questions go to "our privacy officer at security@reducto.ai" — the label text says security@, the underlying mailto link is privacy@. So a single sentence references two different addresses, neither matching the support address above it.

Consequence: This is a regulated HIPAA complaint procedure — exactly the kind of artifact a BAA references and a regulator may scrutinize. A user clicking the link sends mail to privacy@; a user copy-pasting the visible text emails security@; the page's own intake is support@. If any of those mailboxes isn't monitored, a HIPAA complaint can be lost, and the discrepancy itself is the sort of thing that surfaces in audit findings.

The fix: Confirm the real recipient and replace both the label and the URL with that one address. If multiple addresses are intentional (intake vs. escalation), name each one explicitly and what it does, and make every mailto label match its URL.
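A label/URL mismatch like this one is easy to catch mechanically. The sketch below assumes the page source uses Markdown links of the form `[label](mailto:address)`; the sample text is an illustrative reconstruction of the sentence described above, not the page's actual markup.

```python
import re

# Markdown links of the form [label](mailto:address), ignoring any ?subject=... suffix.
MAILTO_LINK = re.compile(r"\[([^\]]+)\]\(mailto:([^)\s?]+)[^)]*\)")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mismatched_mailto_links(markdown: str) -> list[tuple[str, str]]:
    """Return (shown_address, link_address) pairs where the visible email differs from the target."""
    mismatches = []
    for label, target in MAILTO_LINK.findall(markdown):
        shown = EMAIL.search(label)
        if shown and shown.group(0).lower() != target.lower():
            mismatches.append((shown.group(0), target))
    return mismatches


# Illustrative reconstruction of the complaint page's two links.
page = (
    "Send complaints to [support@reducto.ai](mailto:support@reducto.ai). "
    "Questions go to our privacy officer at [security@reducto.ai](mailto:privacy@reducto.ai)."
)
print(mismatched_mailto_links(page))
```

Links whose label is not an email address (for example "contact us") are skipped, since there is nothing visible to compare against.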

6. Subprocessor lists disagree between general policies and EU residency (significant)

Location: /security/policies vs. /security/eu-data-residency

Problem: /security/policies lists 11 subprocessors including Anthropic, Cloudflare, Together, Datadog, and OpenRouter. /security/eu-data-residency lists only AWS, GCP, Modal, and OpenAI as document-processing subprocessors, with Sentry/PostHog as telemetry-only. There is no narrative explaining why Anthropic, Cloudflare, Together, Datadog, and OpenRouter disappear in the EU residency path, or whether their absence is a hard guarantee or an oversight.

Consequence: EU customers performing a DPA / Art. 28 review cannot tell whether the second list is "the EU subset" (in which case the omitted vendors are deliberately excluded for EU traffic and that's a sellable guarantee), or whether one list is simply stale. Either way, the gap is exactly the kind of detail procurement teams flag.

The fix: On the EU page, explicitly state the rule ("the following SaaS-list vendors are not in the EU processing path: …") and link back to the master list. Add a "last updated" date to both tables.

7. SaaS deployment page understates the SaaS stack relative to the subprocessor list (significant)

Location: /onprem/enterprise_deployment_options vs. /security/policies

Problem: The SaaS option in the deployment-options page says the SaaS is "Built on Amazon Web Services (AWS) and Modal Labs as the primary cloud providers" and stores data on AWS S3. But /security/policies enumerates eleven subprocessors involved in the SaaS pipeline, including Google Cloud, Cloudflare, Together, Datadog, OpenRouter, Anthropic, and OpenAI.

Consequence: A prospect evaluating SaaS vs. on-prem reads the deployment page and forms a mental model of a two-vendor stack (AWS + Modal). When they later open the subprocessor list during procurement, they encounter at least seven additional vendors that the deployment page never mentioned. This is the kind of mismatch that derails security reviews and erodes trust even when the underlying setup is fine.

The fix: Either expand the SaaS option to enumerate the actual processing-path vendors (or link directly to the subprocessor list with a one-line "for the full list, see policies"), or move the "AWS + Modal Labs" claim into a "primary compute and storage" caveat that doesn't imply exhaustiveness.

8. On-prem fair-queueing docs assume a SaaS base URL without saying so (significant)

Location: /onprem/enterprise_deployment_options fair-queueing section

Problem: The fair-queueing instructions for an "on-premise deployment of Reducto" tell users to "make your existing requests to /parse and /parse_async as normal" — relative paths only, with no note that on-prem hosts substitute their own base URL. The hosted-SaaS cURL examples elsewhere in the docs (e.g. /parse/best-practices) default to https://platform.reducto.ai/..., so a customer who lands on this page after reading the SaaS docs will reasonably carry that base URL into their on-prem config.

Consequence: On-prem customers — the exact audience for this page — have to infer that the endpoints should be called on their own VPC host rather than platform.reducto.ai. For an air-gapped deployment this can mean traffic that fails closed (no connectivity at all) or, worse, accidental egress attempts to the SaaS host from a network that shouldn't reach it.

The fix: On every on-prem page that shows API calls, use a placeholder like https://<your-reducto-host>/parse and call out that on-prem requests never go to platform.reducto.ai. Add a one-paragraph "base URL for on-prem deployments" note to the deployment-options page.
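On the client side, the same rule can be enforced with a small guard. This is a hedged sketch, not SDK behavior: `onprem_endpoint` is a hypothetical helper, and `reducto.internal.example` is a placeholder host. It builds the full URL from an explicit base and refuses the hosted SaaS host outright.

```python
from urllib.parse import urlparse

SAAS_HOST = "platform.reducto.ai"  # hosted base used in the SaaS cURL examples


def onprem_endpoint(base_url: str, path: str) -> str:
    """Build a full endpoint URL for an on-prem host, rejecting the SaaS host."""
    parsed = urlparse(base_url)
    if not parsed.scheme or not parsed.hostname:
        raise ValueError(f"base URL must include scheme and host: {base_url!r}")
    if parsed.hostname.lower() == SAAS_HOST:
        raise ValueError("on-prem requests should not target the hosted SaaS")
    return base_url.rstrip("/") + "/" + path.lstrip("/")


# Placeholder host, not a real deployment.
print(onprem_endpoint("https://reducto.internal.example", "/parse"))
print(onprem_endpoint("https://reducto.internal.example/", "parse_async"))
```

Failing loudly at URL construction is a design choice: in an air-gapped network it turns a confusing connection timeout into an immediate, explainable error.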

9. `llms.txt` index omits `/reference/model-versions` even though other docs link to it (significant)

Location: /llms.txt vs. /reference/version-pinning, /reference/model-versions

Problem: /reference/version-pinning says "For the current status of each model version, see Model Versions." The model-versions page exists and returns 200, with substantive content about Layout v1/v2 deprecation, Extract v2 alpha, and Agentic Tables v2 default. But the canonical agent index /llms.txt does not list /reference/model-versions among its enumerated pages. (The visible llms.txt excerpt is truncated, so other omissions may exist, but this one is verifiable.)

Consequence: This is precisely the failure mode llms.txt is supposed to prevent: an agent that loads only the index will believe model versioning has no detail page, even though one exists with information critical to pinning the right version before a deprecation date (Layout v1 removal "after May 6, 2026" is on that very page).

The fix: Regenerate llms.txt from the same source of truth as the sitemap (or at minimum ensure it lists every page reachable from the docs nav). Add a CI check that fails if any page reachable from the site nav is missing from llms.txt, or vice versa.
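The CI check can be little more than a set difference. The sketch below assumes llms.txt lists pages as Markdown links and that some entries may carry a `.md` suffix (an assumption about the generator). The nav paths would come from the sitemap or nav config in practice; here they are hard-coded, and the link titles are illustrative.

```python
import re
from urllib.parse import urlparse

# llms.txt entries are Markdown links; capture the URL part.
LINK_URL = re.compile(r"\]\((https?://[^)\s]+)\)")


def paths_in_llms_txt(llms_txt: str) -> set[str]:
    """Extract URL paths from Markdown links, dropping a trailing .md suffix and slash."""
    paths = set()
    for url in LINK_URL.findall(llms_txt):
        path = re.sub(r"\.md$", "", urlparse(url).path).rstrip("/")
        paths.add(path or "/")
    return paths


def index_drift(nav_paths: set[str], llms_txt: str) -> tuple[set[str], set[str]]:
    """Return (paths missing from llms.txt, paths listed but absent from the nav)."""
    listed = paths_in_llms_txt(llms_txt)
    return nav_paths - listed, listed - nav_paths


# Illustrative index that omits the model-versions page, as described above.
sample_llms = """\
# Reducto
- [Version Pinning](https://docs.reducto.ai/reference/version-pinning.md)
- [Agent Guide](https://docs.reducto.ai/agent-guide)
"""
nav = {"/reference/version-pinning", "/reference/model-versions", "/agent-guide"}
missing, extra = index_drift(nav, sample_llms)
print(sorted(missing), sorted(extra))
```

A build step would fail whenever either set is non-empty, which covers both directions of the "or vice versa" check.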

10. API reference pages in `llms-full.txt` are reduced to OpenAPI stubs (significant)

Location: /api-reference/parse, /api-reference/extract, /api-reference/edit, /api-reference/pipeline, /api-reference/async-parse, /api-reference/cancel-job (as served via llms-full.txt)

Problem: Every /api-reference/* page in the agent-readable bundle collapses to a one-line directive like:

# Parse
Source: https://docs.reducto.ai/api-reference/parse
/openapi.json post /parse

There is no parameter list, no example body, no description in the LLM-readable form. An agent fetching llms-full.txt to learn how to call /parse gets the URL and HTTP verb only.

Consequence: Agents that don't separately fetch and parse openapi.json will have no idea what the request body for /parse, /extract, /pipeline, etc. looks like — even though the human-rendered site has full schemas. The agent-guide is supposed to make up for this, but it doesn't cover every endpoint and contradicts other pages (see #1, #2).

The fix: Either inline the OpenAPI schema for each endpoint into its markdown page (so llms-full.txt carries the actual parameters), or have the agent-guide explicitly tell agents "for endpoint-level parameter detail, fetch /openapi.json and reference the operation at this path." Right now the bundle implies parameter detail is on the page when it isn't.
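Inlining can be automated at build time. The sketch below renders one operation's JSON body properties as a bullet list suitable for a Markdown page. It assumes the request schema is inline; a real spec that uses `$ref` would need those resolved first. The fixture is invented: `input` and `settings` are parameter names that appear in the docs' own snippets, but their types here are assumptions.

```python
def render_operation(spec: dict, path: str, method: str) -> str:
    """Render one OpenAPI operation's JSON body properties as a Markdown bullet list."""
    operation = spec["paths"][path][method]
    schema = operation["requestBody"]["content"]["application/json"]["schema"]
    required = set(schema.get("required", []))
    lines = [f"# {method.upper()} {path}"]
    for name, prop in schema.get("properties", {}).items():
        flag = "required" if name in required else "optional"
        lines.append(f"- `{name}` ({prop.get('type', 'object')}, {flag})")
    return "\n".join(lines)


# Hypothetical fixture standing in for the relevant slice of openapi.json.
body_schema = {
    "required": ["input"],
    "properties": {"input": {"type": "string"}, "settings": {"type": "object"}},
}
toy_spec = {
    "paths": {
        "/parse": {
            "post": {"requestBody": {"content": {"application/json": {"schema": body_schema}}}}
        }
    }
}
print(render_operation(toy_spec, "/parse", "post"))
```

Appending this output to each /api-reference/* page would give llms-full.txt at least the parameter names and required flags, instead of only the path and verb.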

11. Classify is declared sync-only but the rate-limits page doesn't list it (significant)

Location: /classify/overview vs. rate-limits page

Problem: /classify/overview includes a Note: "Classify is synchronous only. The endpoint is optimized for low latency, so classification results return fast enough that async polling or webhooks are unnecessary." Yet /classify doesn't appear in the rate-limits page's listing of synchronous endpoints with their concurrency limits, leaving the per-account concurrency budget for Classify undocumented.

Consequence: A team building a high-throughput classifier — exactly the use case Classify is positioned for — cannot tell how many concurrent /classify requests they can make before getting 429s, or whether Classify shares a pool with /parse and /extract. They have to discover the answer by load-testing in production.

The fix: Add /classify to the sync-endpoint table on the rate-limits page with its concurrency cap, and state explicitly whether it shares a budget with other sync endpoints. While you're there, cross-link the Classify overview to the rate-limits page.

12. Extract response shape unspecified for `array_extract` + citations combination (minor)

Location: /extract/overview

Problem: The settings table documents array_extract ("segments the document, extracts from each segment, and merges results") and citations.enabled ("Return page number, bounding box, and source text for each extracted value") as independent flags, and a separate warning notes that citations disable chunking. But the response-format section never shows what result looks like when both are enabled — whether each array element gets its own citations object, whether citations are keyed by JSON pointer into the merged array, or whether the two interact at all.

Consequence: A developer building an invoice-line-item extractor with verification (the textbook use case for combining the two) cannot write a typed handler without trial-and-error against the live API. Coding agents writing extraction pipelines will guess at the shape and get parsing errors.

The fix: Add a worked example showing the actual JSON response for an array_extract: true, citations.enabled: true request, and document whether citations attach per-array-element or at the top level.

13. On-prem changelog gated behind a password but versions leak via llms-full.txt (minor)

Location: /onprem/changelog

Problem: The live page is wrapped in <PasswordProtect> so browsers see a gate. The llms-full.txt representation contains the full changelog text including version numbers (v1.11.76, v1.11.75), dated entries, and internal references to reducto-worker, reducto-priority-worker, reducto-gpu-worker pod names and SIGUSR2 handlers.

Consequence: If the password gate is intended as a confidentiality control for on-prem customers, exposing the same content via llms-full.txt defeats it. If the gate is not intended as a confidentiality control, it's a frustrating UX for legitimate customers and the gate should be removed.

The fix: Decide whether on-prem release notes are confidential. If yes, exclude /onprem/* pages from llms.txt/llms-full.txt. If no, remove the password protection on the live page.

14. On-prem file retention default (60 min) never reconciled with SaaS retention (minor)

Location: /onprem/file_cleanup vs. SaaS retention pages

Problem: The on-prem default is FILE_RETENTION_MINUTES=60 (a 1-hour minimum, enforced by the hourly cleanup job), while the SaaS docs talk in 12h/24h terms. Nothing on either page reconciles the two: a customer evaluating both deployment options has to puzzle out why the on-prem default is 12× to 24× shorter than SaaS.

Consequence: Sales/security conversations get harder than they need to be: the prospect asks "so does SaaS retain up to 24× longer than on-prem?" and there's no doc answer. Buyers comparing the two on data-minimization grounds reach the wrong conclusion in either direction.

The fix: Add a one-paragraph "Retention across deployment modes" comparison to /security/policies or the enterprise-readiness page covering SaaS default, on-prem default, and how persist_results interacts with each.
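The size of the gap depends on which SaaS window is the real one, which is one more reason to settle finding #1 first. A quick calculation, using only the numbers quoted in this report, shows the two possible ratios:

```python
ONPREM_DEFAULT_MINUTES = 60      # FILE_RETENTION_MINUTES default per /onprem/file_cleanup
SAAS_WINDOWS_HOURS = (12, 24)    # the two windows the SaaS pages state

# Ratio of each SaaS window to the on-prem default, both expressed in minutes.
ratios = {hours: hours * 60 / ONPREM_DEFAULT_MINUTES for hours in SAAS_WINDOWS_HOURS}

for hours, ratio in ratios.items():
    print(f"SaaS {hours}h window is {ratio:.0f}x the on-prem default")
```

A comparison paragraph in the docs would state whichever of these applies once the canonical SaaS window is fixed.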

15. Two "batch processing" pages with no stated relationship (minor)

Location: /workflows/batch-processing and /cookbooks/batch-processing

Problem: Two separate pages cover "process multiple documents in parallel" with overlapping topics. Neither links to the other or explains which to read first. The llms.txt titles ("Parallel Document Processing" vs. "Batch Processing") suggest a deliberate split but it isn't articulated anywhere.

Consequence: Search results for "batch" surface both; developers can't tell whether they're alternatives, a tutorial vs. reference split, or one is obsolete.

The fix: Make one canonical and have the other open with a one-line "see also" pointer, or merge them. If they really are tutorial vs. reference, say so at the top of each.

What they do well

The /agent-guide page is a genuinely useful agent-first reference with parameter tables, error codes, and worked examples — most docs sites don't ship anything like it.
A real llms.txt exists alongside the OpenAPI spec, an MCP server, and a CLI; the surface area for programmatic consumption is unusually complete.
The page-billing breakdown page documents the legacy spreadsheet_cells field that may still appear in old responses — that kind of transition-state honesty is rare and helpful.

Top 3 recommendations

Pick one retention number and one ZDR scope, then mass-update. Findings #1 and #4 together affect at least eight pages including the security, overview, and MCP surfaces. Until this is resolved, every compliance review will surface contradictions.
Audit every Python and Node example for async_ / async_config / asyncConfig / options and pick one shape. Add a snippet-validation step in CI that imports the SDK and type-checks each example.
Regenerate llms.txt from the sitemap and inline OpenAPI parameter detail into the per-endpoint markdown so the agent-readable bundle isn't strictly worse than the rendered site for the most important endpoints.

Code Verification

Runtime snippet checks

Failing pages

https://docs.reducto.ai/parse/best-practices
https://docs.reducto.ai/workflows/async-overview

Summary

Executed 56 runnable snippets across 13 Reducto documentation pages using the user-supplied REDUCTO_API_KEY, the official reductoai Python (0.22.0) and Node.js SDKs, curl, and jq. The Reducto API is reachable and the supplied key authenticates against platform.reducto.ai. Two confirmed documentation bugs:

parse/best-practices Python snippet uses the wrong keyword argument async_config={"priority": True} — client.parse.run_job() raises TypeError: run_job() got an unexpected keyword argument 'async_config'. The correct kwarg (per the SDK and used elsewhere in the docs) is async_={"priority": True}.
async-overview TypeScript polling snippet calls a non-existent method client.job.retrieve(jobId) — the JS SDK exposes client.job.get(jobId) (verified via the SDK's exported prototype: ['cancel','get','getAll']). Throws TypeError: client.job.retrieve is not a function.

Other partially validated findings from the run:

The Node.js asyncConfig: variant from parse/best-practices runs without throwing because the JS SDK forwards unknown camelCase keys to the request body; the server returns a job_id, but the field name does not match the SDK's async: schema (AsyncConfigV3 on AsyncParseConfig.async), so the priority semantics the docs claim are silently lost.
The cURL variant on parse/best-practices uses "options": {"priority": true} instead of "async": {"priority": true}; the server returns a job_id but ignores the unknown field. Treated as PASS because the call succeeds, but the priority flag does not take effect.

Skipped: Go snippets (no Go toolchain), Flask/Express webhook handler snippets (server processes that require an external HTTP receiver to verify), extract version-pinning snippets containing {...} schema placeholders (partial fragments), npx/uvx MCP install commands (interactive / require external installer), private GitHub Packages install for @reductoai-collab/components (credentials/registry unavailable), and React/TSX component fragments (require a host React app — partial fragments).

Note: The user-provided 20-minute budget interrupted the run; the page-by-page table below reflects only snippets actually executed. No additional snippets were inferred.

Required credentials

REDUCTO_API_KEY — used for every Reducto API call (Python SDK auto-reads from env; also passed via Authorization: Bearer for cURL and Node).

Pages

https://docs.reducto.ai/agent-guide

#	Language	What it does	Status	Notes
1	bash/curl	POST /parse with public URL	PASS	Ran with `https://pdfobject.com/pdf/sample.pdf` substituted for the `example.com/doc.pdf` placeholder; `response_type=parse`, pages=1.
2	python	Parse via URL + upload + chunk loop	PASS	URL parse + upload + chunk iteration all succeed.
3	python	Extract with JSON schema	PASS	Returns dict with both schema keys (values empty because test PDF has no invoice fields).
4	python	Split with split_description	PASS	Returns Split objects with `name` and `pages`.
5	python	Classify with categories	PASS	`Result(category='invoice')`.
6	python	Edit with `edit_instructions`	SKIP	Server returned 422 INVALID_SCHEMA "No form fields were detected in the PDF"; snippet is structurally correct but needs a form-bearing PDF fixture. Marked SKIP — missing setup.
7	python	Async submit + poll loop	PASS	Job reached `Completed`.

https://docs.reducto.ai/upload/overview

#	Language	What it does	Status	Notes
1	python	`client.upload(Path(...))` + parse	PASS	`file_id` returned, parse succeeds.
2	javascript	`client.upload({file: fs.createReadStream})` + parse	PASS	Works on Node 24.
3	bash/curl	Multipart upload	PASS	Returns valid `file_id`.
4	python	ThreadPoolExecutor batch upload	PASS	3 parallel uploads succeed.
5	python	URL passthrough `client.parse.run(input=url)`	PASS
6	javascript	URL passthrough JS	PASS
7	bash/curl	URL passthrough cURL	PASS

https://docs.reducto.ai/parse/overview

#	Language	What it does	Status
1	python	upload + parse + iterate chunks	PASS
2	javascript	upload + parse + iterate chunks	PASS
3	bash/curl	upload (jq) + parse	PASS
4	python	Variable chunking	PASS
5	python	Table HTML format	PASS
6	python	`summarize_figures` enhance	PASS
7	python	Agentic enhancements (text/table/figure)	PASS

https://docs.reducto.ai/parse/best-practices

#	Language	What it does	Status	Notes
1	python	Variable chunking + embedding_optimized	PASS
2	python	Agentic modes	PASS
3	python	Table HTML	PASS
4	python	`filter_blocks`	PASS
5	python	`run_job(..., async_config={"priority": True})`	FAIL	`TypeError: run_job() got an unexpected keyword argument 'async_config'`. SDK expects `async_=`. Diagnosis: docs use the wrong kwarg name. Suggested correction (not applied): use `async_={"priority": True}` as on `workflows/async-overview`.
6	javascript	`runJob({ asyncConfig: { priority: true } })`	PASS	Call returns a `job_id`, but JS SDK schema requires `async:`. Unknown key is silently forwarded; priority semantics not actually applied. PASS recorded because the snippet did not throw, but flag this as the same docs bug as #5.
7	bash/curl	`"options": {"priority": true}`	PASS	Server accepts and returns `job_id`; unknown `options` field is ignored, so priority is silently dropped. PASS for HTTP success only — same underlying docs bug as #5/#6.

https://docs.reducto.ai/workflows/async-overview

#	Language	What it does	Status	Notes
1	python	`run_job(..., async_={"priority": True})`	PASS
2	typescript	`runJob({ async: { priority: true } })`	PASS	Executed as JS.
3	bash/curl	parse_async with `"async": {"priority": true}`	PASS
4	python	submit + poll using `client.job.get(...)`	PASS	Reached `Completed`.
5	typescript	submit + poll using `client.job.retrieve(...)`	FAIL	`TypeError: client.job.retrieve is not a function`. JS SDK exposes `client.job.get`, `cancel`, `getAll`. Suggested correction (not applied): replace `client.job.retrieve` with `client.job.get`.
6	bash/curl	submit + poll via `/job/$JOB_ID`	PASS	Reached `Completed`.
7	python	`settings={"persist_results": True}`	PASS
8	bash/curl	persist_results cURL	PASS
9	python	async with metadata	PASS
10	bash/curl	async with metadata cURL	PASS

https://docs.reducto.ai/extract/overview

#	Language	What it does	Status	Notes
1	python	Schema-based extract	PASS	Returns `{'total_amount':0,'invoice_date':''}` (values absent in non-invoice fixture, but schema executes).

https://docs.reducto.ai/classify/overview

#	Language	What it does	Status
1	python	classify with three categories	PASS
2	javascript	classify (JS)	PASS
3	bash/curl	classify cURL	PASS
4	python	upload + classify + route to parse/extract	PASS

https://docs.reducto.ai/workflows/direct-webhooks

#	Language	What it does	Status	Notes
1	python	`run_job(async_={"webhook": {"mode":"direct", ...}})`	PASS	Returns job_id.
2	typescript	runJob direct webhook	PASS	Executed as JS.
3	bash/curl	parse_async direct webhook	PASS
4	python	Flask handler `/webhook`	SKIP	Server-side handler; requires a public HTTP listener to verify end-to-end — partial fragment.
5	python	Flask handler with metadata-secret validation	SKIP	Same — server-side handler.

https://docs.reducto.ai/workflows/svix-webhooks

#	Language	What it does	Status	Notes
1	python	`requests.post(/configure_webhook)`	PASS	Returns Svix dashboard URL (HTTP 200).
2	bash/curl	`POST /configure_webhook`	PASS	Returns Svix login URL with one-time token.
3	python	run_job with svix webhook mode	PASS
4	bash/curl	parse_async svix webhook	PASS
5	python	Svix signature-verifying Flask handler	SKIP	Server handler; cannot exercise without an inbound webhook test driver.

https://docs.reducto.ai/upload/large-files

#	Language	What it does	Status
1	python	Full presigned upload + parse flow	PASS
2	javascript	Full presigned upload + parse flow	PASS
3	bash	Full presigned upload + parse flow (jq)	PASS

https://docs.reducto.ai/workflows/batch-processing

#	Language	What it does	Status	Notes
1	python	AsyncReducto batch parse URLs	PASS
2	python	AsyncReducto batch parse local files	PASS	Used 3 sample PDFs in `/tmp/documents`.
3	python	ThreadPoolExecutor sync batch	PASS
4	python	AsyncReducto batch extract	PASS
5	python	AsyncReducto batch parse and save JSON	PASS

https://docs.reducto.ai/mcp-server

#	Language	What it does	Status	Notes
1	json	Hosted MCP server config	PASS	JSON parses.
2	json	Local MCP server config (Claude Desktop / Cursor)	PASS	JSON parses.
3	toml	Codex `~/.codex/config.toml`	PASS	`tomli` parses successfully.
4	bash	`uvx mcp-server-reducto --login` / `curl ... uv/install.sh | sh` / `npx ... inspector`	SKIP	Interactive installers / OAuth flow; runtime/setup unavailable in this sandbox.

https://docs.reducto.ai/reference/version-pinning

#	Language	What it does	Status	Notes
1	python	parse with `settings.alpha.layout_model: "v2"`	PASS
2	javascript	same in JS	PASS
3	bash/curl	same as cURL	PASS
4	python/js/curl	Extract version pin variants with literal `{...}` schema	SKIP	Partial fragment — `instructions.schema` is shown as `{...}`; cannot run without inventing a schema.


Sources

Reducto Documentation Audit

1. Data retention policy contradicts itself across at least six pages (critical)

Problem: The same retention number is given four different ways:

/overview security card: "Documents deleted within 24h."
/agent-guide and /parse/overview: results "expire after 24h" (persist_results description).
/upload/overview: "Files expire 24 hours after upload."
/mcp-server: the reducto:// scheme is described as "A file in Reducto's temporary storage (24-hour TTL)."
/reference/faq, /reference/glossary, /workflows/async-overview, /workflows/direct-webhooks: "job results are deleted after 12 hours" / "Jobs are deleted after 12 hours."
/security/policies: "data submitted via API is set to expire within 24 hours" — and scopes ZDR to "Growth tier and above."
/security/eu-data-residency: "strict 24-hour maximum retention window … Jobs are purged automatically every 12 hours" (internally inconsistent on its own page).

2. Three different Python SDK kwarg names for the same async/priority option (critical)

Location: /workflows/async-overview, /parse/best-practices, /workflows/svix-webhooks, /workflows/direct-webhooks

Problem: The Python examples disagree about the kwarg name for async/priority/webhook config on client.parse.run_job(...):

/workflows/async-overview: async_={"priority": True}
/workflows/svix-webhooks: async_={"webhook": {...}, "metadata": {...}}
/parse/best-practices: async_config={"priority": True} (and asyncConfig in JS)
The cURL example on best-practices uses a top-level "options": {"priority": true} body, which matches neither Python form.

3. Parse-overview chunk modes don't match the linked "full chunking options" reference (significant)

Location: /parse/overview vs. /configs/parse/chunking-methods

4. ZDR scope contradicts itself: default policy vs. Growth-tier feature (critical)

Location: /reference/glossary, /overview vs. /security/policies, /enterprise/enterprise-readiness, /onprem/enterprise_deployment_options

5. HIPAA complaints page lists three contact addresses, including a mismatched mailto link (critical)

Location: /security/filing-complaints

6. Subprocessor lists disagree between general policies and EU residency (significant)

Location: /security/policies vs. /security/eu-data-residency

7. SaaS deployment page understates the SaaS stack relative to the subprocessor list (significant)

Location: /onprem/enterprise_deployment_options vs. /security/policies

8. On-prem fair-queueing docs assume a SaaS base URL without saying so (significant)

Location: /onprem/enterprise_deployment_options fair-queueing section

9. `llms.txt` index omits `/reference/model-versions` even though other docs link to it (significant)

Location: /llms.txt vs. /reference/version-pinning, /reference/model-versions

10. API reference pages in `llms-full.txt` are reduced to OpenAPI stubs (significant)

Problem: Every /api-reference/* page in the agent-readable bundle collapses to a one-line directive like:

# Parse
Source: https://docs.reducto.ai/api-reference/parse
/openapi.json post /parse

There is no parameter list, no example body, no description in the LLM-readable form. An agent fetching llms-full.txt to learn how to call /parse gets the URL and HTTP verb only.
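
A check for this condition could be sketched as follows. It assumes every page in the bundle starts with a `# Title` line followed by a `Source: <url>` line, as in the sample quoted above. The rest of the bundle layout is an assumption, and `find_openapi_stubs` is a hypothetical helper name.

```python
import re

# Assumed layout: each page starts with "# Title" followed by "Source: <url>".
PAGE_HEADER = re.compile(r"^# .+\nSource: (\S+)\n", re.MULTILINE)

def find_openapi_stubs(bundle):
    """Return source URLs of pages whose only body line is an OpenAPI directive."""
    headers = list(PAGE_HEADER.finditer(bundle))
    stubs = []
    for i, header in enumerate(headers):
        # A page body runs from the end of its header to the next header.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(bundle)
        body = [ln for ln in bundle[header.end():end].splitlines() if ln.strip()]
        if len(body) == 1 and body[0].startswith("/openapi.json "):
            stubs.append(header.group(1))
    return stubs

# First page mirrors the stub quoted above; the second page is made up.
sample = (
    "# Parse\n"
    "Source: https://docs.reducto.ai/api-reference/parse\n"
    "/openapi.json post /parse\n"
    "\n"
    "# Example guide\n"
    "Source: https://docs.example.com/guide\n"
    "This page has real prose.\n"
    "It spans several lines.\n"
)
stubs = find_openapi_stubs(sample)
```

Each URL in `stubs` is a page where an agent would receive only the path and HTTP verb.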

11. Classify is declared sync-only but the rate-limits page doesn't list it (significant)

Location: /classify/overview vs. rate-limits page

12. Extract response shape unspecified for `array_extract` + citations combination (minor)

Location: /extract/overview

13. On-prem changelog gated behind a password but versions leak via llms-full.txt (minor)

Location: /onprem/changelog

The fix: Decide whether on-prem release notes are confidential. If yes, exclude /onprem/* pages from llms.txt/llms-full.txt. If no, remove the password protection on the live page.

14. On-prem file retention default (60 min) never reconciled with SaaS retention (minor)

Location: /onprem/file_cleanup vs. SaaS retention pages

15. Two "batch processing" pages with no stated relationship (minor)

Location: /workflows/batch-processing and /cookbooks/batch-processing

Consequence: Search results for "batch" surface both; developers can't tell whether they're alternatives, a tutorial vs. reference split, or one is obsolete.

The fix: Make one canonical and have the other open with a one-line "see also" pointer, or merge them. If they really are tutorial vs. reference, say so at the top of each.

What they do well

The /agent-guide page is a genuinely useful agent-first reference with parameter tables, error codes, and worked examples — most docs sites don't ship anything like it.
A real llms.txt exists alongside the OpenAPI spec, an MCP server, and a CLI; the surface area for programmatic consumption is unusually complete.
The page-billing breakdown page documents the legacy spreadsheet_cells field that may still appear in old responses — that kind of transition-state honesty is rare and helpful.

Top 3 recommendations

Pick one retention number and one ZDR scope, then mass-update. Findings #1 and #4 together affect at least eight pages including the security, overview, and MCP surfaces. Until this is resolved, every compliance review will surface contradictions.
Audit every Python and Node example for async_ / async_config / asyncConfig / options and pick one shape. Add a snippet-validation step in CI that imports the SDK and type-checks each example.
Regenerate llms.txt from the sitemap and inline OpenAPI parameter detail into the per-endpoint markdown so the agent-readable bundle isn't strictly worse than the rendered site for the most important endpoints.
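
The sitemap comparison in the last recommendation can be sketched as a set difference. This is a minimal illustration, assuming the sitemap follows the sitemaps.org schema and that llms.txt lists pages as markdown links, possibly with a `.md` suffix. The helper names and the sample inputs are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Assumes llms.txt entries are markdown links: [title](https://...)
LINK = re.compile(r"\]\((https?://[^)\s]+)\)")

def normalize(url):
    """Drop a trailing slash and a .md suffix so both sources compare equal."""
    url = url.rstrip("/")
    return url[:-3] if url.endswith(".md") else url

def missing_from_index(sitemap_xml, llms_txt):
    """Return sitemap URLs that have no matching link in llms.txt, sorted."""
    root = ET.fromstring(sitemap_xml)
    in_sitemap = {
        normalize(loc.text.strip())
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
    }
    in_index = {normalize(u) for u in LINK.findall(llms_txt)}
    return sorted(in_sitemap - in_index)

# Hypothetical inputs built from the two paths named in finding 9.
sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://docs.reducto.ai/reference/version-pinning</loc></url>"
    "<url><loc>https://docs.reducto.ai/reference/model-versions</loc></url>"
    "</urlset>"
)
index = "- [Version pinning](https://docs.reducto.ai/reference/version-pinning.md)\n"
missing = missing_from_index(sitemap, index)
```

Run in CI against the live sitemap and index, a non-empty `missing` list would flag pages that agents cannot discover through llms.txt.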
