Unsiloed AI

Unsiloed AI Documentation Audit

The docs cover a real product with multiple API surfaces (parse v1/v2/v3, extract, classify, split), but parallel versions disagree on auth header, status enums, and defaults — and the agent-facing llms.txt index points at a generic dummy OpenAPI spec.

1. `llms.txt` lists a dummy fixture as one of four "OpenAPI Specs" (critical)

Location: https://docs.unsiloed.ai/llms.txt → https://docs.unsiloed.ai/tooling/tests/fixtures/sample_spec.json

Problem: The ## OpenAPI Specs section in llms.txt advertises four specs to AI agents. One of them, sample_spec, resolves to a publicly-served fixture file whose contents are a generic "title": "Sample API" placeholder with a single POST /sample operation and a name/verbose body — i.e., nothing to do with Unsiloed's parse/extract/classify/split surfaces.

Consequence: An AI agent crawling llms.txt to discover the API will treat this file as one of four authoritative OpenAPI documents and may try to call POST /sample with {name, verbose} against prod.visionapi.unsiloed.ai. At best it wastes an agent step; at worst it pollutes generated client code. It also leaks an internal test-fixtures path (/tooling/tests/fixtures/) into a public index.

The fix: Remove the sample_spec line from llms.txt and unpublish the fixture from the docs site (it shouldn't be reachable at docs.unsiloed.ai at all). Keep openapi, openapi-v1, and openapi-v2 as the listed specs, and add a v3 OpenAPI spec entry alongside them.
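A check like the following could run in CI to catch fixture paths before they reach the public index. It is a minimal sketch: the `SUSPECT_FRAGMENTS` heuristic and the function name are assumptions, and the three non-fixture URLs in the sample are placeholders rather than the real llms.txt entries.

```python
import re

# Matches markdown list links of the form "- [title](url)".
LINK_RE = re.compile(r"^\s*-\s*\[(?P<title>[^\]]*)\]\((?P<url>[^)]+)\)")

# Path fragments that suggest a test artifact rather than a published spec
# (an assumed heuristic, not something the audited docs define).
SUSPECT_FRAGMENTS = ("/tests/", "/fixtures/", "sample_spec")


def suspicious_specs(llms_txt: str, section: str = "## OpenAPI Specs") -> list[str]:
    """Return URLs listed under `section` whose paths look like test fixtures."""
    flagged = []
    in_section = False
    for line in llms_txt.splitlines():
        if line.startswith("## "):
            # Entering a new section; track whether it is the one we care about.
            in_section = line.strip() == section
            continue
        if not in_section:
            continue
        match = LINK_RE.match(line)
        if match and any(frag in match["url"] for frag in SUSPECT_FRAGMENTS):
            flagged.append(match["url"])
    return flagged


# Illustrative excerpt; only the sample_spec URL is taken from the audit.
SAMPLE = """\
## Docs
- [Parse Document](https://docs.example.com/api-reference/parser/parse-document.md)

## OpenAPI Specs
- [openapi](https://docs.example.com/openapi.json)
- [openapi-v1](https://docs.example.com/openapi-v1.json)
- [openapi-v2](https://docs.example.com/openapi-v2.json)
- [sample_spec](https://docs.unsiloed.ai/tooling/tests/fixtures/sample_spec.json)
"""

print(suspicious_specs(SAMPLE))
```

A non-empty result would fail the build, which keeps the index clean without relying on manual review.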

2. Three parse endpoints disagree on auth header and status vocabulary (critical)

Location: /api-reference/parser/parse-document.md, /api-reference/parser/parse-document-v2.md, /api-reference/parser/parse-document-v3.md, /api-reference/parser/get-parse-job-status.md

Problem: The docs document three parallel parse APIs that disagree on the two most fundamental things in a REST contract:

Auth header: v1/v2 (and the homepage quickstart) require api-key: your-api-key. v3 explicitly requires X-API-Key and states "v3 API keys are issued on request — they are separate from v1/v2 keys."
Status enum: get-parse-job-status.md enumerates Starting, Processing, Succeeded, Failed, Cancelled (PascalCase). v2's note adds AwaitingUpload and Queued to the lifecycle. v3 documents "queued" → "running" → "done" or "failed" (lowercase, different verbs entirely). Classification status uses yet another set: "processing", "completed", "failed". Jobs results uses "COMPLETED" (uppercase).

Consequence: A developer or agent that polls with if result["status"] == "Succeeded" (the exact pattern shown on the homepage) will loop forever against v3, which never emits Succeeded; it emits done. The same agent will most likely be rejected with a 401/403 if it carries an api-key header to v3 instead of X-API-Key. There's no cross-version compatibility note anywhere.

The fix: Add a "Versions and migration" page at /api-reference/parser/ that tabulates header, status enum, retention, rate limits, and base path for v1/v2/v3 side by side. Standardize status casing across the product, or at minimum mark each status enum with the version it applies to. Make v3's separate-key requirement a banner on the v3 page, not buried in a Note.
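Until such a page exists, a client can encode the differences itself. The sketch below assumes the header names and status strings quoted above are accurate, and that v1 and v2 share the terminal states listed on get-parse-job-status.md. The table layout, `auth_headers`, `poll_until_terminal`, and the injected `fetch_status` callable are hypothetical and not part of any Unsiloed SDK.

```python
import time
from typing import Callable

# Values come from the audited pages; the structure is illustrative.
PARSE_CONTRACTS = {
    "v1": {"auth_header": "api-key", "success": "Succeeded", "failure": {"Failed", "Cancelled"}},
    "v2": {"auth_header": "api-key", "success": "Succeeded", "failure": {"Failed", "Cancelled"}},
    "v3": {"auth_header": "X-API-Key", "success": "done", "failure": {"failed"}},
}


def auth_headers(version: str, api_key: str) -> dict[str, str]:
    """Build the auth header documented for the given parse version."""
    return {PARSE_CONTRACTS[version]["auth_header"]: api_key}


def poll_until_terminal(
    version: str,
    fetch_status: Callable[[], str],
    max_attempts: int = 60,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the version-specific success state, with a hard attempt cap."""
    contract = PARSE_CONTRACTS[version]
    for _ in range(max_attempts):
        status = fetch_status()
        if status == contract["success"]:
            return status
        if status in contract["failure"]:
            raise RuntimeError(f"{version} job ended with status {status!r}")
        sleep(interval)
    # The cap turns a wrong status vocabulary into a visible error, not an endless loop.
    raise TimeoutError(f"no terminal status for {version} after {max_attempts} attempts")
```

Passing `fetch_status` in as a callable keeps the HTTP details (URL, client library) out of the sketch, since those differ per version and are not restated here.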

3. `ocr_engine` default flips between v1 and v2 with no migration note (critical)

Location: /api-reference/parser/parse-document.md vs /api-reference/parser/parse-document-v2.md

Problem: Same parameter, different defaults:

v1: "UnsiloedBeta" (default): Handles rotated/warped text and irregular bounding boxes.
v2: "UnsiloedHawk" (Recommended, default): Higher accuracy for complex layouts and mixed content.

The descriptions of each engine are also rewritten between versions with no changelog entry explaining why.

Consequence: A developer who has tuned a v1 pipeline around UnsiloedBeta's rotated-text behavior and migrates to v2 without re-reading parameter docs will silently get different OCR output, then chase the regression. Agents synthesizing requests from one page won't know to override the default on the other.

The fix: Either standardize the default across v1 and v2, or add an explicit "Changed in v2: default OCR engine is now UnsiloedHawk (was UnsiloedBeta)" note at the top of parse-document-v2.md and a corresponding migration page.
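On the client side, the safest workaround is to never rely on the server default. The helper below is a sketch that assumes the two defaults quoted above are correct; `pin_ocr_engine` and its `behaves_like` argument are hypothetical names.

```python
# Defaults as documented on the v1 and v2 parse pages.
DOCUMENTED_DEFAULTS = {"v1": "UnsiloedBeta", "v2": "UnsiloedHawk"}


def pin_ocr_engine(params: dict, behaves_like: str = "v1") -> dict:
    """Return a copy of params with ocr_engine set explicitly.

    An ocr_engine already chosen by the caller is preserved; otherwise the
    documented default of the `behaves_like` version is filled in.
    """
    pinned = dict(params)
    pinned.setdefault("ocr_engine", DOCUMENTED_DEFAULTS[behaves_like])
    return pinned


# Migrating a v1 pipeline to v2 while keeping the v1 engine:
print(pin_ocr_engine({}, behaves_like="v1"))
```

Sending the engine explicitly on every request means a pipeline tuned on v1 keeps the same engine after moving to v2.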

4. Generic "jobs/results" page documents extraction-only behavior as if universal (significant)

Location: /api-reference/jobs/results.md

Problem: The page is titled "Get Extraction Result" in llms.txt but lives under /api-reference/jobs/results and is written in generic language: "The Get Job Results endpoint retrieves the processed data from a completed job... after confirming the job status is COMPLETED." Nothing on the page reconciles "COMPLETED" with the v1/v2 Succeeded enum, the v3 done enum, or the classification completed enum.

Consequence: A developer landing on /api-reference/jobs/results (the URL implies a unified job-results endpoint) will assume it works for any job type and gate their polling on status == "COMPLETED", which matches none of the actual job-status responses documented for parse, classify, or v3.

The fix: Rename the route to /api-reference/extraction/results to match its actual scope, fix the status value to match what the extraction endpoint actually returns, and link to per-product result endpoints for parse/classify/split rather than implying a unified one.

5. `llms.txt` has an empty-titled entry pointing at the home page (significant)

Location: https://docs.unsiloed.ai/llms.txt

Problem: Inside the ## Docs list there is a literal line - [](https://docs.unsiloed.ai/index.md) — no title, no description. Every other entry in that section has both.

Consequence: Agents ranking pages by title/description (which is what llms.txt is designed for) will deprioritize or skip the actual landing page because it looks like a broken stub. The home page is where the API base URL and auth header are defined.

The fix: Replace it with - [Introduction](https://docs.unsiloed.ai/index.md): What Unsiloed AI is, the API base URL, and the auth header. Adjust the blurb to match the page's actual content.
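Entries like this are easy to catch mechanically. The following sketch lints llms.txt for list entries with a missing title or description; it assumes the "- [title](url): description" shape used in the fix above, and `incomplete_entries` is a hypothetical helper name.

```python
import re

# "- [title](url)" with an optional ": description" suffix.
ENTRY_RE = re.compile(
    r"^\s*-\s*\[(?P<title>[^\]]*)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?$"
)


def incomplete_entries(llms_txt: str) -> list[tuple[int, str, list[str]]]:
    """Return (line number, url, missing fields) for each incomplete entry."""
    problems = []
    for number, line in enumerate(llms_txt.splitlines(), start=1):
        match = ENTRY_RE.match(line)
        if not match:
            continue  # headings, blank lines, and prose are ignored
        missing = []
        if not match["title"].strip():
            missing.append("title")
        if not (match["desc"] or "").strip():
            missing.append("description")
        if missing:
            problems.append((number, match["url"], missing))
    return problems


SAMPLE = """\
## Docs
- [](https://docs.unsiloed.ai/index.md)
- [Introduction](https://docs.unsiloed.ai/index.md): What Unsiloed AI is, the API base URL, and the auth header.
"""

for number, url, missing in incomplete_entries(SAMPLE):
    print(f"line {number}: {url} is missing {', '.join(missing)}")
```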

6. Sign-up flow is a Cal.com 15-minute call, not self-serve (significant)

Location: /index.md ("Get API Access" step) and /faq/general.md ("How do I get started?")

Problem: Both pages link "Sign up on Unsiloed AI" to https://cal.com/aman-mishra-p0ry57/15min — a sales-call booking, not a sign-up form. The FAQ then says "We'll provide you with API keys" as step 2. For v3 specifically, the docs additionally say keys are "issued on request — email aman@unsiloed.ai... typical turnaround is same-day."

Consequence: The docs are written as if onboarding is self-serve ("Make Your First API Call" follows immediately), but a developer cannot actually obtain an API key without a sales call or a personal email. Agents that follow the Getting Started steps will hit a dead end at step 1.

The fix: Either ship a real self-serve sign-up page and link to it, or make the gating explicit on the home page: "API access is currently gated — book a 15-minute call or email aman@unsiloed.ai to receive keys." Don't present a sales call as a sign-up form.

7. Split-document Python example mixes two unrelated response shapes (significant)

Location: /api-reference/splitting/split-document.md (Python request example)

Problem: The example prints result['job_id'], result['status'], result['message'], result['quota_remaining'] — i.e., the 202-accepted shape — and then in the same if block prints file_info['confidence_score'] and file_info['fileId'], which are fields from the eventual completed-job result returned by get-split-status. There's no loop, no file_info variable defined, and the indentation is broken (the two file_info lines are indented an extra level under nothing).

Consequence: Copy-pasting the snippet, which is exactly what an agent would do, fails at parse time with an IndentationError, so nothing runs. Even with the indentation repaired, the file_info lines would raise NameError: name 'file_info' is not defined. The error message about "Classes parameter is required" further down also doesn't match the documented body parameter, which is called categories, not classes.

The fix: Rewrite the example to only print fields from the 202 response, and link to the polling example on get-split-status for the completed shape. Fix the error response to reference categories (or rename the parameter if classes is the canonical name on the server).
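A corrected example could be as small as the sketch below, which touches only the four fields the page attributes to the 202 response. The payload values and the `describe_accepted_job` name are hypothetical; the HTTP request itself is left out because the endpoint URL is not restated in this audit.

```python
# Fields the split-document page shows on the 202-accepted response.
ACCEPTED_FIELDS = ("job_id", "status", "message", "quota_remaining")


def describe_accepted_job(result: dict) -> str:
    """Format only the 202-response fields; missing keys show as None."""
    return "\n".join(f"{name}: {result.get(name)}" for name in ACCEPTED_FIELDS)


# Hypothetical payload for illustration; real values come from the API.
accepted = {
    "job_id": "job-123",
    "status": "queued",
    "message": "Job accepted",
    "quota_remaining": 99,
}

print(describe_accepted_job(accepted))
```

Fields such as confidence_score and fileId belong in a separate example on get-split-status, after the job has completed.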

8. `segment_processing` / `segment_analysis` alias has undefined precedence wording (minor)

Location: /api-reference/parser/parse-document.md

Problem: The doc says segment_processing is "Alias for segment_analysis. If both are provided, segment_processing takes precedence." But the alias is documented as a separate ParamField of type string, and nothing explains why a caller would ever send both, or whether "takes precedence" means the segment_analysis value is ignored entirely or the two JSON object strings are merged.

Consequence: Agents synthesizing requests can't tell which name to use; some will send both fields to be safe, doubling payload size and risking server-side parse mismatches if the values diverge.

The fix: Pick one canonical name, mark the other as deprecated with a removal date, and stop documenting it as an active parameter.
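Until the docs settle on one name, a client can collapse the pair before sending. This sketch treats segment_analysis as canonical (the page calls segment_processing its alias) and mirrors the documented precedence by letting the alias value win. How the server actually resolves a conflict is unverified, and `canonical_segment_params` is a hypothetical helper.

```python
import warnings


def canonical_segment_params(params: dict) -> dict:
    """Return a copy of params that carries only segment_analysis.

    If segment_processing is present, its value is used, matching the
    documented rule that the alias takes precedence when both are sent.
    """
    cleaned = dict(params)
    if "segment_processing" in cleaned:
        alias_value = cleaned.pop("segment_processing")
        if "segment_analysis" in cleaned and cleaned["segment_analysis"] != alias_value:
            warnings.warn(
                "segment_processing and segment_analysis differ; "
                "using segment_processing per the documented precedence"
            )
        cleaned["segment_analysis"] = alias_value
    return cleaned
```

Sending a single field avoids the doubled payload and makes any divergence visible on the client instead of on the server.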

9. v3 retention and rate limits are isolated to one page (minor)

Location: /api-reference/parser/parse-document-v3.md ("Guarantees" table; "100 requests/day, 2 RPS")

Problem: v3 documents 24-hour retention, 100 requests/day, and 2 RPS rate limits in a Guarantees table. v1/v2 parse pages, extract, classify, split, and the org usage page contain no equivalent table — rate limits and retention for those surfaces are not stated anywhere in the scraped docs.

Consequence: A developer building on v1/v2 has no documented SLA for how long their job artifacts persist, no published RPS ceiling, and no way to predict 429s. The organization usage endpoint exists but the docs don't tie its remaining quota to a specific limit number.

The fix: Add a "Limits & retention" matrix on each API-reference index page (or as a shared /api-reference/limits page) covering all four product lines, not just v3.
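One way to keep such a matrix honest is to store it as data and treat unknown rows as explicit gaps. In the sketch below only the v3 row comes from the audited docs; every other row is None because no value is stated anywhere, and the key names are assumptions.

```python
from typing import Optional

# None marks a surface whose limits and retention the docs do not state.
LIMITS: dict[str, Optional[dict[str, int]]] = {
    "parse v1": None,
    "parse v2": None,
    "parse v3": {"retention_hours": 24, "requests_per_day": 100, "requests_per_second": 2},
    "extract": None,
    "classify": None,
    "split": None,
}


def undocumented_surfaces(limits: dict) -> list[str]:
    """List the surfaces that still lack a documented limits row."""
    return [name for name, row in limits.items() if row is None]


print(undocumented_surfaces(LIMITS))
```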

What they do well

Per-product API reference pages exist for every surface (parse, extract, classify, split), each with request/response schemas and error examples.
An llms.txt exists at all, with descriptions on most entries — agent-friendliness is at least on the radar.
The v3 page is unusually explicit about per-key isolation, 24-hour retention, and exact rate limits. That level of contract is the right model to extend to v1/v2.

Top 3 recommendations

Fix llms.txt first. Remove the dummy sample_spec.json, give index.md a real title, and add a v3 spec entry. This is the cheapest agent-facing win.
Publish a versions matrix for the parse API covering header (api-key vs X-API-Key), status enum casing, base path, retention, and rate limits side by side. Today a developer has to read three pages and infer the diff.
Standardize status enums across parse/extract/classify/split, or at minimum annotate every status field with the exact string values it can return for that endpoint and version — no more Succeeded vs done vs COMPLETED vs completed drift.

Check out Manicule.
