Octen Documentation Audit
The Octen docs cover a real, multi-surface AI search platform (Search, Extract, Embedding, VL Embedding, Web Chat, Broad Search, Deep Research, Python SDK) and ship the right structural pieces — llms.txt, per-page .md mirrors, an OpenAPI spec, a changelog, an SLA, error codes, rate limits. But several load-bearing facts contradict each other across pages: free-tier definition, welcome-credit currency, data-retention claims, and the canonical URL for the Search API reference. Agents will hit these contradictions silently; humans will hit them within the first hour.
1. Welcome credit is $5 (USD) on Pricing but ¥5 on the Changelog (critical)
Location: /introduction/getting-started/pricing.md and /introduction/resources/changelog.md
Problem: Pricing states "All new users receive $5 in free balance upon registration." The March 15, 2026 changelog entry for the same offer states "Every new user receives a ¥5 free balance upon registration." $5 USD and ¥5 (yuan ≈ $0.69, yen ≈ $0.03) are not the same amount by any plausible reading.
Consequence: Developers reading the changelog conclude Octen is giving them almost nothing (¥5 read as yuan is roughly $0.69, enough for only about 140 Search calls at $5 per 1,000 and well short of a single 1,000-call batch); developers reading Pricing budget for a real onboarding allowance. Whichever page is wrong, one of those groups will be surprised by what actually appears in their console. Agents grounding answers on docs will quote different amounts depending on which page they cite.
The fix: Pick one currency and propagate it. If the amount is $5 USD, change the changelog to "$5 free balance." Sweep the rest of the docs for stray currency symbols.
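That sweep is easy to script. A minimal sketch, assuming the docs are available locally as the .md mirrors (the `docs/` path, the function names, and the symbol set in the regex are assumptions, not Octen tooling). It lists every currency amount that is not written in dollars so a reviewer can confirm or correct each one:

```python
import re
from pathlib import Path

# An amount preceded by a currency symbol, e.g. "$5", "¥5", "€0.04".
# The symbol set is an assumption; extend it if the docs use others.
CURRENCY_AMOUNT = re.compile(r"([$¥€£])\s?(\d+(?:\.\d+)?)")


def load_pages(docs_root: Path) -> dict[str, str]:
    """Read every .md mirror under docs_root, keyed by relative path."""
    return {
        str(p.relative_to(docs_root)): p.read_text(encoding="utf-8")
        for p in sorted(docs_root.rglob("*.md"))
    }


def non_usd_amounts(pages: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
    """Map each page to the (symbol, amount) pairs that are not in dollars."""
    hits = {}
    for name, text in pages.items():
        found = [m for m in CURRENCY_AMOUNT.findall(text) if m[0] != "$"]
        if found:
            hits[name] = found
    return hits
```

Running `non_usd_amounts(load_pages(Path("docs")))` against the current docs would be expected to flag the changelog's "¥5" entry and nothing on Pricing.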
2. Rate Limits invents a "Free" tier that doesn't exist anywhere else (critical)
Location: /introduction/admin/rate-limits.md vs. /introduction/getting-started/pricing.md vs. /sla.md
Problem: The Rate Limits page lists five subscription tiers: Free (5 QPS), Base (20), Pro (50), Scale (100), and Enterprise. The Pricing page's canonical QPS Plans table lists only four: Base (20, marked Free), Pro, Scale, and Enterprise, with no separate "Free" row. The SLA page reinforces Pricing's model by referring to "the free Base plan" as the excluded tier. No page documents what the 5 QPS "Free" tier is, how an account ends up on it, or how it differs from the free Base plan at 20 QPS.
Consequence: A developer who reads Rate Limits first assumes they have 5 QPS until they upgrade; one who reads Pricing first assumes 20 QPS on day one. Capacity planning, load-test budgets, and 429-handling decisions all hinge on this number. An agent answering "what's my QPS limit?" will give different answers depending on which page it retrieved.
The fix: If "Free" and "Base" are the same tier, delete the "Free" row from the Rate Limits table. If they are different (e.g., pre-payment-method vs. post-payment-method), document the distinction explicitly on both Pricing and Rate Limits, and align the SLA's wording.
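To see why the 5-versus-20 QPS ambiguity reaches client code, here is a minimal client-side token-bucket sketch. It assumes the server enforces a plain per-second request rate with a burst equal to the QPS value; Octen does not document its enforcement algorithm, so the burst size and the class itself are illustrative assumptions.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Client-side throttle allowing `qps` requests per second on average,
    with bursts of up to `burst` requests (assumed equal to `qps` by default)."""

    def __init__(self, qps: float, burst: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = qps
        self.capacity = burst if burst is not None else qps
        self.tokens = self.capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; False means wait before sending."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The single constructor argument is the number the docs disagree on: sizing the bucket at 5 when the account really allows 20 wastes three quarters of the available throughput, while sizing it at 20 when the limit is 5 turns most requests into 429s.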
3. "API Reference" links to /api-reference/web-search, which 404s (critical)
Location: /llm-&-agent-interfaces/llm-tool-use/connect-llms-to-octen-search.md and /llm-&-agent-interfaces/docs-for-llms/using-with-llms.md
Problem: The "Connect LLMs to Octen Search" page has a prominent CardGroup whose "API Reference" card points to /api-reference/web-search. The "Using with LLMs" page tells LLM consumers that "the Web Search API reference can be found at https://docs.octen.ai/api-reference/web-search.md." Both URLs 404 (verified). The actual page lives at /api-reference/search (verified in llms.txt and in the OpenAPI spec). The same dead path appearing on two different pages suggests an old route was renamed and its references were never swept.
Consequence: The dead link is on the page Octen tells LLM integrators to read first ("Connect LLMs to Octen Search") and on the page that explicitly instructs LLMs how to fetch reference docs. Agents following the documented .md convention will silently hit 404 and answer from stale context or refuse to answer at all. Human developers click a "View the full Web Search API reference" card and land on Not Found.
The fix: Update both pages to point to /api-reference/search. Add a 301 from /api-reference/web-search → /api-reference/search so any external link, blog post, or cached agent retrieval keeps working.
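The reference update can be scripted so that no copy of the old route survives. A minimal sketch, assuming the page sources are available as text (the `pages` dictionary and function names are hypothetical). It rewrites the old route in relative, absolute, and .md-suffixed forms, but leaves longer routes that merely share the prefix untouched:

```python
import re

# Old route -> new route, per the llms.txt listing; extend as more renames surface.
RENAMES = {"/api-reference/web-search": "/api-reference/search"}


def rewrite_routes(text: str) -> str:
    """Replace each old route unless it continues with a word character or "-"."""
    for old, new in RENAMES.items():
        text = re.sub(re.escape(old) + r"(?![\w-])", new, text)
    return text


def sweep(pages: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Return the rewritten pages and the names of the pages that changed."""
    fixed = {name: rewrite_routes(body) for name, body in pages.items()}
    changed = [name for name in pages if fixed[name] != pages[name]]
    return fixed, changed
```

The 301 still belongs in the hosting layer; the sweep only covers links the docs themselves control.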
4. FAQ Q10 says queries are "never stored," but Security & Compliance documents encryption-at-rest and deletion-on-request (significant)
Location: /introduction/resources/faqs.md vs. /introduction/resources/security-&-compliance.md
Problem: FAQ Q10 answers "Will my queries or data be used for model training?" with "No. Your queries and data are never stored or used for training." The Security & Compliance page states customer data is "protected through TLS 1.2+ encryption in transit and AES-256 at rest, with deletion available upon request." If queries are never stored, there is nothing to encrypt at rest and nothing to delete on request. The two claims cannot both be operationally true.
Consequence: This is a compliance and trust contradiction. Enterprise procurement and privacy reviewers care about exactly which claim is correct — GDPR right-to-erasure language only applies to data that is stored, and "never stored" is the kind of absolute that a DPA will hold the vendor to. Developers writing privacy notices in their own apps will quote the wrong sentence. Agents asked "does Octen retain my queries?" will give opposite answers depending on which page they retrieved.
The fix: Reconcile the language. If queries are retained transiently (logs, abuse detection, billing) and then deleted, say so explicitly on both pages with retention windows. If they are truly never written to disk, remove the "AES-256 at rest" and "deletion upon request" wording from Security & Compliance and explain how the "no storage" guarantee is enforced.
5. Web Chat model enum omits qwen/qwen3.6-plus even though Pricing lists it (significant)
Location: /api-reference/web-chat.md vs. /introduction/getting-started/pricing.md
Problem: The Pricing page's "Web Chat & Broad Search & Deep Research" table prices qwen/qwen3.6-plus at $0.5 / $3 per 1M tokens. The Web Chat reference's "Supported Models" list enumerates ten models and qwen/qwen3.6-plus is not among them. Either it is supported (and the reference is wrong) or it isn't (and the pricing row is misleading).
Consequence: If the reference is right, a developer who picks Qwen for its competitive pricing will get an "unsupported model" error from /v1/chat/completions, and agents that key model selection on the pricing table will route traffic to a model the endpoint rejects.
The fix: Reconcile the two lists. If Qwen is supported, add it to the Web Chat enum, and to the Broad Search and Deep Research enums as well, since the pricing table covers all three endpoints; if not, remove the pricing row or annotate it with the endpoints where it actually works.
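A set comparison run in CI would keep the two lists from drifting again. A minimal sketch: apart from qwen/qwen3.6-plus, the model IDs in the usage below are hypothetical placeholders, and in practice both sets would be parsed from the Pricing page and the Web Chat OpenAPI enum.

```python
def reconcile(priced: set[str], supported: set[str]) -> dict[str, list[str]]:
    """Report models that are priced but not accepted by the endpoint,
    and models the endpoint accepts that have no price listed."""
    return {
        "priced_not_supported": sorted(priced - supported),
        "supported_not_priced": sorted(supported - priced),
    }
```

A CI job would fail whenever either list is non-empty, forcing the Pricing table and the enum to change in the same commit.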
6. Deep Research documents a reasoning enum; Web Chat documents none (significant)
Location: /api-reference/deep-research.md vs. /api-reference/web-chat.md
Problem: Deep Research documents reasoning as one of none, low, medium, high. Web Chat — which shares the same model menu and is the more commonly used endpoint — describes "Reasoning Models" only narratively ("thinking processes appearing as <think>...</think> tags in responses") and does not specify whether a reasoning parameter is accepted, what values are valid, or what the default is.
Consequence: A developer who learns the parameter on Deep Research and tries to reuse it on Web Chat has no schema to validate against — they either pass a value that gets silently ignored or hit a 400 they can't predict. Agents and code generators can't produce typed bindings for Web Chat's reasoning behavior because there is no contract to bind to.
The fix: Either document a reasoning parameter on Web Chat with the same (or different, but explicit) enum as Deep Research, or state explicitly that Web Chat does not accept a reasoning parameter and that reasoning behavior is determined by model selection alone. Reflect whichever answer in the OpenAPI spec.
7. FAQ Q6 says Search API "only supports single-query requests" — directly contradicting Broad Search and Deep Research (significant)
Location: /introduction/resources/faqs.md vs. /api-reference/broad-search.md and /api-reference/deep-research.md
Problem: FAQ Q6 answers the batching question with: "Embedding API supports batch input. Search API only supports single-query requests." But Broad Search is described as a first-class product that "automatically decomposes user messages into multiple sub-queries, performs searches, and synthesizes results," with max_queries up to 30. Deep Research does the same across rounds. Both are listed in llms.txt as Octen's own API surface.
Consequence: A developer reading the FAQ concludes they need to fan out queries client-side and re-implement what Broad Search already does — a wasted day of integration work, plus the higher latency and cost of running N serial requests instead of one. Agents answering "does Octen support multi-query search?" will say no based on the FAQ.
The fix: Rewrite Q6 to acknowledge multi-query support via Broad Search and Deep Research, with a one-line pointer to those references. If the original sentence was specifically about the low-level POST /search endpoint, scope it that way explicitly ("the single /search endpoint accepts one query per request; use Broad Search for multi-query decomposition").
8. Embedding default model in spec doesn't match the "Single Input" example (significant)
Location: /api-reference/embedding.md
Problem: The EmbeddingRequest.model schema declares default: octen-embedding-4b. The page's own "Single Input" example sets model: octen-embedding-8b with dimension: 4096, while the "Batch Input" example uses octen-embedding-0.6b. Neither example exercises the documented default, and the more prominent single example silently overrides it.
Consequence: Developers copying the single-input example will be billed at the 8B rate ($0.07 / 1M tokens) instead of the documented default 4B rate ($0.04) — the 8B model is 75% more expensive per token, a cost premium they didn't ask for. Agents that learn "the default model is 8B with 4096 dims" from the example will misreport pricing and capability defaults.
The fix: Either change the first example to use the actual default (octen-embedding-4b, omit dimension) and add a second example that explicitly demonstrates upgrading to 8B, or annotate the example with "This example overrides the default model to 8B for accuracy-critical workloads (cost: $0.07/1M tokens vs. $0.04 for the default)."
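The cost gap is easy to quantify. A worked sketch using the per-1M-token rates quoted above; the function names are illustrative, and the 500M-token monthly volume in the usage note is an assumed figure, not one from the docs.

```python
from decimal import Decimal

# Per-1M-token input rates (USD) as quoted in this finding.
RATE_PER_M = {
    "octen-embedding-4b": Decimal("0.04"),
    "octen-embedding-8b": Decimal("0.07"),
}


def embedding_cost(model: str, tokens: int) -> Decimal:
    """USD cost of embedding `tokens` input tokens with `model`."""
    return RATE_PER_M[model] * tokens / 1_000_000


def premium(model: str, baseline: str) -> Decimal:
    """Fractional price premium of `model` over `baseline` (0.75 means 75% more)."""
    return RATE_PER_M[model] / RATE_PER_M[baseline] - 1
```

At an assumed 500M tokens a month, copying the single-input example costs $35 instead of $20 on the documented default, and `premium("octen-embedding-8b", "octen-embedding-4b")` gives the 0.75 behind the 75% figure.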
9. input_type documentation only specifies the query prompt; document and null are conflated (minor)
Location: /api-reference/embedding.md
Problem: The input_type description states: "query → 'Represent the query for retrieving supporting' (prepended and counted in input_tokens); document and null mean no special prompt is applied." Two issues: (1) the query prompt string is itself cut off mid-sentence ("retrieving supporting" — supporting what?); (2) collapsing document and null into the same behavior makes the parameter's distinction between them undefined — why expose both if they're identical?
Consequence: Developers don't know whether to set input_type: "document" for corpus indexing or leave it null, and can't audit whether the retrieval-side prompt was applied because they don't have its full text. Vector-store ingestion pipelines built on Octen will be inconsistent across teams.
The fix: Publish the full query prompt string. Either define a distinct behavior for document vs. null or deprecate one of them. State which input_type is required to match retrieval-time embeddings against document-time embeddings for correct similarity.
10. Error Codes table includes 413, but no per-endpoint reference documents a payload size limit that would produce it (minor)
Location: /introduction/admin/error-codes.md vs. all /api-reference/* pages
Problem: The global Error Codes page lists 413 Request Entity Too Large with the cause "Request payload exceeds the allowed size limit." Only the Embedding spec mentions a concrete payload cap ("Maximum request body size: 2MB"); Extract documents per-item limits (20 URLs, 2048 chars per URL) but no overall payload cap; Search, Web Chat, Broad Search, and Deep Research document none. The error is enumerated globally but the threshold to avoid it is documented for only one endpoint.
Consequence: A developer hitting 413 on /search or /v1/chat/completions has no documented threshold to test against; they have to bisect their request size to discover the limit empirically. Agents that try to right-size requests preemptively have no number to clamp to.
The fix: Add a "Limits" line to each endpoint's reference page with the maximum request body size (and per-field caps where relevant), and link from each one back to Error Codes 413.
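Until each page states its limit, bisection is the only way to find it. A minimal sketch of that search, assuming the limit is a single byte threshold on the request body and that `accepts(n)` is a hypothetical caller-supplied probe that sends an n-byte request and returns False on a 413:

```python
from typing import Callable


def find_max_body_size(accepts: Callable[[int], bool], lo: int, hi: int) -> int:
    """Largest n in [lo, hi] with accepts(n) True, assuming accepts is monotone
    (every size at or below the limit passes, every size above it fails).
    Uses O(log(hi - lo)) probes."""
    if not accepts(lo):
        raise ValueError("lower bound is already rejected")
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if accepts(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Each probe is a real request against the API, which is exactly the cost a documented limit would eliminate.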
11. Search format default is text, but Quickstart and search-optimization guide never mention it (minor)
Location: /api-reference/search.md vs. /introduction/getting-started/quickstart.md and /examples/guides/search-optimization.md
Problem: The Search OpenAPI spec defines format as enum: [markdown, text] with default: text. The search-optimization guide recommends a "highlight-first" pattern as the primary cost-optimization strategy but never tells readers that highlight strings come back as plain text unless they opt into format: markdown. The Quickstart doesn't mention the parameter either.
Consequence: Developers wiring Octen highlights into a Markdown-rendered UI (chat answer cards, RAG citations) will get unstyled plain strings — no bold, no links — and won't know why until they read the Search reference end-to-end. Agents generating UI code from the guide will produce broken renderers.
The fix: Mention format: markdown once in the Quickstart and once in the search-optimization guide where highlights are introduced, with a one-line note that the default is plain text.
12. Founding year and changelog dates disagree on the timeline (minor)
Location: /introduction/getting-started/who-is-octen.md and /introduction/resources/changelog.md
Problem: "Who is Octen" states the company was "Founded in 2025." The changelog's earliest entry is February 10, 2026 ("Octen Beta Now Open"), with Public Beta in March 2026 and Deep Research shipping in April 2026. There are no 2025 entries, so the entire public history of the product post-dates the founding by a year with no documented activity in between.
Consequence: Prospective enterprise customers doing due diligence wonder what happened during 2025 — stealth, pivot, pre-product? Doc reviewers can't tell whether the changelog is incomplete (missing 2025 entries) or whether "Founded in 2025" should read "late 2025."
The fix: Either backfill the 2025 changelog (private beta, model releases, the Octen-8B RTEB win mentioned in "Who is Octen") or tighten the founding sentence to "Founded in late 2025" so the timeline lines up.
13. Octen-8B's RTEB leaderboard claim links to a blog under a different org name (minor)
Location: /introduction/getting-started/who-is-octen.md
Problem: The page credits "Octen-8B successfully sweeping the RTEB leaderboard in early 2026" and links Octen-8B to https://huggingface.co/Octen. The GitHub org in llms.txt is github.com/Octen-Team and the blog is on octen-team.github.io, so what appears to be the same entity goes by two handles: Octen on Hugging Face, and Octen-Team on GitHub and its Pages blog.
Consequence: Agents and humans trying to verify the benchmark claim or pull the model weights have to guess at the canonical org handle. That leaves the trust signal weaker than a "SOTA" claim needs.
The fix: Standardize on one org name across HF, GitHub, and the blog (or document the mapping prominently on "Who is Octen"). Link directly to the RTEB leaderboard itself, not only to a self-published blog post about it.
14. Python SDK entry point is client.search.search() — the namespace is doubled (minor)
Location: /api-&-sdks/sdks/python.md
Problem: The Python SDK page describes the primary search call as client.search.search() — a search resource namespace containing a search method. There is no documented reason the resource and method share a name (compare typical SDK shapes like client.search.query() or client.web.search()). The page also offers simple_search() as a "quick lookup" alternative, but the canonical entrypoint remains the doubled form.
Consequence: Developers reading the SDK in their IDE see autocomplete suggest client.search.search(...) and assume a typo or a missing import; readability and discoverability of the main API call suffer. Agents generating SDK examples will either parrot the awkward form or guess at a cleaner one.
The fix: Either flatten the resource (client.search(...)), rename the method (client.search.query(...)), or document explicitly why the namespace is doubled (e.g., "search" resource groups search + multi-query + history).
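The first option does not have to break existing code. A minimal sketch with hypothetical class names that are not the Octen SDK's: making the resource object callable lets `client.search(...)` work while `client.search.search(...)` survives as an alias. The `"query"` body field and the `send` transport callable are assumptions standing in for the SDK's real request layer.

```python
from typing import Any, Callable

Send = Callable[[str, dict], Any]  # hypothetical transport: (path, body) -> response


class SearchResource:
    """Hypothetical search resource wrapping a transport callable."""

    def __init__(self, send: Send) -> None:
        self._send = send

    def __call__(self, query: str, **params: Any) -> Any:
        # Flattened entry point: client.search("...")
        return self._send("/search", {"query": query, **params})

    # Backward-compatible alias for the existing doubled form: client.search.search("...")
    search = __call__


class Client:
    def __init__(self, send: Send) -> None:
        self.search = SearchResource(send)
```

Keeping a resource object, rather than a plain method, also leaves room to hang related calls off `client.search` later without another rename.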
15. VL Embedding element math doesn't reconcile (minor)
Location: /api-reference/vl-embedding.md
Problem: The VL Embedding limits state: "Maximum 20 total elements per request; Maximum 5 images per request; Maximum 1 video per request." 5 + 1 = 6, leaving 14 slots that — if not text — are unaccounted for. The page never states that text entries count toward the 20-element total, nor whether the 20-element ceiling applies when all inputs are text.
Consequence: A developer planning a fusion request can't tell whether text shares the 20-element cap with images and video (so text_capacity = 20 - images - video) or has an allowance of its own. Request-sizing and batching code will be written defensively, assuming the worst, and will waste throughput.
The fix: State explicitly: "The 20-element total includes text entries. A request may contain up to 20 text entries, up to 5 images, and up to 1 video, with the sum not exceeding 20." Add an example showing a maxed-out fusion request.
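Once the rule is stated, client-side validation takes a few lines. A minimal sketch encoding the reading proposed above, where text counts toward the 20-element total; the three limits are the documented ones, but the shared-total interpretation and the function names are assumptions until Octen confirms them.

```python
from collections import Counter

MAX_TOTAL, MAX_IMAGES, MAX_VIDEOS = 20, 5, 1


def validate_elements(kinds: list[str]) -> list[str]:
    """Return limit violations for a request whose elements have the given kinds
    ("text", "image", or "video"); an empty list means the request fits."""
    counts = Counter(kinds)
    errors = []
    if len(kinds) > MAX_TOTAL:
        errors.append(f"{len(kinds)} elements exceeds the total of {MAX_TOTAL}")
    if counts["image"] > MAX_IMAGES:
        errors.append(f"{counts['image']} images exceeds {MAX_IMAGES}")
    if counts["video"] > MAX_VIDEOS:
        errors.append(f"{counts['video']} videos exceeds {MAX_VIDEOS}")
    return errors


def text_capacity(images: int, videos: int) -> int:
    """Text slots left once images and videos are placed (shared-total reading)."""
    return max(0, MAX_TOTAL - images - videos)
```

Under this reading a maxed-out fusion request is 5 images, 1 video, and 14 text entries.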
What they do well
- llms.txt is real and complete: every documented page is enumerated, and the .md mirror convention is explicitly described on using-with-llms.md. That's more than most docs sites of comparable age ship.
- Error semantics are structured: a global Error Codes table with status, cause, and remediation, plus a partial-failure schema in Extract (status: failed, error_message, "failed items are not billed") that is the kind of contract agents can actually code against.
- SLA is honest about scope: it explicitly excludes the free Base plan and pay-as-you-go from Service Credits, defines downtime as a 5xx rate above 5% over 5 minutes, and excludes 4xx from the calculation. That's a real SLA, not marketing.
Top 3 recommendations
- Fix the canonical Search API URL everywhere: update references from /api-reference/web-search to /api-reference/search, add a 301, and re-verify every CardGroup link in the LLM-integration section.
- Reconcile the "Free" vs "Base" tier story and the data-retention story in the same editing pass. These are the two contradictions with the largest blast radius (capacity planning and privacy/compliance). Pick one truth for each and propagate it through Pricing, Rate Limits, SLA, FAQ, and Security & Compliance.
- Sweep for cross-page contradictions on identifiers and amounts: welcome credit currency ($5 vs ¥5), Qwen presence in the Web Chat enum, reasoning parameter coverage, and the Embedding default model in examples. Each is a small fix individually; collectively they're what make an agent give different answers depending on which page it retrieved.
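That sweep is easier to repeat if each load-bearing fact is recorded per page and checked mechanically. A minimal sketch, assuming a hand-maintained list of (fact, page, value) claims; the entries below come from the findings in this audit, while the fact keys and the function name are illustrative.

```python
from collections import defaultdict


def contradictions(claims: list[tuple[str, str, str]]) -> dict[str, dict[str, str]]:
    """Group (fact, page, value) claims by fact and return only the facts
    whose pages disagree, as {fact: {page: value}}."""
    by_fact: dict[str, dict[str, str]] = defaultdict(dict)
    for fact, page, value in claims:
        by_fact[fact][page] = value
    return {fact: pages for fact, pages in by_fact.items() if len(set(pages.values())) > 1}


CLAIMS = [
    ("welcome_credit", "pricing.md", "$5"),
    ("welcome_credit", "changelog.md", "¥5"),
    ("free_tier_qps", "rate-limits.md", "5"),
    ("free_tier_qps", "pricing.md", "20"),
    ("search_reference_url", "connect-llms-to-octen-search.md", "/api-reference/web-search"),
    ("search_reference_url", "llms.txt", "/api-reference/search"),
    ("embedding_default_model", "embedding.md", "octen-embedding-4b"),
]
```

`contradictions(CLAIMS)` returns the welcome credit, free-tier QPS, and Search reference URL entries and drops the embedding default, which only one page asserts in this table.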