HydraDB Documentation Audit
The reference is structurally complete (tenants, ingestion, recall, memories, and webhooks all have pages), but several high-impact contradictions across pages, a broken auth example, and undocumented rate limits leave both humans and coding agents reasoning over conflicting sources of truth.
1. additional_metadata filterability contradicts itself across four pages (critical)
Location: /api-reference/endpoint/upload-knowledge, /essentials/knowledge, /essentials/architecture, /essentials/memories
Problem: The same field is documented two opposite ways. In /api-reference/endpoint/upload-knowledge under file_metadata: "additional_metadata — object — Document-level metadata. Filterable in recall queries." But in /essentials/knowledge for the identical structure: "additional_metadata — Free-form fields for display and bookkeeping. Not filterable." And /essentials/architecture agrees with the second: "Flexible source-level metadata attached to a source; not matchable at recall time." /essentials/memories also says "Not matchable." Even within upload-knowledge itself, app_knowledge says additional_metadata is "not filterable at recall time" while file_metadata says it is filterable.
Consequence: A developer (or an agent) reading the API reference will write recall queries that filter on additional_metadata, then debug why those filters return empty results, because by default the recall engine filters only on tenant-level metadata. The /api-reference/endpoint/full-recall page confirms that the engine matches additional_metadata only inside a nested additional_metadata filter, but that nuance is invisible to anyone who trusted the upload-knowledge field table.
The fix: Pick one contract. Either (a) make additional_metadata not filterable everywhere and remove the "filterable" wording from upload-knowledge, or (b) document the actual nested-key filter mechanism uniformly across all four pages, with one canonical example. Don't have the file-metadata table and the recall-filters page disagree.
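To keep the four pages from drifting apart again, a docs CI step could flag any field whose pages make opposite filterability claims. The sketch below is a minimal, hypothetical lint: it assumes each page's text is available as a plain string keyed by path, and its two regexes only cover the phrasings quoted above ("Filterable in recall", "Not filterable", "not matchable").

```python
import re
from typing import Dict, Set

# Phrasings quoted in this audit; a real lint would maintain these per field.
NEGATIVE = re.compile(r"\bnot\s+(?:filterable|matchable)\b", re.IGNORECASE)
POSITIVE = re.compile(r"(?<!not )\bfilterable in recall\b", re.IGNORECASE)


def classify(text: str) -> Set[str]:
    """Return the filterability claims a page makes about the field."""
    claims: Set[str] = set()
    if POSITIVE.search(text):
        claims.add("filterable")
    if NEGATIVE.search(text):
        claims.add("not filterable")
    return claims


def find_filterability_conflicts(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Return each page's claims if the pages disagree, or {} if they are consistent."""
    per_page = {path: classify(text) for path, text in pages.items()}
    all_claims: Set[str] = set().union(*per_page.values())
    return per_page if len(all_claims) > 1 else {}
```

Returning the per-page claims rather than a bare boolean tells the docs team which pages to edit. A page that makes both claims (as upload-knowledge does across its app_knowledge and file_metadata tables) shows up with both labels.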
2. Tenant provisioning time contradicts itself on the same endpoint page (critical)
Location: /api-reference/endpoint/infra-status (and mirrored in /get-started/quickstart)
Problem: Two adjacent paragraphs on the same page give different answers. First: "Tenant creation typically takes 1–2 minutes for Free and Ship plans. For Enterprise plans, provisioning can take 4–5 minutes due to complete physical isolation." Then, under "Polling pattern": "Typical provisioning time: 10–30 seconds." That's a 10x gap between two statements about the exact same operation.
Consequence: Developers building polling logic don't know whether to expect 30 seconds or 5 minutes. An agent generating retry/backoff code will either time out prematurely (using 30s) or wait far too long before surfacing errors (using 5 min). Either way the first integration looks broken.
The fix: Replace the "10–30 seconds" line with the per-plan numbers, or split them clearly: "Cold-path provisioning: 1–2 min (Free/Ship), 4–5 min (Enterprise). Warm-path repeat polls: 10–30 s." Don't quote two different SLAs on the same page without saying which one applies when.
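Once one SLA is published, polling code can key its deadline to the plan instead of guessing. A minimal sketch, assuming a hypothetical is_ready() callable that wraps the infra-status check; the deadlines are the upper ends of the per-plan ranges quoted above, and the backoff constants are placeholders rather than HydraDB guidance.

```python
import time
from typing import Callable

# Upper ends of the per-plan ranges on the infra-status page (1-2 min and 4-5 min).
PROVISIONING_DEADLINE_S = {"free": 120.0, "ship": 120.0, "enterprise": 300.0}


def wait_for_tenant(
    is_ready: Callable[[], bool],
    plan: str,
    first_delay: float = 2.0,
    max_delay: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_ready() with capped exponential backoff until the plan's deadline.

    Returns True as soon as the tenant is ready, False if the deadline passes first.
    """
    deadline = clock() + PROVISIONING_DEADLINE_S[plan.lower()]
    delay = first_delay
    while True:
        if is_ready():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline; double the delay up to max_delay.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

Injecting sleep and clock keeps the function testable without real waits, and capping each sleep at the remaining time means the last readiness check lands exactly on the deadline.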
3. Authorization: Bearer example is missing its placeholder (critical)
Location: /api-reference/index
Problem: The Authentication line reads literally: "Every endpoint requires Authorization: Bearer " — ending with a trailing space and nothing else. There is no <API_KEY>, $HYDRADB_API_KEY, or your-api-key-here token shown. The 401 error reference page also references "Missing Authorization: Bearer header" with the same dangling format.
Consequence: Copy-paste from the reference produces a header with no token. Coding agents that extract auth patterns by regex (Bearer\s+\S+) will not match it, and may emit a header containing the literal word Bearer followed by nothing. The 401 page can't tell users they sent an empty token, because the canonical example is itself an empty token.
The fix: Render the placeholder explicitly — Authorization: Bearer <HYDRADB_API_KEY> — and use that exact form on every page that mentions auth. Make sure your Mintlify component for the API key isn't being stripped from the rendered HTML.
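Client code can also refuse to send the empty-token header the current example implies. A minimal sketch, assuming the key lives in an environment variable named HYDRADB_API_KEY (the placeholder name suggested above, not a documented SDK setting); auth_header is a hypothetical helper, and BEARER_RE is the Bearer\s+\S+ pattern from above, applied to the whole header value.

```python
import os
import re

# Whole-value form of Bearer\s+\S+: the scheme plus exactly one non-empty token.
BEARER_RE = re.compile(r"Bearer\s+\S+")


def auth_header(env_var: str = "HYDRADB_API_KEY") -> dict:
    """Return {"Authorization": "Bearer <key>"}, failing loudly if the key is missing."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Sending "Bearer " with nothing after it only produces a confusing 401.
        raise RuntimeError(f"{env_var} is unset or empty")
    value = f"Bearer {token}"
    if not BEARER_RE.fullmatch(value):
        raise RuntimeError(f"{env_var} must be a single token without whitespace")
    return {"Authorization": value}
```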
4. sub_tenant_id default behavior differs between Memories and Knowledge (critical)
Location: /essentials/knowledge, /essentials/memories, /api-reference/endpoint/verify-processing
Problem: Three pages give two incompatible defaults for an omitted sub_tenant_id:
- /essentials/knowledge: "If omitted, knowledge is available to all sub-tenants."
- /essentials/memories: "If omitted, HydraDB uses the tenant's default sub-tenant."
- /api-reference/endpoint/verify-processing: "If omitted or null, the default sub-tenant is used."
Consequence: This is a tenancy/isolation contract. A developer who reads the memories page and assumes the same default applies to knowledge ingestion will accidentally upload knowledge that's visible across all sub-tenants — exactly the cross-tenant leakage the product markets against ("strict no cross-tenant data aggregation policy"). This is the worst class of contradiction: the bug it creates is silent and isolation-breaking.
The fix: Either (a) make Knowledge match Memories (sub_tenant_id omitted → default sub-tenant) and update the page, or (b) make this asymmetry explicit with a callout on both pages: "⚠ Memories and Knowledge differ on the meaning of omitted sub_tenant_id." Don't leave the same field with two opposite defaults across primitives.
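Until the defaults are reconciled, a client wrapper can force the choice to be explicit instead of inheriting whichever default a given page describes. A minimal sketch: knowledge_upload_body and its share_with_all_sub_tenants flag are hypothetical, only the tenant_id and sub_tenant_id field names come from the pages above, and the rest of the request body is assumed.

```python
from typing import Any, Dict, Optional


def knowledge_upload_body(
    tenant_id: str,
    sub_tenant_id: Optional[str] = None,
    *,
    share_with_all_sub_tenants: bool = False,
    **fields: Any,
) -> Dict[str, Any]:
    """Build a knowledge upload body that never relies on the omitted-sub_tenant_id default."""
    if sub_tenant_id and share_with_all_sub_tenants:
        raise ValueError("pass sub_tenant_id or share_with_all_sub_tenants, not both")
    if not sub_tenant_id and not share_with_all_sub_tenants:
        # Per /essentials/knowledge, omitting sub_tenant_id exposes the upload to every sub-tenant.
        raise ValueError("sub_tenant_id is required unless sharing is explicitly intended")
    body: Dict[str, Any] = {"tenant_id": tenant_id, **fields}
    if sub_tenant_id:
        body["sub_tenant_id"] = sub_tenant_id
    return body
```

Making the cross-sub-tenant case an opt-in keyword turns a silent default into a decision that shows up in code review.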
5. tenant_metadata_schema has different types in TypeScript and Python (significant)
Location: /api-reference/endpoint/create-tenant
Problem: The parameter table reads: "tenant_metadata_schema — Type: array (TypeScript) / object (Python) — No — Defines tenant-level metadata fields."
Consequence: The same logical field has two structurally different shapes depending on which SDK you use. There's no link to the two distinct schemas, no example showing the difference, and no explanation of why. Teams that share configuration across languages can't reuse this field as-is. An OpenAPI-driven client generator will emit one shape, and the other SDK won't accept it.
The fix: Document both shapes with side-by-side examples, link to the schema for each, and explain whether the array-of-objects (TS) and object-keyed-by-name (Python) are isomorphic. If they aren't, fix the SDKs to match. The OpenAPI spec can only describe one — say which one.
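If the two shapes are meant to be isomorphic, the docs could demonstrate it with a pair of converters. A minimal sketch, assuming (the page does not confirm this) that the TypeScript array holds objects with a name key and that the Python object is keyed by that same name; any other keys, such as a type key in an example, are invented for illustration.

```python
from typing import Any, Dict, List

FieldSpec = Dict[str, Any]


def array_to_object(schema: List[FieldSpec]) -> Dict[str, FieldSpec]:
    """[{"name": "region", ...}] -> {"region": {...}}; rejects duplicate names."""
    out: Dict[str, FieldSpec] = {}
    for field in schema:
        name = field["name"]
        if name in out:
            raise ValueError(f"duplicate field name {name!r}: the shapes are not isomorphic")
        out[name] = {k: v for k, v in field.items() if k != "name"}
    return out


def object_to_array(schema: Dict[str, FieldSpec]) -> List[FieldSpec]:
    """{"region": {...}} -> [{"name": "region", ...}], in the object's key order."""
    return [{"name": name, **spec} for name, spec in schema.items()]
```

The round trip is lossless only when names are unique, which is why the converter rejects duplicates. An object keyed by name also drops any meaning the array's order might carry, and the docs should say whether order matters.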
6. Rate limits are an email address, not a document (significant)
Location: /api-reference/index
Problem: "Rate limits apply per API key. For production deployments, build retry logic with exponential backoff against the 429 response. Contact founders@hydradb.com for current limit values."
Consequence: No numbers. Not per-endpoint, not per-plan, not even an order of magnitude. Developers can't size workers, can't write meaningful backoff strategies, and can't decide whether to ship without a private email exchange. Agents writing client code will guess — usually wrong. For a product positioned on sub-200ms latency, the absence of throughput numbers is conspicuous.
The fix: Publish per-plan numerical limits for each endpoint group (ingestion, recall, list/fetch), the Retry-After header behavior on 429s, and any burst allowances. If limits are negotiated per-customer, publish the default tier values and say "Enterprise customers can request higher limits."
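Even without published numbers, the retry pattern the reference asks for can be written down. A minimal sketch, assuming a hypothetical send() callable that returns (status_code, headers) and that a 429 may carry a delta-seconds Retry-After header (behavior the reference does not document); base, cap, and max_attempts are placeholders, not HydraDB limits.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Parse a delta-seconds Retry-After header; return None if absent or not an integer."""
    value = headers.get("Retry-After")
    if value is None:
        return None
    try:
        return float(max(0, int(value.strip())))
    except ValueError:
        return None  # HTTP-date form (or garbage): fall back to computed backoff


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> Response:
    """Call send() until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, headers = send()
        attempt += 1
        if status != 429 or attempt >= max_attempts:
            return status, headers  # success, non-retryable error, or out of attempts
        wait = retry_after_seconds(headers)
        if wait is None:
            # Full jitter: a random fraction of min(cap, base * 2^(attempt-1)).
            wait = min(cap, base * 2 ** (attempt - 1)) * jitter()
        sleep(wait)
```

Honoring Retry-After when present and otherwise falling back to full jitter keeps many workers from retrying in lockstep. HTTP-date Retry-After values are not parsed in this sketch.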
7. llm.txt is at a non-standard path and is the wrong filename (significant)
Location: /get-started/introduction
Problem: The intro says: "For developers using LLMs or IDEs, we have compiled an uploadable summary of the docs here llm.txt (/assets/llm.txt)." The emerging convention is llms.txt and llms-full.txt served from the site root (i.e., docs.hydradb.com/llms.txt). HydraDB's file is named llm.txt (singular) and lives under /assets/.
Consequence: Agents and crawlers that look for the conventional /llms.txt and /llms-full.txt (Cursor, Claude, future IDE integrations) won't find it. This nullifies the indexing benefit you were trying to provide.
The fix: Publish both llms.txt (index) and llms-full.txt (full corpus) at the docs root. Keep /assets/llm.txt as a redirect if you must, but the canonical locations should be /llms.txt and /llms-full.txt.
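A tool that follows the convention can compute the location from the site origin alone, without reading any page links, so a file under /assets/ is found only by tools that happen to follow the intro-page link. A minimal sketch of that derivation using only the standard library; conventional_llms_urls is a hypothetical helper name.

```python
from typing import List
from urllib.parse import urljoin


def conventional_llms_urls(any_docs_url: str) -> List[str]:
    """Return the root-level llms.txt and llms-full.txt URLs for a docs site."""
    # A leading "/" makes urljoin resolve against the host root, not the page's directory.
    return [urljoin(any_docs_url, "/llms.txt"), urljoin(any_docs_url, "/llms-full.txt")]
```

For example, starting from https://docs.hydradb.com/get-started/introduction this yields https://docs.hydradb.com/llms.txt and https://docs.hydradb.com/llms-full.txt; without the leading slash, urljoin would resolve to /get-started/llms.txt instead.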
8. success and completed are both valid processing statuses with the same meaning (significant)
Location: /api-reference/endpoint/verify-processing
Problem: The status enum table lists six values including: "completed — Fully indexed and graphed. Ready for all recall modes." and "success — Alias for completed. May appear in some legacy responses."
Consequence: A developer or agent writing if status == "completed" will silently miss the legacy success responses and report items as still-processing when they're actually done. Both branches need to be hardcoded, but nothing in the OpenAPI enum signals that.
The fix: Either retire the success value server-side (recommended — it's described as legacy), or list it as a first-class enum member in OpenAPI and tell developers to treat {"success","completed"} as a single set in their checks. Don't half-deprecate it in prose only.
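Until success is retired, client code should treat the two values as a single set. A minimal sketch; DONE_STATUSES mirrors the enum table quoted above, and not_completed is a hypothetical helper over an id-to-status mapping.

```python
from typing import Dict, List

# "success" is documented as a legacy alias for "completed"; treat both as done.
DONE_STATUSES = frozenset({"completed", "success"})


def is_done(status: str) -> bool:
    return status in DONE_STATUSES


def not_completed(statuses: Dict[str, str]) -> List[str]:
    """Return ids that are not yet completed (this includes any error statuses)."""
    return [item_id for item_id, status in statuses.items() if not is_done(status)]
```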
9. verify-processing query parameters are all optional with nonsensical defaults (significant)
Location: /api-reference/endpoint/verify-processing
Problem: The page states: "Per api-reference/openapi.json, all query parameters below are optional in the spec (each has a default)." The defaults shown: file_ids defaults to [], tenant_id defaults to "". There's no description of what happens when you call this endpoint with no tenant_id and no file_ids — does it return nothing, error, or query the universe?
Consequence: Coding agents that generate from OpenAPI will treat the call as parameterless and call it with empty defaults. The actual behavior is undocumented. tenant_id="" as a "valid" default is almost certainly a server-side accident — empty-string tenants shouldn't exist given the strong multi-tenancy claims elsewhere.
The fix: Mark tenant_id as required (or document the cross-tenant fallback explicitly), document the empty-file_ids behavior (return all in-flight items? error? a 400?), and remove the deprecated singular file_id parameter from the public reference now that file_ids supersedes it.
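Client code can enforce the stricter contract the fix proposes before the request leaves the process. A minimal sketch: verify_processing_query is hypothetical, and encoding file_ids as a repeated query key is an assumption about how the endpoint expects arrays, not something the page documents.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def verify_processing_query(
    tenant_id: str,
    file_ids: Sequence[str],
    sub_tenant_id: Optional[str] = None,
) -> str:
    """Build a verify-processing query string, rejecting the spec's empty defaults."""
    if not tenant_id.strip():
        raise ValueError('tenant_id is required; the spec default "" is not a real tenant')
    if not file_ids:
        raise ValueError("file_ids must be non-empty; empty-list behavior is undocumented")
    params = [("tenant_id", tenant_id)]
    # Assumption: arrays are sent as a repeated key (file_ids=a&file_ids=b).
    params += [("file_ids", file_id) for file_id in file_ids]
    if sub_tenant_id:
        params.append(("sub_tenant_id", sub_tenant_id))
    return urlencode(params)
```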
10. client.passthrough.fetch() collides conceptually with client.fetch.* (minor)
Location: /api-reference/sdks
Problem: The SDK overview groups /list/* and /fetch/* under client.fetch, then lists an "additional client" client.passthrough.fetch() for POST /passthrough/fetch. So there are two unrelated .fetch entry points on the client. The doc never explains what "passthrough" is or when to use it instead of client.fetch.*.
Consequence: Autocomplete and agent code generation will pick whichever name appeared first in training data. There's no disambiguating description for passthrough anywhere in the scraped pages.
The fix: Add a one-sentence description of passthrough (what it proxies, when it's needed) and either rename one of the two fetch entry points or add a "When to use" note next to each. A client.passthrough.fetch that does something completely different from client.fetch.* is a footgun.
What they do well
- The retrieval ranking model is documented with real specifics: alpha blending, metadata_filters semantics, the explicit "no $gte/$in/$regex" disclaimer, and the OpenAPI cross-reference on RecallSearchRequest.metadata_filters.
- Asynchronous tenant provisioning is honestly surfaced: the accepted status, the polling pattern, and vectorstore_status[0] vs [1] for Memories vs Knowledge are spelled out.
- Webhooks are documented with the right headers (X-HydraDB-Signature, X-HydraDB-Delivery-ID, X-HydraDB-Event) at the level an integrator needs.
Top 3 recommendations
- Fix the four-page contradiction on additional_metadata filterability and the three-page split on sub_tenant_id defaults. These are the two contradictions most likely to cause silent isolation or filter bugs in production.
- Publish real rate-limit numbers and a non-empty Authorization: Bearer <HYDRADB_API_KEY> example. Both are blocking for anyone shipping without a sales call.
- Publish llms.txt and llms-full.txt at the docs root with conventional names, and reconcile the two provisioning-time numbers on the infra-status page so agents and humans see one SLA.