Benchspan Documentation Audit
Tight surface area (18 URLs, one API endpoint, two SDKs), but the docs already contradict themselves on the security-boolean contract, the SDK silently fills in a required field, and one integration example calls a role that the upstream API doesn't accept.
1. Injection-threshold contradiction between API reference and concept page (critical)
Location: /api-reference/scan vs /concepts/how-it-works
Problem: Two pages give two different cutoffs for the same field. The scan reference says injection is "true if the score crosses our injection threshold (score ≥ 0.5)". The how-it-works page says score (0–1): confidence level; >0.5 triggers detection. At a score of exactly 0.5, one page returns injection: true and the other returns benign.
Consequence: A developer tuning a custom warn/block policy on top of score will pick a boundary based on whichever doc they read first, and their behavior at the threshold will silently disagree with Benchspan's verdict field. For a security product whose entire job is a thresholded boolean, this is a contract bug, not a wording nit. Agents reading both pages get conflicting truth and can't reconcile.
The fix: Pick one (≥ 0.5 matches the example response "score":0.9999,"injection":true and is the standard convention). Replace >0.5 on the how-it-works page, and add a one-line sentence to both pages that says "the threshold is closed at 0.5".
2. Long-input handling contradicts itself: truncated vs 413 rejected (critical)
Location: /api-reference/scan vs /api-reference/errors
Problem: The scan reference says input has a "max 32,000 characters; longer inputs truncated." The errors page says 413 Payload Too Large: Input exceeds 32,000 character limit per request. Split content into chunks or truncate client-side. These are mutually exclusive behaviors — the API either silently truncates a 40,000-char input or rejects it with 413. It cannot do both.
Consequence: A developer scanning a 50KB Gmail body or Drive doc has no way to know whether the latter half of their content was scanned, dropped, or never reached the classifier. For an injection firewall, "silently dropped" and "rejected with 413" have very different threat-model implications: with truncation, a poisoned payload past byte 32k slips through unscanned. With 413, it fails closed. Pick one.
The fix: State the actual behavior on both pages, character-identically. If the API truncates, remove the 413 entry and document the truncation point. If it 413s, remove "longer inputs truncated" from the scan page. Whichever you pick, also document the byte/char counting rule (UTF-8 bytes vs Unicode code points vs JS string length).
3. Python SDK silently defaults role="user" for an API-required field (critical)
Location: /sdks/python vs /api-reference/scan and /concepts/roles
Problem: The scan endpoint documents role as required: role ("user" | "tool", required): Origin of text; tool-origin content follows dedicated classifier path. The Python SDK signature is scan(input, role="user", source=None) — role has a silent default. A developer who scans tool output but forgets the kwarg (guard.scan(email_body)) will route the call through the user-origin classifier path. The roles page explicitly says "the classifier treats tool-origin content as the dominant attack vector, trained specifically on injection patterns from scraped pages, emails, and documents."
Consequence: Tool output gets classified by the weaker, less specialized path with no error, no warning, and no dashboard signal. This is a silent misclassification on a security product — the exact failure mode the SDKs claim to prevent with "fail closed" defaults. Worse, it's invisible: scans still succeed, scores still come back, the developer never learns the wrong classifier ran.
The fix: Either make role required in the Python signature (raise TypeError on missing kwarg, matching the API contract), or add a prominent warning on /sdks/python and /concepts/roles that the default is user and call out the misclassification risk for tool output. The TypeScript SDK takes role inside ScanOptions with no documented default and should be clarified the same way.
4. API-key example length doesn't match the documented key length (significant)
Location: /api-reference/authentication
Problem: The page states: "Total length: 40 characters, Random component: 32 hexadecimal characters." The worked example immediately below is ag_live_1a2b3c4d5e6f7890abcdef1234567890ab. The prefix ag_live_ is 8 characters; the random component (1a2b3c4d5e6f7890abcdef1234567890ab) is 34 hex characters, making the total 42. The docs and the example disagree by two characters.
Consequence: Any developer writing key-format validation (regex, length check, secret scanner) by reading this page will reject real production keys, or accept keys that the server rejects, depending on which figure they trust. CI secret-scanners and pre-commit hooks built from the doc string will misfire. For a Bearer-token product, the canonical key shape is part of the wire contract.
The fix: Decide whether keys are 40 chars (32 hex random) or 42 chars (34 hex random) and fix the side that's wrong. Add a one-line regex (^ag_live_[0-9a-f]{32}$ or the right variant) to the page so machines can validate without prose-parsing.
5. Anthropic integration tells users to send a role that doesn't exist in the Anthropic API (significant)
Location: /integrations/anthropic
Problem: The Anthropic integration page instructs: "When Claude uses tools, outputs flow back as tool_result blocks. Include these in subsequent message arrays with role: 'tool' so Benchspan scans them." Anthropic's Messages API does not have a tool role — tool_result content blocks are nested inside user-role messages. A literal reading of this guidance produces a 400 invalid_request_error from Anthropic before Benchspan ever sees the payload.
Consequence: A developer who copies this pattern hits an upstream 400 and assumes Benchspan's middleware is broken, or worse, refactors their working Anthropic integration to "match the docs" and breaks it. For an integration page on a product that markets itself as a drop-in scanner, the upstream API shape has to be correct.
The fix: Rewrite the tool-handling sentence to match Anthropic's actual content-block shape: tool outputs return as tool_result blocks inside user messages, and Benchspan classifies the block content as role: "tool" internally (separate from the Anthropic message role). Show one correct end-to-end Anthropic + Benchspan example so the distinction is unambiguous.
6. "How it works" nav link 404s (significant)
Location: Docs landing page nav → /how-it-works
Problem: The landing page surfaces "How it works" as a top-level navigation section, but /how-it-works returns 404. The real page lives at /concepts/how-it-works. Similarly, /api-reference returns 404 (real entry is /api-reference/overview), and /introduction, /pricing, /changelog are all 404.
Consequence: Anyone typing the obvious URL, following a stale external link, or pasting a guessed path lands on 404s. Crawlers and search engines following the rendered nav will see broken canonical paths. For agents, missing index pages mean /api-reference can't be used as a directory link in summaries — they have to know /api-reference/overview is the entry point.
The fix: Either alias /how-it-works → /concepts/how-it-works (and /api-reference → /api-reference/overview) with 301s, or restructure the URL tree so the nav and the URL match. Pick one, not both.
7. Self-hosted deployment named in the Python SDK, present-but-unexplained in TS, no page either way (significant)
Location: /sdks/python and /sdks/typescript
Problem: The Python SDK explicitly frames its api_url constructor parameter as "Override for self-hosted deployments." The TypeScript SDK exposes the same option (apiUrl) with no documented purpose at all — it's just a constructor field. The sitemap has zero pages on self-hosted deployment: no install guide, no Docker image, no Kubernetes manifests, no licensing or eligibility note, no listing in the integrations section.
Consequence: Python users are told a feature exists with no way to actually use it. TypeScript users see a knob with no explanation of what it's for. Both outcomes generate the same support tickets, and enterprise buyers reading the Python page will ask sales about a product surface that has no documented existence.
The fix: If self-hosting is real, add /deployment/self-hosted with image coordinates, license terms, telemetry/key-management posture, and how to point either SDK at it. If it's only for staging/proxy use, remove the "self-hosted" wording from the Python page and add a one-line scope statement to both SDK pages explaining what apiUrl/api_url is actually for (regional endpoint, proxy, test stub).
8. Latency numbers in docs are ~7× the numbers on the marketing site (significant)
Location: /concepts/how-it-works, /api-reference/overview, /concepts/modes vs benchspan.com
Problem: Docs repeatedly say "sub-100 ms scan latency for typical tool outputs" and "typical latency is under 100 ms for inputs up to ~2,000 tokens." The marketing homepage says "average latency of 14ms with P99 at 42ms." Both are framed as observed performance, not theoretical ceilings.
Consequence: A developer evaluating Benchspan for a latency-sensitive path (voice agents, real-time chat — explicitly called out on /concepts/modes) needs to know whether their per-turn budget is 14ms or 100ms. The 7× gap changes whether block mode is viable inline.
The fix: Decide which figure is the ceiling and which is the typical, and use both consistently. Suggested: docs surface P50 ~14ms, P99 ~42ms, ceiling <100ms for ≤2,000 tokens with one canonical sentence reused on the three pages that currently say "sub-100ms."
9. SDK parity gap: Python scan throws, TypeScript scan doesn't (significant)
Location: /sdks/python vs /sdks/typescript
Problem: The TypeScript SDK exposes two clearly distinct methods: scan(input, options?) "evaluates text without throwing" and scanOrThrow(input, options?) "throws InjectionDetectedError when verdict is block." The Python SDK lists only scan(input, role, source) plus the note "InjectionDetectedError: Raised on block." There is no scan_or_throw and no documented way in Python to get a non-throwing scan when mode="block" is set on the constructor.
Consequence: Developers writing cross-language services will reasonably assume guard.scan(...) behaves the same way in both SDKs and write Python try/except where they wrote a TS conditional, or vice versa. Worse: a developer who wants a non-throwing "just give me the score" call in Python has no documented method to use — they have to construct a separate BenchGuard instance in warn mode just to avoid the exception.
The fix: Either ship scan_or_throw/scan_no_throw in Python and document the symmetry, or rename the TS methods to align with whichever Python ships. Add a one-line "throwing vs non-throwing semantics across SDKs" matrix to both SDK pages.
10. Rate limits documented only as a monthly quota; no per-second/per-minute cap (significant)
Location: /api-reference/errors, /api-reference/authentication
Problem: Both pages mention 429 Too Many Requests and reference a monthly quota ("Free tier allows 50,000 scans monthly"), but neither documents a per-second, per-minute, or burst rate limit. The errors page acknowledges a Retry-After header on 429 but never says what triggers one within the monthly budget.
Consequence: An agent processing a 1k-email Gmail backlog at 200 req/s has no documented way to know whether to throttle, what concurrency is safe, or whether the 50,000-scans/month math applies linearly. The first production load test discovers the limit by hitting it. Inline middleware-style integration (where every tool call hits the API) makes this especially load-sensitive.
The fix: Publish concrete numbers: requests/second per key, requests/minute per organization, concurrent-connection cap, and any per-IP throttling. Add them as a table on the errors page and a header on the auth page.
11. Mode precedence between constructor and per-request parameter is undocumented (minor)
Location: /sdks/python, /sdks/typescript, /api-reference/scan
Problem: mode can be set on the SDK constructor (BenchGuard(..., mode="block")) and also as a per-request body field on POST /v1/scan (mode ("block" | "warn", optional)). No page documents which wins when both are set, or whether the SDK forwards the constructor value as a per-request override.
Consequence: A team running a mixed deployment — block in production, warn for a specific evaluation crew — has no contract for how to compose the two settings. The first time someone sets mode="warn" per-call on a block-configured guard, the resulting behavior is an experiment, not a spec.
The fix: State the precedence rule explicitly on /api-reference/scan and both SDK pages: e.g., "per-request mode overrides constructor mode," then mirror that with an example on the modes concept page.
12. Classifier accuracy numbers live on the marketing site, not in the docs (minor)
Location: Docs (absent) vs benchspan.com homepage
Problem: The homepage advertises specific benchmark results: 99.9% catch rate on AgentDojo, 94% catch rate on InjecAgent, 0.19% false-alarm rate on production-like traffic. None of these numbers appear in /docs.benchspan.com. The model_version field returns classifier-v3, which implies v1 and v2 existed and were measured differently — but there's no per-version benchmark table.
Consequence: A security engineer evaluating Benchspan for compliance or red-team purposes has to cross-reference the marketing site to find the only quantitative claims about the product's accuracy, then has no way to know which model_version those numbers apply to. When classifier-v4 ships, customers on v3 have no documented baseline.
The fix: Add /concepts/accuracy (or a section on how-it-works) with the AgentDojo / InjecAgent / FPR numbers, pinned to a specific model_version, plus a note on how the numbers update across classifier versions.
13. No changelog despite versioned classifier in API responses (minor)
Location: /changelog (404) and /api-reference/scan
Problem: Every scan response includes model_version (e.g., classifier-v3). The sitemap has no changelog page, and /changelog returns 404. There's no record of when v2→v3 happened, what changed, or whether old API keys still target an old model.
Consequence: A customer who saw a score change for the same input between two weeks has no public reference to explain it. Auditors and red teams can't pin findings to a model revision. Agents summarizing "what's new in Benchspan" have nothing to cite.
The fix: Add /changelog with dated entries per classifier-v{N} bump, SDK release, and API behavior change. Link model_version in the scan-response docs to the corresponding changelog anchor.
14. No OpenAPI/machine-readable spec for the REST API (minor)
Location: /api-reference/* (and absent from /sitemap.xml)
Problem: The REST API has one endpoint (POST /v1/scan) with five request fields, six response fields, and five status codes. None of it is exposed as OpenAPI/JSON Schema — there is no /openapi.json, no Swagger UI, no machine-readable schema file anywhere in the sitemap. All parameter types, enums (role, mode, verdict), and constraints (32k char cap, score range, classifier version format) live only in prose.
Consequence: Coding agents (Cursor, Claude Code, Copilot) can't programmatically validate a generated request body. Codegen tools (openapi-generator, oazapfts, etc.) can't produce a typed client. Postman/Insomnia users have to hand-rebuild the collection. Less severe than a contradiction, but a missed nicety for an agent-first product.
The fix: Publish an OpenAPI 3.1 spec at /openapi.json (or /api-reference/openapi.json) covering the scan endpoint, both role enums, both mode enums, all three verdict values, and the 400/401/413/429/5xx error envelope. Link it from /llms-full.txt so agents discover it.
What they do well
- Both
llms.txtandllms-full.txtare present and serve real, non-trivial content — agents can index without scraping the rendered nav. - Roles model (
user,tool, plus the explicit trust boundary excludingsystemandassistant) is clearly stated in/concepts/rolesand consistent across SDK and integration pages. - Failure-mode posture is named explicitly ("SDKs default to failing closed") rather than left as folklore, which is rare for security middleware.
Top 3 recommendations
- Fix the three correctness contradictions before anything else: threshold
≥0.5vs>0.5, long-input truncation vs 413, and the 40-vs-42-character API key example. These are wire-contract bugs, not polish. - Either make Python's
rolerequired (matching the API contract) or loudly document the"user"default — a silent misclassification path on a security product is the worst kind of footgun. - Rewrite the Anthropic integration's tool-handling guidance to match Anthropic's actual
tool_result-inside-usershape; a copy-paste from this page currently produces an upstream 400.