Archal Documentation Audit
One-line state: well-organized Mintlify docs with a clear mental model, but riddled with cross-page contradictions on the things that actually break (or alter) a run — auth headers, the autonomous loop's default behavior, supported-service lists, harness output format, and command names — most of which fail silently for both humans and the AI agents these docs are explicitly written for.
2. archal autoloop defaults to the most destructive execution policy — it opens GitHub issues/PRs by default, while the preprod loop gates that behind flags (critical)
Location: /cli/autoloop vs /cli/preprod
Problem: The autoloop reference documents --execution-policy observe|grade|reproduce|fix with Default: fix, and defines fix as "full loop, including GitHub issue or PR creation." The preprod loop, by contrast, only "uses the managed preprod remediation path when fixes are allowed" (/cli/preprod) — PR creation is gated. So the two autonomous loops ship opposite safety defaults: preprod is conservative, autoloop is fully autonomous out of the box.
Consequence: A developer who runs archal autoloop to import and inspect production traces — reasonably expecting to look before acting — instead triggers the full reproduce-and-remediate loop, which opens GitHub issues or pull requests against their repository by default. There is no documented warning at the point of invocation, and an agent reading the reference has no signal that the default is the aggressive one. The inconsistency between the two loops' defaults makes the safe assumption (preprod-style gating) wrong for autoloop.
The fix: Either change the default to observe (or grade), or add a prominent warning on /cli/autoloop and the Autonomous loops guide that the default policy opens issues/PRs, and explicitly contrast it with preprod's gated behavior so the two loops' safety models are documented side by side.
3. Four pages list four different sets of services that route mode supports (critical)
Location: /clones/overview, /guides/vitest, /guides/route-mode-safety, /guides/sandbox
Problem: The set of services whose traffic actually gets intercepted differs on every page that names it:
- Clones overview: archal run / archal/vitest route mode supports 12: "Discord, GitHub, Google Workspace, Jira, Linear, Ramp's primary API domain, Slack, Stripe, Supabase, Apify, Tavily, and Datadog." (The overview is internally consistent: its route-mode list and its inline archal/vitest line both name the same 12.)
- Vitest page: 9, "Discord, GitHub, Google Workspace, Jira, Linear, Ramp's primary API domain, Slack, Stripe, and Supabase." (no Apify/Tavily/Datadog)
- Route-mode safety: a 9-row table of GitHub, Discord, Slack, Stripe, Jira, Linear, Supabase, Google Workspace, Ramp.
- Sandbox interception table: 11 rows; adds Apify and Tavily, but omits Datadog.
Consequence: Route mode is the trust boundary — whether a request hits a clone or the live API. If a developer trusts the Clones overview and writes a Vitest test against Datadog or Apify expecting interception, but the actual Vitest support set is the 9-item list, their test silently calls the real Datadog/Apify API with test tokens (or fails auth against production). For agents there's no judgment call: they parse one page and get a wrong answer with no signal.
The fix: Maintain one canonical supported-service matrix (ideally generated from the same source the CLI uses) and have every page transclude or link to it instead of re-listing. At minimum, reconcile the four lists today.
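The drift between the four lists can be checked mechanically. The following is a minimal sketch, assuming the lists are transcribed by hand from the pages quoted above. The missingPerPage helper and the page keys are hypothetical, not part of Archal, and "Ramp's primary API domain" is normalized to "Ramp". It treats the union of all lists as the candidate canonical set and reports what each page omits:

```typescript
// Service lists transcribed from the four pages quoted in this finding.
// "Ramp's primary API domain" is normalized to "Ramp" so the lists compare cleanly.
const base9: string[] = [
  "Discord", "GitHub", "Google Workspace", "Jira", "Linear",
  "Ramp", "Slack", "Stripe", "Supabase",
];

const pages: Record<string, string[]> = {
  "/clones/overview": base9.concat(["Apify", "Tavily", "Datadog"]),
  "/guides/vitest": base9.slice(),
  "/guides/route-mode-safety": base9.slice(),
  "/guides/sandbox": base9.concat(["Apify", "Tavily"]),
};

// Hypothetical helper: for each page, list the services named on some
// other page but absent from this one.
function missingPerPage(lists: Record<string, string[]>): Record<string, string[]> {
  const union: string[] = [];
  for (const page of Object.keys(lists)) {
    for (const service of lists[page]) {
      if (union.indexOf(service) === -1) union.push(service);
    }
  }
  const result: Record<string, string[]> = {};
  for (const page of Object.keys(lists)) {
    result[page] = union.filter((s) => lists[page].indexOf(s) === -1).sort();
  }
  return result;
}

const drift = missingPerPage(pages);
console.log(drift);
```

A check like this only detects disagreement between pages; it cannot say which list matches what the CLI actually intercepts, which is why the canonical matrix should come from the CLI's own source.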
4. Harness output contract contradicts the scaffold it ships (critical)
Location: /guides/docker-harness-contract vs /quickstart
Problem: The Docker harness contract is explicit: stdout is "Captured as the agent response text. Write your final answer or summary here," and "Your harness should write its final answer to stdout exactly once." Its minimal example writes plain text: process.stdout.write('Agent completed the task.\n');. But the harness that archal init generates and Quickstart documents writes a JSON envelope: console.log(JSON.stringify({ text }));.
Consequence: Following the contract page (plain text) and following the scaffold (JSON envelope) produce different stdout shapes for the same captured field. Either evaluation reads {"text":"..."} literally as the agent's answer (scoring against a JSON blob), or the contract is right and the scaffold double-wraps. A developer can't tell which behavior is correct, and an agent generating a harness has two authoritative-looking templates that disagree.
The fix: Decide whether stdout is raw text or a JSON envelope, then make the archal init scaffold and the contract page emit the identical pattern. If a JSON envelope is required, document the schema on the contract page; if plain text is required, fix the scaffold's console.log(JSON.stringify({ text })).
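Until the contract is settled, anything that consumes harness stdout has to cope with both shapes. The following is a minimal sketch of that normalization, assuming only the two shapes quoted above: plain text, or a JSON object with a string text field. The extractAgentText helper is hypothetical; the docs do not say whether Archal's evaluator does anything like this:

```typescript
// Hypothetical helper: normalize the two stdout shapes the docs currently show.
// - Contract page: plain text, e.g. "Agent completed the task.\n"
// - archal init scaffold: JSON envelope, e.g. {"text":"..."}
function extractAgentText(stdout: string): string {
  const trimmed = stdout.trim();
  if (trimmed.charAt(0) === "{") {
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (
        typeof parsed === "object" &&
        parsed !== null &&
        typeof (parsed as { text?: unknown }).text === "string"
      ) {
        // JSON envelope with a string "text" field: unwrap it.
        return (parsed as { text: string }).text;
      }
    } catch (_err) {
      // Not valid JSON: fall through and treat the output as plain text.
    }
  }
  return trimmed;
}

console.log(extractAgentText("Agent completed the task.\n"));
console.log(extractAgentText(JSON.stringify({ text: "Agent completed the task." })));
```

Without a step like this, scoring the scaffold's output under the plain-text contract would compare the literal JSON string to the expected answer.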
5. Node version requirement contradicts itself across pages (significant)
Location: /quickstart vs /guides/sandbox
Problem: Quickstart says init "Requires Node.js 20.20 or later (the version archal declares in engines)." The Sandbox page says "Both modes require Node.js 22+." Those two directly-quoted statements conflict. (The source also notes the CI examples use node:22, consistent with the Sandbox page rather than Quickstart.)
Consequence: A developer on Node 20.x reads Quickstart, gets a working archal init, then fails the moment they try --sandbox/--docker because that path requires 22+. An agent provisioning a CI image has two different "correct" minimums to choose from, with no indication which applies to which command.
The fix: Pick one minimum (the Sandbox page and CI examples agree on 22), state it once, and make Quickstart's requirement and its engines claim match. If init truly supports an older floor than --sandbox, document the per-mode requirement explicitly instead of stating a single conflicting number.
6. @archal/runtime is the recommended package but is undocumented and uninstallable (significant)
Location: /guides/direct-api-access
Problem: This page says "Most Node.js code should use @archal/runtime; it handles this auth shape for you." That is the only mention of @archal/runtime in the entire docs set. Every install page (Quickstart, Vitest) only ever adds the archal package (pnpm add -D vitest archal). There is no install command, no reference page, and no API for @archal/runtime.
Consequence: The docs steer "most Node.js code" toward a package a developer cannot install or learn — there's nothing to npm install and no surface to call. They fall back to the raw two-header path the page frames as the exception, which is also where the broken Octokit snippet lives (Issue 1).
The fix: Either document @archal/runtime (install command, the helper it exposes, a working example replacing the raw Octokit snippet) or remove the recommendation and make the supported package explicit.
7. archal run defaults --pass-threshold to 0; preprod defaults the same flag to 80 (significant)
Location: /cli/run vs /cli/preprod
Problem: cli/run documents --pass-threshold <score> with default 0, and exit code 0 means "Run succeeded and score met --pass-threshold." cli/preprod documents the same-named flag on archal preprod start with "Default: 80."
Consequence: A developer who wires archal run into CI for gating gets a default that treats any satisfaction score (including 0) as passing — the command exits 0 regardless of quality, so the gate never fails. Someone who learned the flag from the preprod page assumes 80 and is surprised that plain archal run never blocks a regression. The same flag name with a 0-vs-80 default across sibling commands is a quiet CI footgun.
The fix: Call out the differing defaults explicitly on both pages (and the rationale), or align them. At minimum, warn on the cli/run page that the default of 0 means runs always pass unless a threshold is set.
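The effect of the two defaults is easy to see in a toy model. The sketch below assumes the gate is a plain score >= threshold comparison, which is how the exit-code description reads but is not confirmed by the docs. The gateExitCode function and the failure code 1 are hypothetical:

```typescript
// Toy model of the documented gate: exit 0 when the score meets the threshold.
// Assumptions: "met" means score >= threshold; the failure code 1 is illustrative.
function gateExitCode(score: number, passThreshold: number): number {
  return score >= passThreshold ? 0 : 1;
}

const RUN_DEFAULT = 0;      // documented default on /cli/run
const PREPROD_DEFAULT = 80; // documented default on /cli/preprod

for (const score of [0, 45, 80]) {
  console.log(
    `score=${score} run=${gateExitCode(score, RUN_DEFAULT)} preprod=${gateExitCode(score, PREPROD_DEFAULT)}`
  );
}
```

Under that assumption a score of 45 exits 0 with the archal run default and non-zero with the preprod default, so the same agent passes one gate and fails the other.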
8. The pre-prod guide and the CLI reference disagree on the primary command (significant)
Location: /guides/autonomous-loops vs /cli/preprod
Problem: The Autonomous loops guide names archal preprod run as the Pre-prod "Primary command" and never mentions preprod start. The CLI reference says the opposite: "start is the normal pre-production loop," and "Use run only when you need to run an already-resolved scenario list or pack directly."
Consequence: A developer following the guide adopts preprod run as their main loop — the command the reference explicitly says is for the narrow already-resolved case — and never discovers preprod start, which is the one that creates/reuses the 20-scenario pack and manages remediation. They reimplement setup the managed loop would have done for them.
The fix: Make the guide lead with preprod start as the primary loop (matching the reference), and demote preprod run to the niche case there too.
9. Sandbox depends on "OpenClaw," which is defined nowhere (significant)
Location: /guides/sandbox
Problem: The Sandbox page introduces a hard dependency on OpenClaw — an archal/sandbox image, an "OpenClaw CLI installed and in PATH," an "OpenClaw gateway" the entrypoint starts, and a remediation step npm install -g openclaw (the source also notes --openclaw-* flags). OpenClaw is never defined, linked, or documented on any other page. What it is, who publishes it, and where it comes from is absent.
Consequence: The --no-docker local debug mode cannot be set up from the docs — a developer hits "OpenClaw CLI was not found in PATH," runs the suggested npm install -g openclaw, and has no way to know whether that's the right package, what version, or what it does to their machine. An agent has no concept to resolve at all.
The fix: Add an OpenClaw section or external link explaining what it is and its relationship to Archal, and verify npm install -g openclaw resolves to the intended package. If it's an internal/optional tool, label the --no-docker path accordingly.
10. Sandbox injects credential env vars under different names than the Authentication page documents (significant)
Location: /guides/sandbox vs /guides/authentication
Problem: The Sandbox entrypoint "Injects fake service credentials into the environment (STRIPE_API_KEY, SLACK_TOKEN, etc.)." The Authentication page describes the override/stamping env vars as STRIPE_BOOTSTRAP_TOKEN, SLACK_BOOTSTRAP_TOKEN, etc., and says the CLI "stamps bootstrap tokens (e.g. GITHUB_TOKEN)" into the harness environment. So the same concept appears as STRIPE_API_KEY on one page and STRIPE_BOOTSTRAP_TOKEN/GITHUB_TOKEN on another.
Consequence: A developer whose SDK reads STRIPE_API_KEY can't tell whether Archal sets that, STRIPE_BOOTSTRAP_TOKEN, or both — and which env var to override to control the credential. A wrong guess means the SDK reads an unset var and sends no/garbage auth, or the override silently does nothing.
The fix: Document one authoritative table mapping each clone to the exact env var(s) Archal injects vs. the override var, and use those identical names on the Sandbox and Authentication pages.
11. Stripe clone lists the same tools as both working strict-mode tools and stubs (significant)
Location: /clones/stripe
Problem: Under "Core tools (strict mode)" the page lists search_stripe_documentation, stripe_integration_recommender, and send_stripe_mcp_feedback as available tools. But "Known limits" says "Documentation search, integration recommender, and feedback tools are stubbed," and "Notes" repeats: "stripe_integration_recommender and search_stripe_documentation return stubs (no real API calls)." A tool can't simultaneously be a functioning core strict-mode tool and a stub. (Separately, the source notes retrieve_customer is filed under "Extended tools (strict=false only)" while create_customer/list_customers are core, which reads like a mis-file.)
Consequence: A developer writes a scenario that exercises search_stripe_documentation expecting real behavior (it's in the core table), then gets stub responses with no error — wasting a run and misreading the score.
The fix: Mark the three stubbed tools explicitly as stubs in the core table (e.g. a "stub" badge), or move them out of "Core tools." Reconcile retrieve_customer's placement with the other customer tools.
12. The bootstrap-token reference covers only 5 of 25+ clones (significant)
Location: /guides/authentication
Problem: The "Bootstrap tokens by clone" table documents fixed bootstrap tokens and override env vars for exactly five clones — GitHub, Slack, Jira, Stripe, Discord. Every other clone in the catalog (20+, including Linear, Google Workspace, Ramp, Supabase, and the rest) gets only a catch-all sentence: they "accept any non-empty bearer token… a placeholder like test-token works." For auth — the most central concern when wiring an integration — the page is mostly a five-row table plus a footnote.
Consequence: A developer setting up, say, a SendGrid or Twilio scenario has no documented bootstrap token, no override env var name, and no per-clone guidance — only the generic "any non-empty token" rule, which doesn't tell them which env var the SDK actually reads or how to override it. For the five tabled clones they get exact constants; for the rest they're guessing. An agent has no structured per-clone auth contract to follow.
The fix: Extend the bootstrap-token table to every startable clone, listing the bootstrap token (or explicitly "any non-empty token") and the override env var for each, generated from the same source the proxy uses.
13. OpenAPI route shape doesn't match the clone URLs every guide documents (significant)
Location: /api-reference/openapi.json vs all clone/session guides
Problem: The spec (v0.3.0) models clone access as a path-based route: /runtime/{sessionId}/{cloneId}/api/{path} (proxyCloneGet/proxyClonePost). Every guide uses subdomain-based URLs instead: https://<session>.clones.archal.ai/<clone>/api (e.g. the Octokit baseUrl, and the GitHub clone's https://<session>.clones.archal.ai/github/mcp).
Consequence: An agent or developer generating a client from the OpenAPI spec builds requests against /runtime/{sessionId}/{cloneId}/api/... on the API host, while the docs' working examples hit a clone subdomain — two different URL shapes for the same operation. One of them is wrong, and there's no note explaining the relationship (rewrite? legacy? gateway?).
The fix: Reconcile the spec with the documented subdomain URLs: either update the spec's server/path model or add an explicit mapping explaining how https://<session>.clones.archal.ai/<clone>/api corresponds to /runtime/{sessionId}/{cloneId}/api/{path}.
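The mismatch is concrete once both shapes are built from the same inputs. The following is a minimal sketch: the function names are hypothetical, the apiBase value is a placeholder because the spec's server URL is not recorded in this audit, and appending a path after /api in the guide shape is an assumption:

```typescript
// Path-based shape from the OpenAPI spec: /runtime/{sessionId}/{cloneId}/api/{path}
// apiBase is a placeholder; the spec's actual server URL is not recorded here.
function specStyleUrl(apiBase: string, sessionId: string, cloneId: string, path: string): string {
  return `${apiBase}/runtime/${encodeURIComponent(sessionId)}/${encodeURIComponent(cloneId)}/api/${path}`;
}

// Subdomain shape used by the guides: https://<session>.clones.archal.ai/<clone>/api
// Appending a path after /api is an assumption.
function guideStyleUrl(sessionId: string, cloneId: string, path: string): string {
  return `https://${sessionId}.clones.archal.ai/${cloneId}/api/${path}`;
}

const fromSpec = specStyleUrl("https://api.example.invalid", "sess123", "github", "repos");
const fromGuides = guideStyleUrl("sess123", "github", "repos");
console.log(fromSpec);
console.log(fromGuides);
```

A client generated from the spec produces the first URL and a reader of the guides produces the second; the docs need to say whether both resolve to the same clone.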
14. Several commands the guides tell you to run have no reference page (significant)
Location: /guides/autonomous-loops, /cli/autoloop (referencing commands absent from /cli/*)
Problem: The guides instruct users to run npx archal usage, archal workspace api-key create <label> --scope sessions:write, archal detach, archal autoloop-status, and archal autoloop-reprocess. The source indicates none of these have a CLI reference page in the cli/* index. The workspace API key command in particular is cited on at least three pages (Vitest, Direct API access, Authentication) as the way to make CI tokens.
Consequence: A developer can't discover the flags, scopes, or output of commands the docs require them to run — notably archal workspace api-key create, which gates all CI usage. They guess at --scope values and error handling, or file a support ticket.
The fix: Add reference pages for usage, workspace api-key, detach, autoloop-status, and autoloop-reprocess, and link them from the guides that invoke them.
15. "Startable clones" omits four clones the overview lists as available (significant)
Location: /guides/clone-sessions vs /clones/overview
Problem: The Clone sessions "Startable clones" list (Apify, Cal.com, ClickUp, Customer.io, Datadog, Discord, GitHub, GitLab, Google Workspace, Jira, Linear, OwnerRez, PriceLabs, Ramp, SendGrid, Sentry, Slack, Stripe, Supabase, Tavily, Unipile, Webflow) is missing Firecrawl, HubSpot, Telegram, and Twilio, all of which appear under "Available clones" on the Clones overview (and in the REST/API fidelity list). It also includes Cal.com, ClickUp, and GitLab, which the overview files under "Previews."
Consequence: A developer who saw Firecrawl/HubSpot/Telegram/Twilio in the catalog tries archal clone start hubspot and can't tell from the docs whether it's unsupported, renamed, or an omission. The "available" vs "startable" vs "preview" distinctions don't line up, so there's no reliable way to know what clone start actually accepts.
The fix: Generate the "Startable clones" list from the same source as the catalog, or explicitly explain why available ≠ startable (e.g. previews aren't startable) and fix the missing four.
16. Telemetry is off by default, but the headline features require it (significant)
Location: /security
Problem: "Telemetry. Off by default." When on, "run traces can include tool calls, request parameters, clone responses... They power the dashboard and historical satisfaction tracking." The summary states traces upload "only when telemetry/tracing is enabled." Yet the dashboard, "historical satisfaction tracking," and both autonomous loops (which import traces, grade them, reproduce) are presented across the docs and marketing as core product capabilities.
Consequence: A developer follows Quickstart, runs scenarios, and finds an empty dashboard with no history — because the experience the marketing site leads with ("Every run is a full trace") depends on a setting that ships off and is documented only on the Security page. The dependency is left for the reader to infer.
The fix: State on the onboarding/Quickstart and autonomous-loops pages that the dashboard and trace-backed loops require enabling telemetry/tracing, with a link to how. Make the default-off/feature-on relationship explicit where the features are sold, not only on the Security page.
17. No single authoritative seed list per clone; pages disagree, and Ramp contradicts itself (significant)
Location: /guides/seeds, /clones/github, /clones/ramp
Problem: Seeds are referenced inconsistently:
- GitHub: the clone page's Notes reference rate-limited and permissions-denied seeds that do not appear in the Seeds-guide GitHub table (empty, small-project, enterprise-repo, stale-issues, large-backlog, merge-conflict, ci-cd-pipeline). The GitHub "Common seeds" row lists yet another subset (small-project, enterprise-repo, stale-issues, merge-conflict, empty; no large-backlog/ci-cd-pipeline).
- Ramp: the "Start here"/Seeds guide list four seeds (default, empty, ramp-receipt-mismatch, ramp-expense-split), but the Ramp page's own "Seed model" says the clone "ships with" only default and empty, a same-page contradiction.
Consequence: A developer who writes --seed ramp-receipt-mismatch based on the Start-here box can't tell from the Seed model section whether it exists; a GitHub user can't tell if rate-limited/permissions-denied are real seeds or doc leftovers. Scenarios fail at provisioning with an unknown-seed error.
The fix: Make the Seeds guide the single source of truth per clone (generated from the clone definitions), and have every "Common seeds" row and clone-page Notes reference that list rather than restating partial sets. Fix the Ramp page so "Seed model" and the seed list agree.
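A docs lint step could catch stale seed references before they ship. The following is a minimal sketch for the GitHub case, assuming the Seeds-guide table is treated as canonical and the other mentions are transcribed from the pages quoted above. The unknownSeeds helper and the source labels are hypothetical:

```typescript
// GitHub seed names from the Seeds-guide table, treated here as canonical.
const seedsGuideGithub: string[] = [
  "empty", "small-project", "enterprise-repo", "stale-issues",
  "large-backlog", "merge-conflict", "ci-cd-pipeline",
];

// Seed names mentioned elsewhere, keyed by where they appear.
const referencedElsewhere: Record<string, string[]> = {
  "clone page Notes": ["rate-limited", "permissions-denied"],
  "Common seeds row": ["small-project", "enterprise-repo", "stale-issues", "merge-conflict", "empty"],
};

// Hypothetical helper: return every referenced seed missing from the canonical list.
function unknownSeeds(canonical: string[], refs: Record<string, string[]>): string[] {
  const unknown: string[] = [];
  for (const source of Object.keys(refs)) {
    for (const seed of refs[source]) {
      if (canonical.indexOf(seed) === -1) unknown.push(`${source}: ${seed}`);
    }
  }
  return unknown;
}

const staleSeedRefs = unknownSeeds(seedsGuideGithub, referencedElsewhere);
console.log(staleSeedRefs);
```

This flags rate-limited and permissions-denied as unknown, but it cannot tell whether the Notes are stale or the Seeds guide is incomplete; that still has to be resolved against the clone definitions.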
18. "autoloop" appears as a generic verb where it overwrote unrelated words (minor)
Location: /guides/clone-sessions, /guides/route-mode-safety, /clones/stripe
Problem: autoloop is a specific, documented command (trace import → grade → reproduce → fix, per /cli/autoloop). But it shows up as a stray verb in unrelated prose: "autoloop an existing app before writing a scored scenario" and "let the shell autoloop implicitly" (Clone sessions), "Read this before autolooping a real app" (Route-mode safety), and "Create an invoice item and autoloop it to an invoice" (Stripe). These read as a find/replace that overwrote words like "adopt"/"attach"/"add."
Consequence: A reader who knows archal autoloop (the post-prod trace loop) is misled into thinking these flows invoke it; an agent may try to map "autoloop an existing app" to the autoloop command. The Stripe "autoloop it to an invoice" is meaningless for invoice-item creation.
The fix: Restore the intended verbs ("attach," "adopt," "add"/"associate") and reserve "autoloop" for the actual command.
19. Spec-only status codes and payload limits never surface in the error or limits docs (minor)
Location: /api-reference/openapi.json vs prose/error docs
Problem: The OpenAPI spec documents 413 "Payload exceeds 512 KB", 426 "CLI upgrade required", and 428 "Device authorization is still pending". None of these appear in any prose page — there is no error-code reference, no documented payload size limit (the 512 KB ceiling exists only in the spec), and --device login on /cli/login never mentions the 428 pending state.
Consequence: A developer hits a 413 on a large request with no documented limit to design around, or a 426 telling them to upgrade with no doc explaining it, or polls device login without knowing 428 means "keep waiting." Agents have no prose contract to map these codes to recovery behavior.
The fix: Add an error/status-code reference page covering 400/401/403/404/413/426/428/429/502/503, state the 512 KB payload limit in the limits/quickstart docs, and document the 428 pending state on the device-login page.
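An error reference would give clients something like the table below to code against. This is a minimal sketch: the descriptions are the ones quoted from the OpenAPI spec, while the action values, the Recovery type, and the recoveryFor helper are assumptions about sensible client behavior, not documented Archal guidance:

```typescript
// Spec-only status codes quoted in this finding, paired with a suggested reaction.
// Descriptions come from the OpenAPI spec; the "action" values are assumptions.
type Recovery = {
  description: string;
  action: "shrink-payload" | "upgrade-cli" | "keep-polling";
};

const specOnlyStatuses: Record<number, Recovery> = {
  413: { description: "Payload exceeds 512 KB", action: "shrink-payload" },
  426: { description: "CLI upgrade required", action: "upgrade-cli" },
  428: { description: "Device authorization is still pending", action: "keep-polling" },
};

// Hypothetical helper: look up a recovery hint, or undefined for codes not covered here.
function recoveryFor(status: number): Recovery | undefined {
  return specOnlyStatuses[status];
}

for (const status of [413, 426, 428]) {
  const hint = recoveryFor(status);
  if (hint) console.log(`${status}: ${hint.description} -> ${hint.action}`);
}
```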
20. The same credentials.json file is documented with two different field lists (minor)
Location: /security vs /cli/login
Problem: Security says credentials.json "stores email, plan, selected clone ids, expiry, and encrypted token fields." Login says it "stores your account identity, active workspace, workspace role, workspace plan snapshot, workspace clone catalog, expiry, and encrypted token fields." Two non-matching field lists for one file. Additionally, the macOS key fallback chain (credentials.key / ARCHAL_CREDENTIALS_MASTER_KEY) is documented only on Security, absent from Login.
Consequence: A developer auditing what's stored locally (a reasonable security question) gets contradictory answers depending on which page they read, and can't rely on either as authoritative.
The fix: Document the credentials.json schema once and reference it from both pages; move or cross-link the macOS key fallback so it appears wherever credential storage is discussed.
21. The docs and marketing site give the product different one-liners and a different "first workflow" (minor)
Location: /introduction vs https://www.archal.ai/
Problem: The product is positioned differently across surfaces: the marketing <title> is "Archal | The QA layer for AI agents"; the docs meta description is "score, trace, and harden AI agents" while the intro body calls it "service-shaped clones." The marketing site also leads with the autonomous "caught, reproduced, fixed" PR loop as the headline, whereas the docs treat that loop as a secondary (autoloop/preprod) feature and lead with clones and archal run. (The source note references a third, YC-page framing, but no YC page was scraped, so that claim is not included here.)
Consequence: A developer arriving from the marketing site ("QA layer," autonomous PR loop as the hero) and one reading the docs ("clones platform," archal run first) form different mental models of what the primary workflow is, making it harder to know whether to start with archal run, the clones, or the autonomous loop.
The fix: Settle on one primary positioning and one canonical first workflow, and align the docs intro and marketing title to it (or explicitly note the relationship between "clones," "QA," and the autonomous loop).
What they do well
- The /introduction Concepts table (Clone, Scenario, Harness, Satisfaction, Seed, Trace) gives a crisp, shared vocabulary up front, and the "when to use which entrypoint" split (archal run / archal clone start / archal/vitest) is a genuinely useful orientation.
- The /guides/direct-api-access "Header semantics" section is precise about request handling order and where the clone sees which token; it's the right model, even though the code sample doesn't follow it.
- An OpenAPI spec exists and is published, and the GitHub clone page is a clean, complete reference (errors match real API format, seeds, MCP endpoint) that the thinner clone pages could be modeled on.
Top 3 recommendations
- Fix the defaults and snippets that silently change behavior: the autoloop default of fix opens PRs without warning (Issue 2), the auth example leaks the Archal token and skips route auth (Issue 1), and the harness output format disagrees with the shipped scaffold (Issue 4). These are the changes a developer or agent won't catch until something has already happened in production.
- Generate the moving lists from a single source of truth: supported route-mode services (Issue 3), startable clones (Issue 15), per-clone bootstrap tokens (Issue 12), and per-clone seeds (Issue 17) currently differ page-to-page; transclude one canonical matrix instead of restating it on every page.
- Close the reference gaps: reconcile the Node version minimum (Issue 5), document @archal/runtime and OpenClaw (Issues 6, 9) or stop recommending them, add reference pages for the commands the guides require (Issue 14), and ship an error/status-code page covering the spec-only codes and the 512 KB limit (Issue 19).