Tensorlake Documentation Audit
The docs cover an ambitious surface (MicroVM sandboxes, orchestration SDK, document AI) but suffer from internal contradictions, a fragmented API versioning story, two parallel CLIs that no page reconciles, an API Reference root that omits the basics, and a security-sensitive default (a well-known VNC password) sitting alongside HIPAA / SOC 2 Type II claims.
1. Two CLIs ship from the same installer, but no doc explains the difference (critical)
Location: /sandboxes/quickstart and the GitHub README (github.com/tensorlakeai/tensorlake)
Problem: The install script "installs the tl and tensorlake CLIs", but every example on the quickstart uses only tl (tl sbx create ...), while the GitHub README uses the other one (tensorlake login, tensorlake sbx ...). No page on docs.tensorlake.ai compares the two, documents which one is canonical, or notes whether they have feature parity. The GitHub README and the docs site also diverge on the SDK surface itself: the README uses SandboxClient.for_cloud(api_key=...), the docs use Sandbox.create().
Consequence: Developers who arrive via GitHub and developers who arrive via docs see different commands for the same operations. Agents trying to construct workflows have no way to choose between them and will mix invocations, leading to subtle failures and unreproducible behavior.
The fix: Pick a canonical CLI and SDK surface, document it, and either deprecate the other or add a single "CLI: tl vs tensorlake" page that explains the relationship, parity, and migration path. Update the GitHub README to match.
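Until one CLI is declared canonical, a script can at least detect which of the two binaries the installer left on PATH. The following is a minimal sketch using only the Python standard library. It checks for the two binary names the install script mentions and makes no claim about feature parity between them.

```python
import shutil
from typing import Dict, Optional

# Binary names taken from the install script's own description.
CLI_NAMES = ("tl", "tensorlake")


def installed_clis() -> Dict[str, Optional[str]]:
    """Map each CLI name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in CLI_NAMES}


if __name__ == "__main__":
    for name, path in installed_clis().items():
        print(f"{name}: {path or 'not found'}")
```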
2. VNC password ships as "tensorlake" in cleartext, while docs claim HIPAA + SOC 2 (critical)
Location: /sandboxes/computer-use; HIPAA / SOC 2 Type II claim on /sandboxes/introduction
Problem: The managed desktop image's VNC password is the literal string tensorlake, and the example connect call hard-codes it: sandbox.connect_desktop(password="tensorlake"). There is no mention of rotating it, no warning about exposure when combined with public ingress / tunnels, and no documented mechanism for setting a custom password. Meanwhile the Sandboxes intro explicitly markets "HIPAA and SOC 2 Type II certification" and tunnels (a first-class documented feature) make exposing the desktop port trivial.
Consequence: Anyone who reads the docs can attempt to connect to a tunneled sandbox desktop with a credential equal to the company name. The contrast with the certification claims is severe — a well-known default credential undermines the entire compliance posture for the desktop/computer-use path.
The fix: Generate a per-sandbox VNC secret by default, document an explicit override parameter, and add a security callout next to the Computer Use example warning against public-ingress exposure without rotating the credential.
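The per-sandbox secret could be generated along these lines. This is a sketch only: generate_vnc_password is a hypothetical helper, not part of the Tensorlake SDK, and how the secret would be injected into the desktop image is left open because the docs describe no such mechanism.

```python
import secrets
import string

# Letters and digits only, so the secret is safe to paste into URLs and shells.
ALPHABET = string.ascii_letters + string.digits


def generate_vnc_password(length: int = 16) -> str:
    """Return a random password drawn from the OS CSPRNG (hypothetical helper)."""
    if length < 8:
        raise ValueError("length must be at least 8")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(generate_vnc_password())
```

One caveat: the classic VNC authentication scheme uses only the first 8 characters of a password, so a longer secret adds nothing if the image's VNC server is limited to that scheme.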
3. API Reference root omits auth, base URLs, rate limits, and pagination (critical)
Location: /api-reference/v2/introduction
Problem: The page that should serve as the front door to the REST API instead tells readers to "fetch the complete documentation index at https://docs.tensorlake.ai/llms.txt." It does not document the base URL, the Authorization header format, content-type expectations, rate limits, idempotency keys, or pagination conventions — the standard contents of an API reference root. The Examples index page (/examples/overview) follows the same anti-pattern, pointing back to llms.txt rather than orienting the reader.
Consequence: A developer (or agent) opening the API Reference cannot make a single curl call without bouncing through other pages. Common operational questions ("what's the rate limit?", "how do I paginate GET /sandboxes?", "is there an idempotency key on POST /sandboxes?") are not answered anywhere obvious.
The fix: Replace the page with a real overview: base URL, auth scheme with a working curl example, error envelope, rate limits with headers, pagination model, idempotency, and SDK-vs-REST guidance. Move the llms.txt link to a sidebar note. Do the same for /examples/overview.
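The sketch below shows the kind of example a reference root should let a reader write. Every specific in it is a placeholder, because the docs currently state none of them: the base URL, the Bearer scheme, the Idempotency-Key header name, and the items / next_cursor response fields are all assumptions. The request is built but never sent.

```python
import json
import uuid
from typing import Callable, Iterator, Optional
from urllib.request import Request

BASE_URL = "https://api.example.invalid"  # placeholder, not the real base URL


def build_create_request(api_key: str, body: dict) -> Request:
    """Build (but do not send) a POST /sandboxes request with assumed headers."""
    return Request(
        url=f"{BASE_URL}/sandboxes",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
            "Idempotency-Key": str(uuid.uuid4()),  # assumed header name
        },
    )


def paginate(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield items from a cursor-paginated listing (assumed response shape)."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

Passing the page fetcher in as a function keeps the pagination loop independent of whichever HTTP client and cursor convention the API actually uses.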
4. Three different URL conventions in one API (significant)
Location: /api-reference/openapi.yaml — sandbox runtime APIs under /api/v1/, cloud document APIs under /documents/v2/, sandbox cloud APIs unversioned at /sandboxes
Problem: Sandbox runtime endpoints (/api/v1/processes, /api/v1/pty, /api/v1/files) are versioned v1, document APIs are versioned v2 (/documents/v2/parse), and sandbox cloud APIs are unversioned (POST /sandboxes). The docs site brands the whole surface "v2" via /api-reference/v2/..., but the spec itself reports version: 0.1.0.
Consequence: Developers cannot predict the URL structure for a new endpoint, and the "v2" branding in the docs URL matches neither the mixed v1 / v2 / unversioned paths of the API nor the 0.1.0 version the spec reports. Forward compatibility expectations (what changes when v3 lands?) are undefined.
The fix: Document the versioning policy explicitly on the API Reference root: which surface ships at which version, when versions advance, and what the unversioned /sandboxes route guarantees. Bump the OpenAPI info.version to reflect the GA status of the product.
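The split is easy to show mechanically. The sketch below extracts the version segment from the endpoint paths quoted in this finding. It assumes versions always appear as a standalone vN path segment, which holds for these examples but is not a documented rule.

```python
import re
from typing import Dict, List, Optional

VERSION_SEGMENT = re.compile(r"^v(\d+)$")

# Paths quoted in this finding.
PATHS = [
    "/api/v1/processes",
    "/api/v1/pty",
    "/api/v1/files",
    "/documents/v2/parse",
    "/sandboxes",
]


def path_version(path: str) -> Optional[int]:
    """Return the version number embedded in a path, or None if unversioned."""
    for segment in path.strip("/").split("/"):
        match = VERSION_SEGMENT.match(segment)
        if match:
            return int(match.group(1))
    return None


def versions_in_use(paths: List[str]) -> Dict[str, Optional[int]]:
    """Map every path to the version it carries."""
    return {path: path_version(path) for path in paths}


if __name__ == "__main__":
    for path, version in versions_in_use(PATHS).items():
        print(f"{path}: {'unversioned' if version is None else f'v{version}'}")
```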
5. "All capabilities mirror across both SDKs" is contradicted by the Tunnels page (significant)
Location: /sandboxes/sdk-reference vs /sandboxes/tunnels
Problem: The SDK Reference states "All capabilities mirror across both SDKs." But the Tunnels page tells Python users to "wrap the CLI in a subprocess call" — there is no native Python SDK for tunnels. The "parity" claim is false for at least one first-class primitive. Cloning is similar: the Snapshots page notes cloning is "CLI only," so neither SDK exposes it.
Consequence: Developers picking an SDK based on the parity promise discover the gap only after building. Agents that pattern-match on the parity statement will generate Python tunnel or clone calls against SDK methods that don't exist.
The fix: Replace the blanket parity claim with an SDK parity matrix listing every capability and showing the actual support (Native / CLI wrapper / HTTP API only) per language, including tunnels and clone.
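A parity matrix can start as plain data. The sketch below records only the two gaps this finding identifies. The status labels are illustrative, and the TypeScript tunnels entry is marked unverified because the Tunnels page's wording about TypeScript was not checked for this audit.

```python
from typing import Dict, List, Tuple

NATIVE = "Native"
CLI_WRAPPER = "CLI wrapper"
CLI_ONLY = "CLI only"
UNVERIFIED = "Unverified"

# capability -> language -> support level, limited to what this finding states.
PARITY: Dict[str, Dict[str, str]] = {
    "tunnels": {"python": CLI_WRAPPER, "typescript": UNVERIFIED},
    "clone": {"python": CLI_ONLY, "typescript": CLI_ONLY},
}


def gaps(matrix: Dict[str, Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """List every (capability, language, status) that is not natively supported."""
    return sorted(
        (capability, language, status)
        for capability, row in matrix.items()
        for language, status in row.items()
        if status != NATIVE
    )


if __name__ == "__main__":
    for capability, language, status in gaps(PARITY):
        print(f"{capability:<10} {language:<12} {status}")
```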
6. SDK Reference (Orchestration) omits autoscaling parameters that other pages depend on (significant)
Location: /applications/concepts (the canonical "SDK Reference (Orchestration)" page) vs /applications/scaling-agents
Problem: The Function Configuration table documents cpu, memory, ephemeral_disk, timeout, gpu, and lists "Additional options: description, image, secrets, retries, region" — but does not document max_containers or warm_containers, which the Scaling Agents page describes as the two main parameters controlling autoscaling behavior.
Consequence: A developer reading the SDK Reference believes they have a complete decorator API and ships code missing autoscaling controls. Agents producing function definitions will omit the most cost-relevant parameters.
The fix: Add max_containers and warm_containers to the Function Configuration table on /applications/concepts with ranges, defaults, billing implications, and a link to the Scaling Agents deep-dive.
7. No changelog or release-notes page anywhere, yet docs cite minimum SDK versions (significant)
Location: /document-ingestion/parsing/barcode (and presumably others) — "SDK version 0.2.91 or later"
Problem: The barcode page requires tensorlake ≥ 0.2.91. Other pages reference SDK behavior that has clearly evolved. But the docs site has no changelog, no release notes, and no version index — a site:docs.tensorlake.ai changelog search returns nothing.
Consequence: Developers told "you need 0.2.91" cannot see what 0.2.91 added, what version they're on, or whether the upgrade introduces breaking changes. Agents asked to suggest the right version have nothing to ground against.
The fix: Publish a /changelog (or /releases) page tied to PyPI/npm releases, with dated entries linking to commits or PRs. At minimum, add a "Minimum SDK version" badge to every feature page that has a floor.
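A version floor is only useful if readers can check it. The sketch below compares dotted numeric versions against the 0.2.91 floor from the barcode page. It assumes plain MAJOR.MINOR.PATCH strings; pre-release or local version suffixes would need a full version parser.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted numeric version such as '0.2.91' into integers."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str = "0.2.91") -> bool:
    """True if the installed version is at or above the documented floor."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    for candidate in ("0.2.90", "0.2.91", "0.2.100"):
        print(candidate, meets_minimum(candidate))
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.2.100" sorts before "0.2.91". The installed version string could come from importlib.metadata.version, assuming the distribution is named tensorlake.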
8. Disk-size example values mix units across CLI, SDK, and prose (significant)
Location: /sandboxes/quickstart vs /sandboxes/lifecycle
Problem: The quickstart "Configure CPU, Memory, Disk" examples use --disk_mb 51200 on the CLI and disk_mb=12000 / diskMb: 12000 in the Python and TypeScript snippets. The Lifecycle reference notes that the CLI accepts --disk_mb in MiB and the parameter default is 10240 (10 GiB). The same page describes the default as "10 GB disk" (decimal) while the parameter is defined in MiB. Units are not annotated in the SDK examples, and three different example values (51200, 12000, 10240) appear for the same parameter across pages.
Consequence: A developer copy-pasting from the quickstart can't tell whether 12000 is MB or MiB, and the inconsistent example values make it impossible to infer the "intended" disk size. Agents ingesting these examples will emit numbers that silently differ from the prose by ~5%, with cost and quota implications at scale.
The fix: Standardize all examples on a single value (e.g. the documented default 10240 MiB), annotate every parameter with its unit inline, and pick MiB or MB consistently across CLI, SDK, and prose. Add a "Units" callout to the Lifecycle reference.
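The arithmetic behind the ambiguity is worth spelling out. The sketch below interprets the three example values from this finding under both readings, using the standard definitions 1 MB = 1000^2 bytes and 1 MiB = 1024^2 bytes.

```python
MB = 1000 ** 2
MIB = 1024 ** 2
GB = 1000 ** 3
GIB = 1024 ** 3

# Example values that appear for the disk parameter across the docs.
EXAMPLE_VALUES = (51200, 12000, 10240)


def describe(value: int) -> dict:
    """Show what a disk value means under each unit interpretation."""
    return {
        "if_mib_in_gb": round(value * MIB / GB, 2),
        "if_mib_in_gib": round(value * MIB / GIB, 2),
        "if_mb_in_gb": round(value * MB / GB, 2),
    }


if __name__ == "__main__":
    for value in EXAMPLE_VALUES:
        print(value, describe(value))
    print(f"MiB is {(MIB / MB - 1) * 100:.2f}% larger than MB")
```

Read as MiB, 10240 is exactly 10 GiB but about 10.74 GB, so the "10 GB disk" prose understates the default by a little under 5%.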
9. Dead navigation paths and a URL-breaking filename in llms.txt (significant)
Location: /llms.txt
Problem: The llms.txt lists "Orchestration Quickstart" and "Orchestration + Sandboxes" but their canonical paths sit under /applications/, mixing the navigation label ("Orchestration") with the URL slug (/applications/). Worse, one entry uses a literal & inside the URL: sandboxes/agentic-d&g.md. An unencoded & is legal in a URL path, but it is also the parameter separator inside query strings, so naive link parsers, sharing tools, and crawlers may split the URL at the ampersand and break it.
Consequence: Agents that index docs via llms.txt — exactly the audience this file exists for — will fail to fetch the D&D example and will be confused by the Orchestration / Applications naming split.
The fix: Rename the file to agentic-dnd.md (or URL-encode as agentic-d%26g.md) and unify the public nav label with the path: either rename /applications/ to /orchestration/ or relabel the nav entries.
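A check like the following could run in CI against llms.txt. It is a sketch using urllib.parse from the Python standard library. The second sample path is a made-up entry for contrast, not a line copied from the real file.

```python
from typing import List
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a URL path, leaving '/' separators intact."""
    return quote(path, safe="/")


def needs_encoding(path: str) -> bool:
    """True if the path contains characters that should be percent-encoded.

    Decoding first keeps already-encoded paths from being flagged twice.
    """
    return encode_path(unquote(path)) != path


def risky_entries(paths: List[str]) -> List[str]:
    """Return the entries a naive parser may mangle."""
    return [path for path in paths if needs_encoding(path)]


if __name__ == "__main__":
    sample = ["sandboxes/agentic-d&g.md", "sandboxes/quickstart.md"]
    for path in risky_entries(sample):
        print(f"{path} -> {encode_path(path)}")
```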
10. Document Ingestion product is bolted onto Sandboxes docs with no orientation (significant)
Location: /document-ingestion/overview and the global docs navigation
Problem: Document Ingestion is effectively a separate product line — distinct base path (/documents/v2/), distinct SDK class (DocumentAI), distinct concepts (datasets, parse jobs, classes), distinct features (barcodes, signatures, chart extraction). It is presented under the same docs nav as Sandboxes/Orchestration with no top-level page explaining which feature belongs to which surface, or whether they share auth, billing, and regions.
Consequence: A developer landing on the docs cannot tell whether parse is a function callable from an @function-decorated Application or a separate REST API. Agents pattern-match across the two and produce wrong call sites.
The fix: Add a "Products" landing page at the docs root that explains the three product surfaces (Sandboxes, Applications/Orchestration, Document AI), what each does, and how they compose. Tag every page with the product it belongs to.
11. Schema validation rules for structured extraction documented in the wrong place (significant)
Location: /document-ingestion/parsing/structured-extraction vs /api-reference/v2/parse/parse
Problem: The structured extraction page tells you to "provide JSON Schema objects in structured_extraction_options" but the actual constraints — "Maximum 5 levels deep, All fields required, Root fields must be objects" — live on the Parse API page, not the structured extraction guide.
Consequence: Developers writing a schema based on the guide will hit cryptic validation errors at runtime because the rules are on a page they didn't open. Agents will generate schemas nested beyond 5 levels and hit the same errors.
The fix: Move (or duplicate with cross-links) the schema constraint list to the structured-extraction guide where developers will first see it, and add a runnable example that demonstrates a schema at exactly the depth limit.
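The three quoted constraints can be checked locally before a request is sent. The sketch below is an illustration, not the service's validator. In particular, it assumes depth is counted in nested objects with the root as level 1, and that "all fields required" means every property is listed under required; the docs state neither convention.

```python
from typing import List

MAX_DEPTH = 5  # "Maximum 5 levels deep", per the Parse API page


def object_depth(schema: dict) -> int:
    """Depth of nested objects; the root object counts as level 1 (assumed)."""
    if schema.get("type") == "array":
        return object_depth(schema.get("items", {}))
    if schema.get("type") != "object":
        return 0
    children = schema.get("properties", {}).values()
    return 1 + max((object_depth(child) for child in children), default=0)


def unrequired_fields(schema: dict, path: str = "$") -> List[str]:
    """Paths of properties missing from their parent's 'required' list."""
    if schema.get("type") == "array":
        return unrequired_fields(schema.get("items", {}), path + "[]")
    if schema.get("type") != "object":
        return []
    required = set(schema.get("required", []))
    missing = []
    for name, child in schema.get("properties", {}).items():
        if name not in required:
            missing.append(f"{path}.{name}")
        missing += unrequired_fields(child, f"{path}.{name}")
    return missing


def check(schema: dict) -> List[str]:
    """Return human-readable violations of the three documented rules."""
    problems = []
    depth = object_depth(schema)
    if depth > MAX_DEPTH:
        problems.append(f"schema is {depth} levels deep (max {MAX_DEPTH})")
    problems += [f"{p} is not listed in 'required'" for p in unrequired_fields(schema)]
    for name, child in schema.get("properties", {}).items():
        if child.get("type") != "object":
            problems.append(f"root field '{name}' is not an object")
    return problems


def nested(levels: int) -> dict:
    """Build a schema with the given number of nested object levels."""
    schema = {"type": "object", "properties": {"leaf": {"type": "string"}}, "required": ["leaf"]}
    for _ in range(levels - 1):
        schema = {"type": "object", "properties": {"child": schema}, "required": ["child"]}
    return schema
```

Under these assumptions, check(nested(5)) reports no problems and check(nested(6)) reports the depth violation, which is the boundary case the guide should demonstrate.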
12. Naming inconsistencies for the same parameter across surfaces (significant)
Location: /sandboxes/harbor (storage_mb) vs /sandboxes/lifecycle (disk_mb); /applications/guides/logging (nextToken) vs the rest of the REST surface (request_id, file_id, parse_id)
Problem: Harbor's task.toml calls disk size storage_mb, while the API and SDK call it disk_mb. Logging APIs use camelCase nextToken while the rest of the REST surface uses snake_case. There is no documented translation layer.
Consequence: Copy-pasting between Harbor task definitions and SDK calls fails. Agents that infer the parameter name from one page emit the wrong key in another.
The fix: Align Harbor's task.toml keys to the platform names (disk_mb), or document the mapping table explicitly. Normalize REST parameter casing across surfaces.
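Until the names are aligned, the mapping can at least be written down explicitly. The sketch below translates the one Harbor key this finding identifies and converts camelCase names to snake_case. The helper names are hypothetical, and the mapping table contains only the pair documented above.

```python
import re

# Harbor task.toml key -> platform (API / SDK) key, from this finding.
HARBOR_TO_PLATFORM = {"storage_mb": "disk_mb"}


def to_platform_keys(task_config: dict) -> dict:
    """Rename Harbor keys to their platform equivalents; leave others as-is."""
    return {HARBOR_TO_PLATFORM.get(key, key): value for key, value in task_config.items()}


def camel_to_snake(name: str) -> str:
    """Convert a camelCase identifier such as 'nextToken' to snake_case."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


if __name__ == "__main__":
    print(to_platform_keys({"storage_mb": 10240, "cpus": 2}))
    print(camel_to_snake("nextToken"))
```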
13. Troubleshooting page is a paragraph; no error code reference exists (significant)
Location: /applications/production/troubleshooting
Problem: The only page titled "Troubleshooting" covers four scenarios in one paragraph (including function timeouts, failed requests, and memory). There is no error-code reference, no decision tree, no list of common failure modes by category, and no SDK exception inventory. The page suggests "split large datasets into smaller processing batches" but no batch helper appears in the SDK Reference at /applications/concepts.
Consequence: Production users hitting an unrecognized error code have nowhere to look it up. Agents asked to handle Tensorlake errors have no taxonomy to switch on.
The fix: Add an /errors reference with every error code, what triggers it, retriability, and the recommended remediation. Expand the troubleshooting page into per-scenario runbooks, and either document the batch helper alluded to or stop recommending it.
14. "5 million sandboxes per project" quota appears in GitHub README but nowhere on the docs site (significant)
Location: github.com/tensorlakeai/tensorlake README vs docs.tensorlake.ai
Problem: The GitHub README states "Supports up to 5 million sandboxes per project" — a concrete capacity quota that does not appear anywhere on docs.tensorlake.ai. Other quotas (1 GB file upload limit, 7-day log retention default) live only on their own feature pages, with no central limits page.
Consequence: Enterprise evaluators sizing a deployment have to dig through GitHub READMEs and individual feature pages to assemble a capacity model. Agents asked to summarize platform limits will miss the quota entirely.
The fix: Publish a /platform/limits page that consolidates every documented quota (sandbox count, file size, log retention, parse depth, disk range, etc.) with the source of each. Link it from the API Reference root.
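As a starting point, the quotas this audit found can be gathered into one structure that records where each was documented. The sketch below contains only figures cited in this audit. The record type and the rendering are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Limit:
    name: str
    value: str
    source: str  # where the figure is currently documented


LIMITS = [
    Limit("Sandboxes per project", "5 million", "GitHub README"),
    Limit("File upload size", "1 GB", "/document-ingestion/file-management/overview"),
    Limit("Log retention (default)", "7 days", "/applications/guides/logging"),
    Limit("Extraction schema depth", "5 levels", "/api-reference/v2/parse/parse"),
    Limit("Default sandbox disk", "10240 MiB", "/sandboxes/lifecycle"),
]


def render(limits: List[Limit]) -> str:
    """Render the limits as aligned plain-text rows."""
    width = max(len(limit.name) for limit in limits)
    return "\n".join(
        f"{limit.name.ljust(width)}  {limit.value:<10}  {limit.source}" for limit in limits
    )


if __name__ == "__main__":
    print(render(LIMITS))
```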
15. Indexify relationship is undocumented despite env vars appearing in on-prem instructions (significant)
Location: /document-ingestion/parsing/on-prem
Problem: The on-prem deployment instructions reference env vars named INDEXIFY_URL and INDEXIFY_SERVER_HOST with no explanation. Indexify is the same company's earlier open-source project, but no docs page explains the relationship between Tensorlake (the product) and Indexify (the OSS engine apparently powering on-prem).
Consequence: On-prem evaluators see config keys for a product that isn't named anywhere else in the docs, and have to do outside research to understand what they're deploying. Procurement and security review stall on unidentified third-party software.
The fix: Add a short "Tensorlake and Indexify" section to the on-prem page (or platform overview) explaining the lineage, what's open source, what's commercial, and why these env vars exist.
16. Log retention has no self-serve control; 7-day default lives only on the logging page (significant)
Location: /applications/guides/logging
Problem: Logs are retained for 7 days by default, "extendable to 30 days or 1 year upon contacting support@tensorlake.ai." There is no self-serve retention setting, no UI, and no documented API. The constraint is buried on the logging guide rather than surfaced as a compliance/operational limit.
Consequence: Teams with audit or incident-investigation needs (the kind of teams attracted by the HIPAA / SOC 2 claims) discover the 7-day default mid-incident or mid-audit. Email-driven retention changes do not survive a procurement review.
The fix: Surface the retention default in the platform limits page (see Finding 14) and either ship a self-serve retention control or document the support-ticket flow with an SLA on the same page.
17. "Up to 5 TB file sizes" claim in SDK reference is uncorroborated by the OpenAPI spec (significant)
Location: /applications/concepts vs /document-ingestion/file-management/overview and /api-reference/openapi.yaml
Problem: The SDK Reference states "Files bypass JSON serialization, allowing up to 5 TB file sizes." Meanwhile the document file-management overview states the upload API "is not intended for files larger than 1 GB," with anything larger requiring a support email. The OpenAPI spec does not corroborate a 5 TB limit anywhere. The two limits are 5,000× apart and refer to overlapping concepts (File handling and file upload).
Consequence: Developers planning a video / large-asset pipeline trust the 5 TB number and hit the 1 GB upload ceiling. Agents generating example code with multi-GB files will produce snippets that fail at runtime.
The fix: Reconcile the two pages: state explicitly what the in-platform File size limit is (and where it applies — function I/O? snapshots? document parse?), versus the document upload API's 1 GB ceiling. Cite the source of any "5 TB" figure.
18. Operational gotchas buried in body copy, not in runbooks (minor)
Location: /applications/secrets (rotated secrets require redeploy); /sandboxes/images (PEP 668 / --break-system-packages mandatory); /applications/futures (Futures cannot be wrapped in containers); /applications/async-functions (sync calls from async block the event loop); /document-ingestion/parsing/signature (reading_order: -1 magic value)
Problem: Each of these is a non-obvious rule that can silently break production: a rotated secret that doesn't take effect because the app wasn't redeployed; pip installs that fail mysteriously without --break-system-packages; a Future wrapped in a list that's silently treated as data; a sync call in an async function that stalls the worker; a magic reading_order: -1 value on signature fragments with no callout for how callers should handle it. Each lives as a single sentence inside a longer page.
Consequence: Developers and agents miss these rules on first read, ship code that "works" in dev, and hit hard-to-diagnose failures in production.
The fix: Surface these as a "Gotchas / Important Restrictions" callout on each relevant page, and aggregate them into a single "Production Pitfalls" runbook linked from the Troubleshooting page.
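A callout is more convincing with a reproduction. The sketch below illustrates the async pitfall with plain asyncio and no Tensorlake code. It counts how often a heartbeat task gets to run while a 0.2-second job executes, first as a blocking call and then offloaded with asyncio.to_thread (Python 3.9+).

```python
import asyncio
import time


async def count_ticks(work) -> int:
    """Run `work` while a heartbeat task ticks; return how many ticks happened."""
    ticks = 0

    async def heartbeat():
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # give the heartbeat a chance to start
    await work()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return ticks


async def blocking_work():
    time.sleep(0.2)  # sync call: the event loop cannot run anything else


async def offloaded_work():
    await asyncio.to_thread(time.sleep, 0.2)  # same call, moved off the loop


def compare():
    """Return (ticks during blocking call, ticks during offloaded call)."""
    blocked = asyncio.run(count_ticks(blocking_work))
    offloaded = asyncio.run(count_ticks(offloaded_work))
    return blocked, offloaded


if __name__ == "__main__":
    print(compare())
```

The heartbeat makes progress only in the offloaded case, which is the stall the async-functions page warns about in a single sentence.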
19. Performance claims appear without benchmarks (minor)
Location: /applications/architecture and /sandboxes/introduction
Problem: The architecture page asserts "hundreds of container creations per second at peak load" and the sandboxes intro claims tensorlake/ubuntu-minimal "Boots in ~100-300ms" — neither with methodology, hardware, sample size, or any reference benchmark.
Consequence: Enterprise readers planning capacity have no way to validate the claims. The numbers read as marketing in pages otherwise positioned as technical reference.
The fix: Either link to a public benchmark with methodology, or qualify the numbers as approximate and document the conditions under which they were measured.
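At a minimum, a published figure should state its sample size and percentiles. The sketch below shows one way to report them using the Python standard library. The function being timed is a stand-in, and nothing here reproduces Tensorlake's own measurements.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], samples: int = 200) -> dict:
    """Time `fn` repeatedly and report latency percentiles in milliseconds."""
    durations_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        fn()
        durations_ms.append((time.perf_counter() - start) * 1000)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(durations_ms, n=100)
    return {
        "samples": samples,
        "p50_ms": statistics.median(durations_ms),
        "p95_ms": cuts[94],
        "max_ms": max(durations_ms),
    }


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would time sandbox creation instead.
    print(benchmark(lambda: sum(range(10_000))))
```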
20. Chrome-over-CDP flag requirements pinned to "current Chrome versions" with no version pin (minor)
Location: /sandboxes/chrome-cdp
Problem: The Chrome CDP guide bakes in version-sensitive requirements like --remote-allow-origins=* and a non-default --user-data-dir, but attributes them to "current Chrome versions" without ever naming a Chrome version. The LangChain integration page (/integrations/langchain) has the same problem: no version pin or compatibility matrix.
Consequence: Both pages will silently rot as Chrome and LangChain release breaking changes. Users debugging a CDP connection failure can't tell whether the docs are still accurate.
The fix: Pin both pages to specific tested versions (Chrome ≥ X, LangChain ≥ Y), and add a "last verified against" line that updates per release.
What they do well
- The llms.txt index exists and is mostly complete — a baseline many doc sites still skip.
- Sandbox lifecycle and image-compatibility pages call out specific, hard-won constraints (Dockerfile feature support, PEP 668, snapshot lock-in semantics).
- The Architecture page explains the "why not Kubernetes" decision clearly, which is unusual and useful for evaluators.
Top 3 recommendations
- Make the API Reference root real: base URL, auth, rate limits, pagination, idempotency, an end-to-end curl example. Today it just bounces readers to llms.txt — and so does /examples/overview.
- Reconcile the two CLIs, the SDK parity story, and the platform limits: pick canonical names, publish a parity matrix, unify the GitHub README with the docs site, and consolidate quotas (5M sandboxes, 1 GB files, 7-day logs) onto one limits page.
- Ship a changelog and an error-code reference, and replace the default VNC password. These three changes move the docs from demo-grade to production-grade; the last one means no longer shipping a desktop image whose password equals the company name.