Ryvn Documentation Audit
Ryvn's docs are well-organized for human reading and unusually agent-aware (llms.txt, a Docs MCP, Agent Skills), but the foundations leak: a core contradiction about which domain your app is actually served on, an OpenAPI spec that files every endpoint under "other," a marketing overview that promises "no Terraform" next to a Terraform service type, and a set of copy-paste-grade hazards — a wildcard IAM policy shipped first, conflicting AWS node defaults, a leaked authoring TODO, and unrendered <image> placeholders on the Quickstart.
1. The default app URL contradicts itself — ryvn.app vs ryvn.run (critical)
Location: /docs/quickstart (step 6) vs /docs/networking/custom-domains
Problem: Quickstart says: "Every server installation on Ryvn is assigned a unique ryvn.app URL. Find this URL on the installation tile on the canvas." Custom Domains says the opposite: "By default, Ryvn environments use ryvn.run subdomains for public services… myapp.env-xyz.ryvn.run" and frames the whole page around replacing ryvn.run. Meanwhile ryvn.app is demonstrably Ryvn's own namespace — the control plane is control.ryvn.app, and the Kubernetes Labels page documents ryvn.app/service-name and ryvn.app/release-version as Ryvn's managed-resource label prefix.
Consequence: The single most important output of the Quickstart — "where is my app?" — points at the wrong domain. A developer (or an agent following the Quickstart) constructs or expects a *.ryvn.app hostname that doesn't serve their workload, and the DNS/TLS/custom-root-domain machinery on the Custom Domains page is all keyed to ryvn.run. The collision with Ryvn's control-plane and label namespace makes it worse: searching the docs for ryvn.app returns control-plane and label hits, reinforcing the wrong mental model.
The fix: Pick one public-domain string and use it everywhere. If installations are served on *.ryvn.run, correct Quickstart step 6 to say ryvn.run and show an example hostname. Reserve ryvn.app for the control plane / label prefix and say so explicitly in a "domains glossary" note.
2. The first IAM example grants ["*"] on "*" (critical)
Location: /docs/provision/aws (terraform_executor_policies example)
Problem: The terraform_executor_policies configuration is illustrated first with a policy granting actions: ["*"] on resource: "*" — unrestricted access to the cloud account.
Consequence: This is the example most developers (and agents) copy verbatim, because it's first and it "works." Shipping a full-admin executor policy into a customer's cloud account is exactly the over-permissioning a BYOC vendor's security review exists to catch — and Ryvn's entire pitch (per /why-byoc: "bypass traditional security review cycles," "starting POCs in days instead of months") depends on those reviews going smoothly. A wildcard executor policy is the first thing a customer's security team rejects, so the default example actively undermines the product's core value proposition.
The fix: Lead with a least-privilege example scoped to the services Ryvn actually provisions (EKS, EC2, VPC, and the IAM needed for those). If a permissive policy is sometimes necessary, move it below, mark it clearly, and explain the trade-off.
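A docs-side guard could catch a wildcard example before it is published. The following is a minimal sketch, assuming the policy example can be loaded as a list of statements with `actions` and `resource` keys (names taken from the audit's description of the example; the real terraform_executor_policies schema may differ). The function name `find_wildcard_statements` is hypothetical.

```python
def find_wildcard_statements(statements):
    """Return indexes of statements that grant every action on every resource."""
    flagged = []
    for index, statement in enumerate(statements):
        actions = statement.get("actions", [])
        resources = statement.get("resource", [])
        # Accept either a single string or a list for both fields.
        if isinstance(actions, str):
            actions = [actions]
        if isinstance(resources, str):
            resources = [resources]
        if "*" in actions and "*" in resources:
            flagged.append(index)
    return flagged


# Illustrative input only: the first entry mirrors the shipped example,
# the second is a scoped statement that should not be flagged.
example = [
    {"actions": ["*"], "resource": "*"},
    {"actions": ["eks:DescribeCluster"], "resource": "*"},
]
print(find_wildcard_statements(example))  # [0]
```

A check like this only flags the full-admin shape; it does not judge whether a scoped policy is actually least-privilege.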
3. Every API endpoint is filed under "other" (significant)
Location: /docs/api.json (OpenAPI spec) → generated /docs/api-reference/*
Problem: Per the secrets-create evidence, every OpenAPI tag is named *_other with x-displayName: other (environment_other, service_other, org_other, …). The generated API reference therefore groups every operation under a single bucket labeled "other," with no resource taxonomy. The spec also mixes auth signals: a top-level global security: [] (no auth) while individual operations set BearerAuth.
Consequence: There is effectively no navigable structure to the API reference — a developer scanning for "secrets" or "environments" sees one undifferentiated "other" group. For AI agents, which rely on tag grouping and operation metadata to discover and route endpoints, the spec is close to unusable as a map, and the contradictory security: [] vs per-op BearerAuth makes it ambiguous whether a call needs a token.
The fix: Give each tag a real name and x-displayName (Secrets, Environments, Services, Organizations). Remove the global security: [] or set it to the actual default scheme so the spec doesn't advertise "no auth required" at the top level.
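Both spec problems are mechanical enough to check automatically. The sketch below assumes the spec has already been parsed into a dict (for example with `json.load`) and that tags carry `name` and `x-displayName` as described above; `audit_openapi` and the sample spec are hypothetical, not taken from Ryvn's actual api.json.

```python
def audit_openapi(spec):
    """Return human-readable problems found in a parsed OpenAPI document."""
    problems = []
    for tag in spec.get("tags", []):
        name = tag.get("name", "")
        display = tag.get("x-displayName", "")
        # Placeholder taxonomy: *_other names or an "other" display name.
        if name.endswith("_other") or display.lower() == "other":
            problems.append(f"tag {name!r} uses a placeholder name or display name")
    # An explicit empty list at the top level means "no auth required".
    if spec.get("security") == []:
        problems.append("global security is [] while operations may require auth")
    return problems


sample_spec = {
    "security": [],
    "tags": [
        {"name": "environment_other", "x-displayName": "other"},
        {"name": "Secrets", "x-displayName": "Secrets"},
    ],
}
for problem in audit_openapi(sample_spec):
    print(problem)
```

Run in CI against the published spec, a non-empty result would fail the build.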
4. "Secret names must be unique" but no 409 is documented (significant)
Location: /docs/api-reference/secrets/create
Problem: The endpoint description states "Secret names must be unique within the organization," but the documented responses are only 201, 400, 401, and 404. There is no 409 Conflict (or any other) response documented for the duplicate-name case the description explicitly calls out.
Consequence: A developer writing a create-secret integration has no documented way to detect and handle the exact failure the docs warn about. Their error handling either treats the duplicate as a generic 400 (if that's what the API actually returns) or misses it entirely. Agents generating client code will not produce a branch for the named constraint because no status code maps to it.
The fix: Document the actual status returned on a duplicate name (almost certainly 409), with an example body, or — if the API silently upserts — state that explicitly and remove the "must be unique" language.
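To illustrate why the missing status matters, here is a sketch of the branch a client would want to write. It assumes the duplicate-name case returns 409, which the docs do not confirm; the enum and function names are hypothetical.

```python
from enum import Enum


class CreateSecretOutcome(Enum):
    CREATED = "created"
    INVALID_REQUEST = "invalid_request"
    UNAUTHORIZED = "unauthorized"
    NOT_FOUND = "not_found"
    DUPLICATE_NAME = "duplicate_name"  # assumed 409; not a documented response
    UNEXPECTED = "unexpected"


# 201, 400, 401, and 404 are the documented responses; 409 is an assumption.
STATUS_TO_OUTCOME = {
    201: CreateSecretOutcome.CREATED,
    400: CreateSecretOutcome.INVALID_REQUEST,
    401: CreateSecretOutcome.UNAUTHORIZED,
    404: CreateSecretOutcome.NOT_FOUND,
    409: CreateSecretOutcome.DUPLICATE_NAME,
}


def classify_create_secret(status_code):
    """Map an HTTP status from the create-secret call to a client-side outcome."""
    return STATUS_TO_OUTCOME.get(status_code, CreateSecretOutcome.UNEXPECTED)


print(classify_create_secret(409).value)  # duplicate_name
```

If the API actually reports duplicates as a 400, the same branch would need to inspect the response body instead, which is exactly what the docs should state.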
5. Server autoscaling: "HTTP traffic" vs CPU/memory triggers (significant)
Location: /docs/deploy/service-types/comparison vs /docs/configure/scaling
Problem: The service-type comparison table says Server scaling is "Auto-scaling based on HTTP traffic." The Scaling page documents the actual triggers as CPU and memory utilization (recommended targets 70-80% and 80-85%) with the formula new_replicas = ceil(current_replicas * (current_util / target_util)). No HTTP-request-count trigger appears anywhere in the scaling configuration.
Consequence: A developer reading the comparison table configures (or expects) request-rate-based scaling, then finds only CPU/memory utilization triggers when they open the scaling config. For a Server type — the most common workload — this is a direct mismatch between the "what scaling do I get" summary and the actual configurable behavior.
The fix: Change the comparison table to "Auto-scaling on CPU / memory utilization" (matching scaling.md), or, if request-rate scaling truly exists, document that trigger on the Scaling page.
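The documented formula is driven entirely by utilization, which a short worked example makes concrete. This sketch implements only the formula quoted from the Scaling page; the function name and the sample numbers are illustrative, and real behavior may also involve min/max bounds not modeled here.

```python
import math


def desired_replicas(current_replicas, current_util, target_util):
    """new_replicas = ceil(current_replicas * (current_util / target_util))"""
    if target_util <= 0:
        raise ValueError("target_util must be positive")
    return math.ceil(current_replicas * (current_util / target_util))


# 4 replicas at 90% CPU against a 75% target: 4 * 1.2 = 4.8, rounded up.
print(desired_replicas(4, 90, 75))  # 5
```

Note that request rate appears nowhere in the inputs: only current and target utilization move the replica count.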
6. AWS Basic Mode shows three conflicting sets of node defaults (significant)
Location: /docs/provision/aws
Problem: Three places on the same page disagree on instance defaults. The EKS defaults table says the application node group defaults to ["t3.medium"] and system to ["t3.xlarge"]. The Basic-Mode example YAML directly above sets both application and system to t3.large. The worked example further down uses t3.large (app) / t3.xlarge (system) with min/max values that also differ from the defaults table.
Consequence: A developer can't tell what they'll actually get if they accept defaults, and can't reconcile the "defaults" table with the example they're meant to copy. Capacity planning and cost estimates built off the wrong row are wrong.
The fix: Make the example YAML match the defaults table verbatim (or annotate the example as "overriding the defaults below, here's why"). State one canonical default per node group and reuse those exact values in every example on the page.
7. Unrendered <image> placeholder tags ship in the published Quickstart (significant)
Location: /docs/quickstart (step 1)
Problem: The published Quickstart contains a literal, unrendered authoring placeholder: <image>Screenshot showing the Create Service button in the Ryvn dashboard</image>. This is not a rendered image — it's the placeholder text where a screenshot was supposed to go, shipped as-is on the most important onboarding page. It's the same class of defect as the leaked TODO on the networking page (see below): an authoring artifact that reached production.
Consequence: A developer following the Quickstart sees raw markup instead of the screenshot the step references, on the page that forms their first impression of the product. Agents extracting the Quickstart parse a meaningless <image> tag where visual context should be. The first-run experience reads as unfinished.
The fix: Replace the <image> placeholders with real screenshots (or remove the tags). Add a lint/CI step that fails the build on unrendered <image>/placeholder tags in published MDX, alongside the TODO check recommended below.
8. The "How Ryvn works" overview promises "no Terraform configs" — but Terraform is a service type (significant)
Location: /docs/how-ryvn-works vs /docs/deploy/service-types/comparison and /docs/provision/aws
Problem: The conceptual overview states, under "Automated deployments": "No need to manage Docker images, write Terraform configs, or handle container orchestration." Yet the service-type comparison table lists Terraform as one of the four first-class service types ("Infrastructure provisioning"), and the AWS provisioning page requires authoring a terraform_executor_policies block — i.e., writing exactly that.
Consequence: A developer (or agent) forming a mental model from the overview internalizes "Ryvn means I never write Terraform," then hits a dedicated Terraform service type and Terraform-shaped provisioning config and has to unlearn it. The top-level page sets an expectation the reference docs immediately contradict.
The fix: Scope the claim to the automated GitHub-deploy path it's describing ("for code services, no need to write Terraform configs") and acknowledge the Terraform service type and provisioning config as the path where Terraform is involved.
9. FAQ presents two-replica HA as automatic; Scaling page says replicas can be set to zero (significant)
Location: /docs/support/faq vs /docs/configure/scaling
Problem: The FAQ's auto-scaling answer states Ryvn is "maintaining at least two replicas for high availability" as if it's an automatic guarantee. The Scaling page says the opposite: minReplicas can be "Set to 0 to allow scaling to zero when all triggers are idle," and the two-replica figure is only a best-practice recommendation ("For production servers, configure a minimum of two replicas… Consider your application's cold start time").
Consequence: A developer reading the FAQ believes two-replica HA is on by default and never sets minReplicas, leaving a service that can scale to zero (and incur cold starts) in production without realizing it. The FAQ describes a guarantee the platform doesn't actually provide unless you configure it.
The fix: Change the FAQ to "you can configure a minimum of two replicas for high availability (recommended for production); minReplicas can be set as low as zero." Keep the FAQ and Scaling page consistent on whether HA is a default or an opt-in.
10. Ryvn Agent page calls JWT + mTLS "multi-factor authentication" (significant)
Location: /docs/guides/ryvn-agent (Authentication section)
Problem: The agent security deep-dive states: "Multi-factor authentication secures agent-to-hub communication." The very next sentences describe what that actually is — "JWT tokens establish agent identity… TLS mutual authentication provides an additional security layer." That is machine-to-machine JWT + mTLS, not multi-factor authentication (which denotes multiple independent factors authenticating a user). The same page also alternates between "Ryvn Hub" and "Ryvn's control plane" for the same backend.
Consequence: This is the page a customer's security team reads when deciding whether to admit the agent into their cluster — the exact audience Ryvn's BYOC pitch is built around. Mislabeling the cryptographic auth model as "MFA" is precisely the imprecision a security reviewer flags, and it erodes confidence in the rest of the security claims. The Hub/control-plane drift forces the reader to guess whether two named components are one thing.
The fix: Describe the mechanism accurately ("agent-to-hub communication is secured with JWT-based agent identity and mutual TLS"). Reserve "multi-factor authentication" for user-facing flows if any exist. Standardize on one name — "control plane" or "Hub" — and define it once.
11. Orphan page: Kubernetes Labels exists and is linked but is absent from the docs index (significant)
Location: /docs/observability/kubernetes-labels vs /docs/llms.txt
Problem: observability/kubernetes-labels returns HTTP 200 and is linked from the Metrics page ("Ryvn applies standardized Kubernetes labels"), but it does not appear in llms.txt — the index that every page's header tells LLMs to fetch first, which lists only logs, metrics, and notifications under observability.
Consequence: llms.txt is the authoritative list agents use to enumerate docs. A page that's reachable by humans but missing from the index won't be crawled or surfaced by the very AI tooling Ryvn is courting with its Docs MCP and Agent Skills — so the label reference (needed to filter metrics in Prometheus/Datadog) is invisible to agents. Given Ryvn's explicit agent-first positioning, an un-indexed reference page is a real gap, not a cosmetic one.
The fix: Add observability/kubernetes-labels to llms.txt (and the rendered nav). Add a CI check that diffs the set of HTTP-200 doc pages against llms.txt so orphans are caught automatically.
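The orphan check reduces to a set difference. The sketch below assumes the list of live page paths comes from a crawl or the build output, and that llms.txt links contain `/docs/...` paths; the `example.com` host and the `find_orphans` name are placeholders.

```python
import re


def find_orphans(live_paths, llms_txt):
    """Return live doc paths that are never mentioned in the llms.txt index."""
    # Collect every /docs/... path in the index; a ".md" suffix is not matched.
    indexed = {
        path.rstrip("/")
        for path in re.findall(r"/docs/[A-Za-z0-9/_-]+", llms_txt)
    }
    return sorted({path.rstrip("/") for path in live_paths} - indexed)


# Placeholder index content mirroring the three observability entries.
llms_txt = """\
- [Logs](https://example.com/docs/observability/logs.md)
- [Metrics](https://example.com/docs/observability/metrics.md)
- [Notifications](https://example.com/docs/observability/notifications.md)
"""
live_paths = [
    "/docs/observability/logs",
    "/docs/observability/metrics",
    "/docs/observability/notifications",
    "/docs/observability/kubernetes-labels",
]
print(find_orphans(live_paths, llms_txt))
```

With the sample data, the only path reported is the Kubernetes Labels page.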
12. Two different service-user creation flows, neither cross-references the other (significant)
Location: /docs/api-reference/introduction vs /docs/experimental/cli
Problem: The API Reference documents creating a service user via the dashboard (Settings > Service Users) and shows a hardcoded ZITADEL audience scope urn:zitadel:iam:org:project:id:298766811497774120:aud in every example. The CLI page documents a different path entirely: ryvn auth create service-user. Neither page mentions the other exists.
Consequence: A developer setting up programmatic auth can't tell whether the dashboard flow and the CLI command produce equivalent credentials, whether the CLI-created user needs the same hardcoded audience scope, or which is canonical. Agents will pick one arbitrarily and may generate a flow that's incomplete (e.g., CLI user without the audience scope the API examples require).
The fix: Cross-link the two flows and state they're equivalent (or document the differences). Explain where the :298766811497774120:aud scope comes from and whether it's account-specific or global.
13. A leaked authoring TODO ships in the published page — twice (significant)
Location: /docs/networking/migration-guides/ingress-nginx-retirement
Problem: The published markdown contains an internal authoring note inside two table cells (the allowlist and denylist annotation rows): {/* TODO: document client-IP detection limitation (source.ipBlocks vs source.remoteIpBlocks) once behavior is stable */}. This is the most detailed networking page in the docs, and it also lists large sets of rejected annotations (auth, mTLS, WAF, snippets) as "coming soon" with no ETA.
Consequence: The TODO confirms there's a known, undocumented client-IP-detection limitation affecting IP allowlist/denylist behavior, a security-relevant feature, and the comment itself admits that documentation is deferred until "behavior is stable." A developer relying on whitelist-source-range to block traffic has no way to know the limitation exists. The leaked comment also signals to agents that the page is mid-edit.
The fix: Remove both TODO comments and either document the ipBlocks vs remoteIpBlocks client-IP behavior now or add a visible caveat ("client-IP detection behind a proxy may differ; contact support"). Add a lint step that fails the build on {/* TODO */}-style comments, or any bare TODO, in published MDX.
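A single lint pass could cover both this TODO comment and the `<image>` placeholders from finding 7. This is a minimal sketch, assuming the published source is MDX text available at build time; the function name, labels, and sample content are illustrative.

```python
import re

MDX_COMMENT = re.compile(r"\{/\*(.*?)\*/\}", re.DOTALL)
IMAGE_PLACEHOLDER = re.compile(r"<image>.*?</image>", re.DOTALL)


def find_artifacts(mdx_text):
    """Return (line_number, label) pairs for authoring artifacts in MDX text."""
    def line_of(offset):
        # 1-based line number of a character offset.
        return mdx_text.count("\n", 0, offset) + 1

    hits = []
    for match in MDX_COMMENT.finditer(mdx_text):
        # Only flag comments that actually contain a TODO marker.
        if "TODO" in match.group(1):
            hits.append((line_of(match.start()), "todo-comment"))
    for match in IMAGE_PLACEHOLDER.finditer(mdx_text):
        hits.append((line_of(match.start()), "image-placeholder"))
    return sorted(hits)


sample = (
    "## Step 1\n"
    "<image>Screenshot showing the Create Service button</image>\n"
    "| allowlist | {/* TODO: document client-IP detection limitation */} |\n"
)
print(find_artifacts(sample))
```

A build script would call this per file and exit non-zero when the list is non-empty.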
14. FAQ never states the non-enterprise support hours, so 24/7/SLA reads as universal (minor)
Location: /docs/support/faq vs /docs/support/contact-us
Problem: The FAQ's support answer does open by scoping the premium tier — "Enterprise customers get the white-glove treatment. Our senior engineers are available 24/7 with 15-minute response times… back it all with a 99.99% uptime SLA." So the answer is explicitly framed as Enterprise. But it never mentions the community/email support tier the Contact Us page describes ("Available daily from 6am-8pm PT / 9am-11pm ET"). A reader skimming the FAQ sees only the Enterprise offering and may assume 24/7 / 15-minute / 99.99% is what everyone gets, with no indication a more limited tier is the default.
Consequence: A non-enterprise reader can come away thinking round-the-clock 15-minute support is included, when their actual coverage is 6am–8pm PT. It's a clarity gap, not a contradiction — the FAQ scopes correctly to Enterprise but omits the contrasting community tier.
The fix: Add one line to the FAQ answer naming the community tier and its hours ("Community and email support is available daily 6am–8pm PT / 9am–11pm ET; 24/7 / 15-minute response / 99.99% SLA is for Enterprise customers"). Keep the numbers identical across both pages.
15. Service types are named multiple ways across pages (minor)
Location: /docs/deploy/service-types/comparison and /docs/quickstart vs /docs/networking/custom-domains
Problem: The third service type is called "Chart" in the comparison table and Quickstart table, but "Helm chart services" on the Custom Domains page ("Helm chart services are configured in your chart's ingress values"). The word "service" is itself overloaded: it's the generic noun and one of the four types is literally "Server services" (a service of type Server).
Consequence: A developer searching for "Helm" finds nothing in the comparison page that calls it "Chart," and vice versa. Agents treating "Chart" and "Helm chart service" as distinct concepts will mis-route configuration questions.
The fix: Choose one canonical label (e.g., "Chart (Helm)") and use it everywhere, with the alternate as a one-time parenthetical. Define "service" vs "service type" once in the glossary.
16. "experimental" section vs "beta" banners (minor)
Location: /docs/experimental/cli, /docs/experimental/docs-mcp, /docs/experimental/agent-skills
Problem: All three pages live under the nav section named experimental, but each page's own warning banner calls the feature beta ("The Ryvn CLI is in beta", "The Docs MCP server is in beta", "Agent Skills are in beta").
Consequence: "Experimental" and "beta" imply different stability and support expectations. A developer deciding whether to depend on the CLI in CI can't tell which label governs. The mismatch is consistent across three pages, so it reads as a deliberate-but-unreconciled naming split.
The fix: Standardize on one maturity term for these features and rename the nav section or the banners to match.
17. "Custom annotations" section is flagged Deprecated yet still teaches nginx annotations (minor)
Location: /docs/networking/custom-domains (Advanced configuration → Custom annotations)
Problem: The "Custom annotations" subsection carries a Deprecated warning ("Some nginx-specific annotations are not supported") and points to the ingress-nginx retirement guide — but immediately below the warning it still instructs the reader: "You can add nginx annotations for routing control:". The page deprecates the mechanism and demonstrates it in the same breath.
Consequence: A developer can't tell whether they're supposed to use nginx annotations or avoid them. They may build routing on a mechanism the docs simultaneously label deprecated, then have it break when the nginx retirement (documented elsewhere) lands.
The fix: Either remove the demonstration and replace it with the supported path, or clearly mark the demonstrated annotations as "legacy, being retired — use X instead" with a migration pointer rather than presenting them as a current option.
18. Ingress-retirement date contradicts its own source link (minor)
Location: /docs/networking/migration-guides/ingress-nginx-retirement
Problem: The prose says the Kubernetes community "retired ingress-nginx in March 2026" (past tense), but both inline links point to a blog post whose URL, kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/, dates it to November 2025. The page was fetched 2026-06-09.
Consequence: A developer planning their migration around the retirement date gets a date that disagrees with the cited source by ~four months. It's a small thing, but it's on the one page that's supposed to be the authoritative migration timeline.
The fix: Reconcile the date with the linked source (use the actual EOL/retirement date from the blog) or cite the specific milestone the "March 2026" claim refers to.
19. Self-referential anchor link in the deployment guide (minor)
Location: /docs/deploy/how-deploying-works (Zero-downtime deployments section)
Problem: Inside the ## Zero-downtime deployments section, the text reads "Learn more about zero-downtime deployments" — the link points to the heading it's already under, so clicking it does nothing.
Consequence: A reader expecting a deeper explanation clicks a link that scrolls them to where they already are. It signals a broken cross-reference (likely meant to point to another page) that was never completed.
The fix: Point the link at the intended deeper resource, or remove it if this section is itself the canonical explanation.
20. Enterprise consultation CTA points to a personal Cal.com link (minor)
Location: /docs/support/contact-us
Problem: The enterprise CTA — "Schedule a consultation" — links to an individual's personal Cal.com scheduling page rather than a company-owned booking URL.
Consequence: Enterprise buyers (the highest-value readers of this page) are routed to a personal calendar that breaks if that individual changes roles or leaves, with no company fallback. It also reads as less established than the enterprise framing around it.
The fix: Use a team/company-owned scheduling link (e.g., a cal.com/ryvn/... or ryvn.ai/demo redirect) so the booking survives staffing changes.
What they do well
- Genuinely agent-aware surface area: a published llms.txt index, a Docs MCP (search_ryvn), and installable Agent Skills put Ryvn ahead of most dev-tools docs on machine readability, even where the contents need cleanup.
- Concrete operational numbers where it counts: build/pre-deploy/deploy timeouts (120/30/15 min), 30-day metrics retention, collector footprint (<1% CPU / <100MB), and the explicit autoscaling formula give developers real values to plan against.
- Clear core data model: the service-as-template / installation-as-running-instance distinction is defined once and held consistently across the Services and Installations pages.
Top 3 recommendations
- Fix the ryvn.app vs ryvn.run contradiction first: it's the answer to "where is my app," it's wrong in the Quickstart, and it collides with Ryvn's own control-plane/label namespace. One canonical public-domain string everywhere.
- De-risk the copy-paste hazards in your security-sensitive surface: replace the wildcard actions: ["*"] resource: "*" IAM example with a least-privilege default, stop calling JWT + mTLS "multi-factor authentication" on the agent page, and remove the leaked TODO and <image> placeholders. These are the things a customer's security review and an onboarding developer hit first.
- Reconcile the claims that disagree across pages: the overview's "no Terraform configs" vs the Terraform service type, FAQ "two replicas for HA" vs minReplicas: 0, Server scaling "HTTP traffic" vs CPU/memory, the three AWS node-default rows, and the missing duplicate-name (409) response. Add CI checks for orphan pages (diff HTTP-200 docs against llms.txt) and stray TODO / <image> artifacts so these don't reappear.