Rootly Documentation Audit
The docs are broad and reasonably organized, with a real llms.txt, a downloadable OpenAPI spec, and an exemplary MCP Server page — but the API surfaces are the weakest spots: a 1000× rate-limit contradiction on writes, five [DEPRECATED] resource groups with no migration path, status enums that exceed the lifecycle prose, copy-paste stragglers in the sidebar, and a Status Page API that's a completely separate surface but isn't introduced as one.
1. Rate limit for write requests contradicts itself by 1000x (critical)
Location: /api-reference/overview vs /api-reference/incidents/creates-an-incident
Problem: The API Reference overview states: "There is a default limit of 3000 POST, PUT, PATCH or DELETE calls per API key every minute." The Create-an-Incident endpoint page states: "Rate Limiting: POST requests are limited to 3 calls per 60 seconds per API key." One says 3000/min, the other 3/min. The endpoint page never says "incident-creation-specific" — it reads as the rate limit for that POST.
Consequence: An integrator sizing their incident-creation throughput has no way to know which number is real. If 3/min is correct, anyone reading the overview will blow past it during a production fire-drill and silently start getting 429s. If 3000/min is correct, the endpoint page is needlessly scary. Either way, an agent generating code from these docs will pick whichever it saw last and ship a bug.
The fix: Pick one. If incident-creation truly has a stricter per-endpoint cap, name it explicitly ("incidents POST: 3/min; all other writes: 3000/min") and link both pages to a single canonical rate-limit table that lists every per-resource override (incidents, alerts, etc.).
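Until the two pages are reconciled, a cautious integrator could throttle incident creation on the client side to the stricter of the two documented figures. The sketch below is a minimal sliding-window limiter under that assumption; the class name, the injectable clock, and the choice of 3 calls per 60 seconds are illustrative and not part of any Rootly SDK.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side throttle: at most `max_calls` per `window_s` seconds.

    Hypothetical helper for illustration; not part of any Rootly SDK.
    """

    def __init__(self, max_calls: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self._calls: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_s:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.window_s - (now - self._calls[0])

    def record(self) -> None:
        """Record that a call was just made."""
        self._calls.append(self.clock())


# Assumption: size to the stricter documented number (3 POSTs per 60 s)
# until the docs say which figure actually applies.
incident_post_limiter = SlidingWindowLimiter(max_calls=3, window_s=60.0)
```

A caller would check wait_time() before each POST, sleep for that long if it is non-zero, and call record() after sending. If 3000/min turns out to be the real limit, only the two constructor arguments change.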
2. Five deprecated API resource groups with no migration guidance (critical)
Location: /api-reference/overview (sidebar)
Problem: Five resource groups are marked [DEPRECATED] with every endpoint deprecated and no replacement linked:
- CatalogEntityProperties
- CustomFieldOptions
- CustomFields
- WorkflowCustomFieldSelections
- IncidentCustomFieldSelections
Nothing in the API overview, the Catalog pages, or the Forms & Fields configuration page explains what replaces them. The presence of CatalogProperties (alias for fields) suggests fields is the new surface, but no migration doc says so.
Consequence: Customers on legacy integrations don't know whether they need to migrate, when these endpoints will be removed, or what to migrate to. Terraform module authors face the same question for the corresponding resources. Agents generating code from the OpenAPI spec will happily emit calls against deprecated endpoints because nothing flags them as "do not use for new code." This is the kind of silent decay that ships to production and only gets caught when removal day arrives.
The fix: For each deprecated group, add a "Deprecated: use <X> instead" line on the resource page, link to the replacement, and publish a single migration page covering all five. Include a removal date or removal-policy statement.
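In the meantime, integrators and code-generating agents can at least detect deprecated operations mechanically. The sketch below walks a parsed OpenAPI document and lists every operation that is either flagged with the standard "deprecated: true" field or carries a tag containing "[DEPRECATED]". Whether Rootly's spec uses the field, the tag text, or both is an assumption, and the sample paths are invented for illustration.

```python
from typing import Any, Dict, List, Tuple

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def deprecated_operations(spec: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return sorted (METHOD, path) pairs for operations marked as deprecated."""
    found = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            # Path items can also hold non-operation keys such as "parameters".
            if method not in HTTP_METHODS or not isinstance(op, dict):
                continue
            flagged = op.get("deprecated") is True
            tagged = any("[DEPRECATED]" in tag for tag in op.get("tags", []))
            if flagged or tagged:
                found.append((method.upper(), path))
    return sorted(found)


# Hypothetical spec fragment; these paths are placeholders, not Rootly endpoints.
SAMPLE_SPEC = {
    "paths": {
        "/v1/legacy_widgets": {
            "get": {"tags": ["[DEPRECATED] LegacyWidgets"]},
            "post": {"deprecated": True},
        },
        "/v1/widgets": {"get": {"tags": ["Widgets"]}},
    }
}

print(deprecated_operations(SAMPLE_SPEC))
```

Running a check like this in CI against the downloaded spec would catch new code that targets a deprecated group, but it cannot say what to use instead. That still needs the migration page.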
3. API Reference contains literal typos and copy-paste straggler endpoints (significant)
Location: /api-reference/overview (sidebar)
Problem: Multiple parsing-hostile artifacts are visible in the rendered API reference:
- IncidentEventServices: a real DELETE endpoint titled "Delete an incident event functionalitu" (literal typo: "functionalitu").
- IncidentEventFunctionalities: the DELETE is titled "Delete an incident event functionality" but the corresponding UPDATE is titled "Update an incident event" (no "functionality"), a verb-target mismatch within a single resource group.
- Under Causes, Environments, and Functionalities, the endpoints "GET List Catalog Properties" and "POST Creates a Catalog Property (alias for field)" appear as stragglers. They belong to CatalogProperties and have been duplicated into unrelated resource groups in the nav.
- CatalogProperties is itself documented as "alias for fields" on every operation, with no canonical page explaining whether properties and fields are interchangeable or one is preferred.
Consequence: Agents indexing the API reference will surface non-existent endpoints (the wrong-named delete) and stop trusting endpoint titles. Humans clicking through Causes see "Catalog Properties" endpoints with no explanation and assume the page is broken. The properties/fields aliasing makes it impossible to know which name to use in code without trial and error.
The fix: Audit the OpenAPI spec for the source of these strays (they look like operation-tag mis-assignments). Fix the "functionalitu" typo and the "Update an incident event" title mismatch. Pick one canonical name between catalog_properties and catalog_fields, document the alias once at the resource level, and stop repeating "(alias for field)" on every operation.
4. Incident status enum accepts states the lifecycle page never names (significant)
Location: /incidents/incident-lifecycle vs /api-reference/incidents/creates-an-incident
Problem: The Incident Lifecycle page documents these stages with explicit Data value: labels: in_triage, started, detected, acknowledged, mitigated, resolved, closed, cancelled, plus a Planned Maintenance (Optional) stage whose data value is not captured in scraped evidence. The POST /v1/incidents schema accepts: in_triage | started | detected | acknowledged | mitigated | resolved | closed | cancelled | scheduled | in_progress | completed.
scheduled, in_progress, and completed are accepted by the API but never described as lifecycle stages in the lifecycle prose. The Lifecycle page's "Planned Maintenance" stage cannot be matched to a value in the POST status enum without inference.
Consequence: A developer building a status-board or automation has no way to know what in_progress or completed mean, when they fire, or how they relate to started/closed. An agent generating an incident-status state machine from these docs will produce something the API accepts but that the UI may display unpredictably.
The fix: Add the scheduled/in_progress/completed states to the lifecycle page (these appear to belong to kind: scheduled incidents — say so). Either expose the data value for the Planned Maintenance stage in the lifecycle doc or, if it maps to scheduled, note the alias inline.
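The gap can be checked mechanically by diffing the two value sets quoted above. The sketch uses only the values listed in this finding; the variable names are illustrative.

```python
# Data values labeled on the Incident Lifecycle page (as captured in this audit).
LIFECYCLE_DOC_VALUES = {
    "in_triage", "started", "detected", "acknowledged",
    "mitigated", "resolved", "closed", "cancelled",
}

# Values accepted by the POST /v1/incidents status enum (as captured in this audit).
API_STATUS_ENUM = LIFECYCLE_DOC_VALUES | {"scheduled", "in_progress", "completed"}

# Accepted by the API but never described in the lifecycle prose.
undocumented = sorted(API_STATUS_ENUM - LIFECYCLE_DOC_VALUES)
# Described in the prose but rejected by the API (none, per the captured evidence).
unaccepted = sorted(LIFECYCLE_DOC_VALUES - API_STATUS_ENUM)

print(undocumented)  # ['completed', 'in_progress', 'scheduled']
print(unaccepted)    # []
```

A docs build that ran this comparison against the live spec would fail as soon as an enum value shipped without matching lifecycle prose.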
5. The kind enum exposes _sub variants that no page explains (significant)
Location: /api-reference/incidents/creates-an-incident
Problem: The POST /v1/incidents kind enum accepts: test | test_sub | example | example_sub | normal | normal_sub | backfilled | scheduled | scheduled_sub. Four of these nine values are _sub variants (backfilled is the only base value without one), and the scraped evidence shows no prose anywhere that defines what _sub means, when to send it, or how it differs from the base value. The lifecycle page never mentions sub-incidents. The Slack incident-creation guide never mentions them.
Consequence: An integrator (or an agent generating an SDK wrapper from the OpenAPI spec) can see that normal_sub is accepted but has zero documentation guidance on whether to ever send it, what it does, or what the side effects are. The result is "valid input that no one ever describes" — exactly the kind of enum that produces production bugs months after deployment.
The fix: Add a section to the Incident Lifecycle or Incident Structure pages explaining the parent/sub-incident concept and which kind values exist for each. If _sub is internal-only or legacy, mark those values deprecated in the OpenAPI spec instead of exposing them as public surface.
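Tallying the kind enum as quoted in this finding shows how the values pair up. The sketch only groups the strings by suffix. It makes no claim about what _sub does, which is exactly what the docs leave out.

```python
# kind values accepted by POST /v1/incidents, as captured in this audit.
KIND_ENUM = [
    "test", "test_sub", "example", "example_sub",
    "normal", "normal_sub", "backfilled", "scheduled", "scheduled_sub",
]

sub_kinds = [k for k in KIND_ENUM if k.endswith("_sub")]
base_kinds = [k for k in KIND_ENUM if not k.endswith("_sub")]
# Base values that have no matching "<base>_sub" entry in the enum.
unpaired = [k for k in base_kinds if f"{k}_sub" not in KIND_ENUM]

print(len(KIND_ENUM), len(sub_kinds))  # 9 4
print(unpaired)                        # ['backfilled']
```

Until the parent/sub-incident concept is documented, an SDK wrapper could reasonably accept only base_kinds by default and require an explicit opt-in for the _sub values.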
6. Status Page API is a separate surface with different host, IDs, and auth — but isn't introduced as one (significant)
Location: /api-reference/statuspages/list-active-incidents-for-a-status-page vs /api-reference/overview
Problem: The main API uses https://api.rootly.com/v1/..., integer-ish IDs, JSON:API media type, and Authorization: Bearer YOUR-TOKEN. The Status Page API endpoint is hosted at https://status.example.com/api/v1/incidents.json, returns UUIDs like 3c90c3cc-0d44-4b50-8888-8dd25736052a, uses plain JSON (not JSON:API), and the cURL example shows no Authorization header at all:
curl --request GET --url https://status.example.com/api/v1/incidents.json
There is no overview page introducing the Status Page API as a distinct surface, no statement about whether it's public or authenticated, no explanation of how the status.example.com placeholder maps to a customer's actual status page subdomain, and no cross-reference from the main API overview.
Consequence: An integrator who finds the Status Page API tab assumes it shares auth/conventions with the main API and writes broken code (wrong base URL pattern, wrong content type, wrong ID type). An agent that has been told "use the Rootly API" cannot tell from the docs that there are two APIs. Worse, if the Status Page endpoint is in fact public (no auth needed), it should say so explicitly — silent omission of an Authorization header is a security-documentation gap.
The fix: Add a Status Page API overview that mirrors /api-reference/overview: base URL pattern (customer subdomain substitution), auth model, ID scheme, content type, rate limits, error format. Cross-link from the main API overview. If the endpoint is public, label it Public: no authentication required; if it isn't, fix the cURL example.
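One way to see why the missing overview matters is to write the two surfaces down side by side, as a client would have to configure them. The sketch below encodes only what the scraped pages show. The ApiSurface class is hypothetical, the Accept values assume the standard JSON:API media type for the main API and plain JSON for the Status Page API, and the Status Page auth model is recorded as unknown (None) because the docs never state it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ApiSurface:
    """Hypothetical description of one API surface, for illustration only."""
    name: str
    base_url: str
    accept: str
    bearer_auth: Optional[bool]  # None = not stated anywhere in the docs

    def headers(self, token: Optional[str] = None) -> Dict[str, str]:
        """Build request headers; attach the token only where auth is documented."""
        result = {"Accept": self.accept}
        if self.bearer_auth and token:
            result["Authorization"] = f"Bearer {token}"
        return result


MAIN_API = ApiSurface(
    name="main",
    base_url="https://api.rootly.com/v1",
    accept="application/vnd.api+json",  # JSON:API media type
    bearer_auth=True,
)

STATUS_PAGE_API = ApiSurface(
    name="status_page",
    base_url="https://status.example.com/api/v1",  # placeholder host from the docs
    accept="application/json",
    bearer_auth=None,  # undocumented: the cURL example shows no Authorization header
)
```

Every field that differs between the two objects is something the proposed Status Page API overview should state outright, and the None is the open question the docs need to answer.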
7. 429 response shape isn't shown on endpoint pages (significant)
Location: /api-reference/incidents/creates-an-incident vs /api-reference/overview
Problem: The API overview documents that 429 responses return {"error": "Rate limit exceeded. Try again later."}. The Create-an-Incident endpoint page lists response codes 201 | 401 | 422 and never mentions 429, despite this same page being where the contradictory "3 per 60 seconds" rate-limit text lives. So the page that's most likely to throw a 429 is also the page that never shows you what the 429 body looks like.
Consequence: An integrator writing retry logic against the endpoint page has no way to know what to parse on 429 without bouncing to the overview. Agents auto-generating typed clients from per-endpoint response schemas will produce response unions that don't include the 429 shape.
The fix: Add 429 to every endpoint page's response-code table (or generate the response-code table from the spec so this is impossible to forget). Link to the canonical rate-limit/error section once.
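For reference, this is roughly the retry logic an integrator ends up writing once they have found the 429 body on the overview page. It is a sketch under stated assumptions: the send callable stands in for whatever HTTP client is in use, the backoff schedule is arbitrary, and plain exponential backoff is used because the captured evidence does not say whether a Retry-After header is returned.

```python
import time
from typing import Any, Callable, Dict, Tuple

# 429 body as documented on the API overview page.
RATE_LIMIT_BODY = {"error": "Rate limit exceeded. Try again later."}

Response = Tuple[int, Dict[str, Any]]  # (HTTP status, parsed JSON body)


def post_with_retry(send: Callable[[], Response],
                    max_attempts: int = 4,
                    base_delay_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical zero-argument callable wrapping the real request.
    Delays double on each retry: base, 2*base, 4*base, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay_s * 2 ** attempt)
    # Still rate limited after the final attempt; hand the 429 back to the caller.
    return status, body
```

A typed client generated from the endpoint page's 201 | 401 | 422 list would have no branch for the 429 case handled here, which is the gap this finding describes.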
8. Four different breadcrumb paths to generate an API key (significant)
Location: /api-reference/overview, /integrations/mcp-server, /integrations/terraform, /integrations/cli
Problem: The same action — "create an API key" — is described four different ways across four pages:
- API Reference: "Organization dropdown > Organization Settings > API Keys > Generate New API Key"
- MCP Server: "Account > Manage API keys > Generate New API Key"
- Terraform: "Generate one from Account > API Tokens in your Rootly dashboard."
- CLI: "Navigate to Settings > API Keys"
In addition, the same artifact is called "API Key" in three places and "API Token" in Terraform — and the CLI env var is ROOTLY_API_TOKEN while the section heading is "Getting an API Key."
Consequence: A new developer following any one of these guides will not find the screen the others describe and will assume the docs are stale. Agents that have to disambiguate "API token" vs "API key" between Terraform and MCP examples can't tell whether they're the same thing.
The fix: Standardize on one term (API key or API token, not both) and one canonical breadcrumb. Replace the four inline instructions with a transclude/snippet that includes a screenshot, then update all integration pages to reference it.
9. Alert-creation rate limit is phrased differently on the two pages that state it (significant)
Location: /alerts/alerts vs /api-reference/overview
Problem: The Alerts overview says: "Alert ingestion is rate limited to 50 alerts per minute per source/API key by default." The API overview restates: "Alert creation is limited to 50 per minute per API key. … Note: Our default rate limit for Alert Creation is 50 alerts every minute, per API key or alert source." The two phrasings differ: "per source/API key" vs "per API key or alert source." It is not clear whether the limit is per (source × key), per source, per key, or the more restrictive of the two.
Consequence: A team running multiple alert sources through one shared API key cannot tell whether they will hit 50/min total or 50/min/source. This is the exact number you need to know when sizing an integration, and getting it wrong means dropped alerts at 3am.
The fix: State the limit unambiguously in one place with a concrete example ("if you send from 3 alert sources using 1 API key, you may send up to N total / N per source per minute"). Reference that one paragraph from both the alerts overview and the API overview.
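The ambiguity is easier to see with numbers. The sketch below computes the per-minute ceiling under each plausible reading of the two phrasings. The reading names are labels invented for this example, and the pair-based reading assumes every source sends through every key.

```python
ALERT_LIMIT_PER_MINUTE = 50  # figure stated on both pages


def max_alerts_per_minute(sources: int, keys: int, reading: str) -> int:
    """Total alerts per minute allowed under one interpretation of the docs."""
    if reading == "per_key":
        return ALERT_LIMIT_PER_MINUTE * keys
    if reading == "per_source":
        return ALERT_LIMIT_PER_MINUTE * sources
    if reading == "per_source_key_pair":
        # Assumes every source sends through every key.
        return ALERT_LIMIT_PER_MINUTE * sources * keys
    if reading == "stricter_of_both":
        return ALERT_LIMIT_PER_MINUTE * min(sources, keys)
    raise ValueError(f"unknown reading: {reading}")


# Three alert sources sharing one API key.
for reading in ("per_key", "per_source", "per_source_key_pair", "stricter_of_both"):
    print(reading, max_alerts_per_minute(sources=3, keys=1, reading=reading))
```

For three sources on one key, the readings give 50, 150, 150, and 50 alerts per minute, a threefold spread from the same two sentences.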
10. Canonical-looking URL slugs return 404 (significant)
Location: Site-wide URL scheme
Problem: Two intuitive slugs return 404, while the content lives at longer, harder-to-guess canonical paths:
- /quick-start returns 404; the canonical lives at /quick-start-guide.
- /ai/data-privacy returns 404; the canonical lives at /ai/data-privacy-for-ai.
The docs home itself is at /help-and-documentation and the root path redirects to it. Other obvious slug guesses (/api-reference, /getting-started/quick-start) appear to follow the same pattern but were not independently captured as 404s in the evidence.
Consequence: External blog posts, internal wikis, and AI agents that link to these intuitive slugs all land on 404 pages. The page slug /ai/data-privacy-for-ai is especially unfortunate because tooling that links to "data privacy" from an AI context will guess /ai/data-privacy first.
The fix: Add HTTP 301 redirects for the obvious aliases (/quick-start → /quick-start-guide, /ai/data-privacy → /ai/data-privacy-for-ai, and any others that intuitively shorten). Mintlify supports redirects in docs.json; this is one config change.
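The redirect entries can be generated from a simple alias map. The sketch below assumes docs.json takes a top-level "redirects" array of objects with "source" and "destination" keys; check that shape against the current Mintlify documentation before merging the output.

```python
import json

# Intuitive slug -> canonical path, taken from the two 404s captured in this audit.
ALIASES = {
    "/quick-start": "/quick-start-guide",
    "/ai/data-privacy": "/ai/data-privacy-for-ai",
}

# Assumed docs.json shape: a list of {"source": ..., "destination": ...} objects.
redirects = [
    {"source": source, "destination": destination}
    for source, destination in ALIASES.items()
]

# Emit a fragment that can be merged into docs.json by hand.
print(json.dumps({"redirects": redirects}, indent=2))
```

Keeping the alias map in one place makes it cheap to add further entries as other guessable slugs are confirmed to 404.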
11. Status Pages configuration page is one paragraph for a feature it tells you to set up in a minute (significant)
Location: /configuration/status-pages
Problem: The entire body of the page reads: "Status pages allow you to quickly communicate information about the health of your services and applications… It only takes about a minute to set up a status page, so we encourage you to do this soon after you've signed up." That's it. No screenshot, no form fields, no link to a setup guide, no API reference for creating a status page programmatically. The "Previous" link at the bottom references Public And Private Status Pages, but the page itself doesn't deep-link there.
Consequence: A user arriving here from the sidebar (which advertises this page under Communication & Notifications → Status Pages) gets no setup instructions and has to back out and hunt. Agents asked "how do I set up a Rootly status page" against this page can produce nothing useful.
The fix: Either delete this stub and redirect to the real status-page setup guide, or fill it in with the actual minute-long flow (create page → add components → publish), with links to the Public/Private variant and the Status Page API.
12. Alert status enum is four values with no prose distinguishing open from triggered (minor)
Location: /alerts/alerts
Problem: The Alerts page lists four alert statuses — open | triggered | acknowledged | resolved — without prose explaining the difference between open and triggered. acknowledged and resolved are intuitive; open vs triggered is not.
Consequence: An integrator writing alert-handling automation cannot tell when an alert is open but not triggered, or whether open is a pre-routing state. Agents writing alert state machines from this page will guess at the difference and may produce subtly wrong transitions.
The fix: Add a one-line definition for each status (and a transition diagram for the four-state machine). If one of open/triggered is legacy or rarely-used, say so.
13. CLI is "Early Preview" on the page but has no preview indicator in the sidebar (minor)
Location: /integrations/cli
Problem: The CLI page opens with: "Early Preview — The Rootly CLI is currently in early preview. Features may change and some functionality may be limited." But in the Integrations sidebar, "CLI" sits next to Terraform, Pulumi, and the MCP Server with no beta/preview tag — and the page doesn't say which subcommands are stable vs. unstable.
Consequence: A developer evaluating Rootly's automation surface from the sidebar can't tell at a glance which integrations are production-ready. They may pick the CLI for CI/CD and discover later that some commands change shape.
The fix: Add a Preview badge in the sidebar (Mintlify supports tag: "Preview" on docs.json entries) and either enumerate which subcommand families are stable or set the expectation that all of it may change.
14. Integration count disagrees between the home page and llms.txt (minor)
Location: /help-and-documentation vs /llms.txt
Problem: The Welcome page promises "All Integrations — Browse our complete library of 50+ integrations." The llms.txt summary says Rootly "integrates with 60+ tools."
Consequence: Trivial for humans, but an AI agent ingesting both pages now has two facts about integration count and no way to know which is current. The llms.txt is the file specifically built to be authoritative for agents — having it disagree with the home page is exactly the kind of contradiction it's supposed to eliminate.
The fix: Pick one number, update both. Better: drop the marketing-style count from llms.txt and link to the integrations index, which is the source of truth.
What they do well
- Real llms.txt advertised on every page — every doc page repeats the location of the documentation index. That's an unusually strong agent affordance, even if the boilerplate is heavy.
- MCP Server page is exemplary — covers hosted vs. local, every major client (Claude Code, Cursor, Windsurf, Codex, Claude Desktop, Gemini CLI), token scopes, and the specific tools the server adds beyond raw OpenAPI.
- Terraform integration page is concrete — provider source, version pin, env vars, importing existing resources, end-to-end workflow example with Liquid templating. The kind of page a senior platform engineer can ship from.
Top 3 recommendations
- Fix the rate-limit contradiction first. Either reconcile 3000/min vs 3/min, or, if both are right, put a per-endpoint rate-limit table on the API overview and link every endpoint page back to it. While you're there, make 429 part of every endpoint's response table.
- Publish a deprecation & migration page. Five [DEPRECATED] resource groups with no replacement linked is a ticking-clock liability for any customer integration. One canonical migration page, plus a "use <X> instead" line on each deprecated resource, eliminates this entire class of risk.
- Treat the Status Page API as a first-class surface. Add an overview page with base URL, auth model, ID scheme, and content type, and either label the existing cURL example public or add the missing Authorization header. Also audit the OpenAPI-derived main sidebar: strip "Catalog Properties" stragglers out of Causes/Environments/Functionalities, fix the functionalitu typo, and decide whether properties or fields is canonical.