SiteGPT Documentation Audit
The docs cover a broad surface (dashboard setup, integrations, CLI, MCP, v0/v2 APIs, SDK, webhooks) and ship both an llms.txt and an OpenAPI spec — but they are riddled with cross-page contradictions, a stale changelog index, undocumented data sources, and at least one outright security smell in the SDK example.
1. Auto-sync plan gating contradicts itself across two setup pages (critical)
Location: /docs/setup/training-your-chatbot vs /docs/setup/retraining-and-updating-your-ai-chatbot
Problem: The training page restricts auto-sync frequencies by plan tier: "Daily (Enterprise)… Weekly (Scale)… Monthly (Growth)". The retraining page lists the same four options ("Never (manual only) / Daily / Weekly / Monthly") with no plan gating mentioned at all.
Consequence: A user on the Growth plan reading the retraining page will believe daily sync is available, configure it, and then either watch it fail silently or hit an unexplained upgrade prompt. An agent scraping both pages will surface contradictory answers for the exact same setting.
The fix: Pick one source of truth for the plan-to-frequency mapping (ideally the pricing page) and link both setup pages to it. Annotate the dropdown options identically on both pages.
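One way to keep the two pages aligned is to render both dropdowns from a single mapping. The following is a minimal Python sketch, assuming the plan annotations quoted from the training page are the correct ones; the names AUTO_SYNC_PLAN and dropdown_labels are hypothetical, and the mapping itself should be confirmed against the pricing page before use.

```python
# Hypothetical single source of truth: sync frequency -> plan named on the
# training page. "Never (manual only)" carries no plan annotation in the
# docs, so it maps to None.
AUTO_SYNC_PLAN = {
    "Never (manual only)": None,
    "Daily": "Enterprise",
    "Weekly": "Scale",
    "Monthly": "Growth",
}


def dropdown_labels(mapping=AUTO_SYNC_PLAN):
    """Return the annotated option labels that both setup pages should show."""
    labels = []
    for frequency, plan in mapping.items():
        # Options without a plan requirement are rendered as-is.
        labels.append(frequency if plan is None else f"{frequency} ({plan})")
    return labels
```

Rendering both pages from dropdown_labels() means a pricing change only has to be made in one place.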
2. Data Sources overview lists SharePoint / Confluence / GitBook that nothing else documents (critical)
Location: /docs/data-sources/overview vs /docs/setup/create-new-chatbot and /docs/setup/training-your-chatbot
Problem: The Data Sources overview lists "SharePoint — Import content from Microsoft SharePoint sites", "Confluence — Connect Atlassian Confluence for internal documentation", and "GitBook — Import documentation from GitBook" as available sources. Neither the create-new-chatbot page (which enumerates Local Files / Notion / Google Drive / Dropbox / OneDrive / Box / Multiple Links / Sitemap / Scrape Website / YouTube) nor the training-your-chatbot page mentions SharePoint, Confluence, or GitBook at all. The docs llms.txt summary echoes "Notion, Google Drive, Confluence, and GitHub" — yet another different list.
Consequence: A buyer evaluates SiteGPT on the promise of Confluence/SharePoint/GitBook ingestion, then finds no setup instructions, no auth flow, no file-type matrix, and no confirmation those integrations even ship. Sales-blocking and credibility-damaging.
The fix: Either ship the missing source guides (each with auth + content selection + sync behavior) or remove the entries from the data-sources overview. Reconcile the llms.txt summary against the dashboard's actual integration catalog.
3. Two API versions with completely different response envelopes and no migration guide (critical)
Location: /docs/api-reference/getting-started (v0) vs /docs/api-reference/v2/getting-started and /docs/api-reference/v2/authentication
Problem: The legacy v0 API uses a success / message / data / error envelope and is keyed off plan-tier rate limits; the v2 API uses an entirely different ok + data / error + meta.requestId envelope and sgpt_xxxxxxxxx bearer tokens. The v0 page tells readers "use API Reference → v2 from the version selector" but there is no migration guide explaining (a) how to translate v0 client code to v2, (b) whether v0 will be deprecated, (c) which endpoints exist in both versions, or (d) how v0 tokens relate to v2 scoped tokens.
Consequence: Existing v0 integrators have no upgrade path. New integrators landing on the v0 page have to guess whether to start there or jump versions. AI agents indexing both will surface mutually contradictory request/response shapes for what they assume is "the SiteGPT API."
The fix: Publish a v0→v2 migration guide with side-by-side envelope examples, an endpoint mapping table, a token/scope translation, and a stated deprecation timeline for v0. Add a banner on every v0 endpoint page pointing to the v2 equivalent.
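A migration guide could include an adapter like the one sketched below, which maps either envelope onto one internal shape. This is only an illustration in Python: the top-level key names come from the two getting-started pages as described above, but the types of error and message, the position of meta at the top level, and the function name normalize_envelope are assumptions.

```python
def normalize_envelope(body):
    """Map a v0 or v2 response body onto one internal shape.

    Assumes v2 bodies carry ok / data / error / meta.requestId and v0 bodies
    carry success / message / data / error, as the docs describe them.
    """
    if "ok" in body:  # v2 envelope
        return {
            "ok": bool(body["ok"]),
            "data": body.get("data"),
            "error": body.get("error"),
            "request_id": (body.get("meta") or {}).get("requestId"),
        }
    if "success" in body:  # legacy v0 envelope
        failed = not body["success"]
        return {
            "ok": not failed,
            "data": body.get("data"),
            # Assumption: fall back to message when a failed v0 call has no error.
            "error": body.get("error") or (body.get("message") if failed else None),
            "request_id": None,  # v0 has no documented request ID
        }
    raise ValueError("unrecognized response envelope")
```

Client code written against the normalized shape can switch API versions without touching its error handling.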
4. v2 API has no documented rate limits (critical)
Location: /docs/api-reference/v2/getting-started and /docs/api-reference/v2/authentication
Problem: The legacy v0 getting-started page documents rate limits per plan. The v2 getting-started and authentication pages contain no rate-limit information at all — no requests-per-minute, no headers, no 429 behavior, no scope-based throttling.
Consequence: Any production integrator using the recommended v2 API has no idea what limit they're approaching, can't implement backoff intelligently, and will discover throttling only by tripping it. CLI/MCP/agent automations (which SiteGPT explicitly markets) are exactly the workloads most likely to hit unannounced limits.
The fix: Publish a dedicated api-reference/v2/rate-limits page covering the limit values, response headers (X-RateLimit-* or equivalent), 429 body shape, and any per-scope or per-token differences.
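Until such a page exists, integrators can only back off defensively. The Python sketch below shows the kind of logic a published contract would make precise. It assumes the v2 API answers throttled requests with HTTP 429 and may send the standard Retry-After header in its seconds form; neither is confirmed by the docs, and the function name and default values are hypothetical.

```python
def retry_delay(status, headers, attempt, base=1.0, cap=60.0):
    """Return seconds to wait before retrying, or None if no retry is needed.

    attempt is zero-based. headers is a plain dict; a real HTTP client may
    expose header names case-insensitively.
    """
    if status != 429:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled here
    # No usable hint from the server: exponential backoff, capped.
    return min(base * (2 ** attempt), cap)
```

Documented limit values and headers would let clients replace the guessed base and cap with real numbers.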
5. SDK example ships a hardcoded API key with contradictory "keep it private" warning (critical)
Location: /docs/developers/sdk
Problem: The JavaScript SDK page shows an example with an inline API key ('6338d1ac-9afc-4a7c-8812-ba9e3f8c48eb') annotated with "keep it private!" — but the snippet is presented as the canonical client-side embed. A real client-side <script> tag publishes that key to every browser that loads the page.
Consequence: Developers copy-paste the snippet verbatim into production sites and unknowingly expose what the docs themselves call a private credential. Agents extracting "the SDK example" inherit the same bad pattern. Either the key is not actually sensitive (in which case the warning is wrong) or it is (in which case the example is wrong).
The fix: Replace the literal UUID with a clearly marked placeholder (<YOUR_PUBLIC_CHATBOT_ID> or similar), explicitly state what kind of credential it is (chatbot identifier vs API key vs signed token), and remove the "keep it private" warning if it's a public embed ID. If signatures are required for sensitive operations, lead with the signed-request example, not the unsigned one.
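A docs lint can catch this class of problem before it ships. The following Python sketch flags UUID-shaped literals in a snippet so they can be swapped for placeholders; the function name literal_uuids is hypothetical, and the check is a heuristic that says nothing about whether a flagged value is actually sensitive.

```python
import re

# Matches the 8-4-4-4-12 hexadecimal layout of a UUID literal.
UUID_PATTERN = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)


def literal_uuids(snippet):
    """Return every UUID-shaped literal found in a documentation snippet."""
    return UUID_PATTERN.findall(snippet)
```

Running this over published code samples and failing the build on any match forces each example to use a marked placeholder instead.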
6. Changelog overview is stale — January 2026 entry exists but isn't listed (significant)
Location: /docs/changelog/overview vs /docs/changelog/2026-january
Problem: The changelog overview's "Recent updates" section caps out at October 2025, and the "All updates by month" list ends at "October 2025 — Pages quota system and SDK improvements". But /docs/changelog/2026-january exists, with new features (editable conversation titles, session ID/IP tracking, disclaimer max height) and bug fixes. November 2025, December 2025, and the 2026 section are entirely absent from the index.
Consequence: Customers checking "what's new?" conclude that nothing has shipped since October 2025. Agents indexing the changelog miss three months of feature deltas. The omission undermines trust in the rest of the docs as an accurate source of current behavior.
The fix: Auto-generate the changelog index from the underlying month pages so new entries appear automatically. At minimum, backfill November 2025, December 2025, and January 2026 into both the "Recent updates" and "All updates by month" sections.
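Generating the index can be as simple as sorting the month-page slugs. The Python sketch below assumes every month page follows the year-month slug pattern seen in /docs/changelog/2026-january; the slugs for other months and the function name build_index are assumptions.

```python
MONTHS = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]


def build_index(slugs):
    """Sort 'YYYY-month' slugs newest first and render index labels."""
    def sort_key(slug):
        year, month = slug.split("-", 1)
        return (int(year), MONTHS.index(month.lower()))

    labels = []
    for slug in sorted(slugs, key=sort_key, reverse=True):
        year, month = slug.split("-", 1)
        labels.append(f"{month.capitalize()} {year}")
    return labels
```

Feeding it the list of pages under /docs/changelog/ at build time means a new month appears in the index as soon as its page exists.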
7. Top-level llms.txt doesn't link to any doc pages (significant)
Location: https://sitegpt.ai/llms.txt
Problem: The root llms.txt advertises only the marketing/agent surfaces (/agents, /cli, /claude-code, /cursor, /codex, /openclaw, /hermes, the npm package, auth.md, the agent JSON manifest, the OpenAPI spec, and the MCP server-card). It contains zero links into /docs/* — not the introduction, not the API reference, not the CLI guide, not data sources. The docs do publish a separate /docs/llms.txt, but the root file (which is what most agent crawlers look for) doesn't reference it.
Consequence: An agent following the standard llms.txt convention from the apex domain never discovers the actual product documentation. It will see the CLI npm package and an auth doc, and conclude that's the documentation surface. SiteGPT shipped the file but left the most-discoverable index pointing at the wrong tree.
The fix: Either consolidate into a single root llms.txt that links to canonical doc URLs, or have the root file explicitly link to /docs/llms.txt as the documentation index. Include direct links to the OpenAPI spec, the v2 getting-started, the CLI overview, and the integrations catalog.
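A small check can confirm that the root file reaches the documentation tree. This Python sketch assumes the root llms.txt lists its entries as Markdown links, which is the common convention but is not verified here; the function name docs_links is hypothetical.

```python
import re
from urllib.parse import urlparse

# Captures the URL part of a Markdown link: [label](url)
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def docs_links(llms_txt, prefix="/docs/"):
    """Return the linked URLs whose path starts with the docs prefix."""
    return [
        url
        for url in MARKDOWN_LINK.findall(llms_txt)
        if urlparse(url).path.startswith(prefix)
    ]
```

An empty result for the root file is the condition this finding describes.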
8. v0 endpoint response schemas are effectively empty (significant)
Location: /docs/api-reference/chatbot/create-chatbot, /docs/api-reference/chatbot-threads/update-thread
Problem: The "Create Chatbot" v0 endpoint documents its response only as a data field of type object, described as "Data returned from the server", with no field listing. The "Update Thread" sample response returns "data": null, while the schema still declares data as an object ("Data returned from the server").
Consequence: A developer creating a chatbot via the v0 API has no documented way to know which field carries the new chatbot's ID, which they need for every subsequent call. They must inspect the live response and guess. Agents generating client code from these pages will emit broken types.
The fix: Document each endpoint's actual response shape — at minimum the keys inside data, their types, and which ones are required vs nullable. Reconcile sample responses (data: null) with declared schemas, and either deprecate the v0 reference outright or bring it up to v2 quality.
9. Webhook payload references retired gpt-3.5-turbo (significant)
Location: /docs/developers/webhooks
Problem: The Messages webhook payload example shows gptModel: "gpt-3.5-turbo". The rest of the docs (and the platform's marketing) reference current models. There is no note that the example is illustrative or that the field reports whatever model the chatbot was configured with.
Consequence: Integrators building model-aware routing or analytics will hardcode gpt-3.5-turbo handling, and agents generating downstream schemas will treat that as a canonical enum value. If the platform no longer serves 3.5-turbo, the example is misleading; if it does, that's a separate disclosure issue.
The fix: Use a current model identifier in the example or replace the value with a clearly-marked placeholder (<gpt_model_identifier>). Add a sentence stating that the field echoes the chatbot's configured model and listing the possible values (or pointing to where the enum is defined).
10. auth.md ships with Markdown-escape corruption in code samples (significant)
Location: https://sitegpt.ai/auth.md
Problem: The agent-auth document contains backslash-escaped underscores in identifiers that must be used literally — workspace\_id, agent\_client\_id, VALIDATION\_FAILED, the flow name agent\_verified. These are real field names and error codes, not markdown emphasis.
Consequence: An agent (or a human) copying these into code will paste literal backslashes, producing workspace\_id keys that the API will reject. Because the file is explicitly the "agent auth" doc that AI agents are meant to consume, the escaping bug breaks exactly the workflow the file was created for.
The fix: Stop running the source through a Markdown emphasis escaper for .md files that AI agents fetch raw. Serve auth.md as authored, with underscores intact. Add a CI check that fails if backslash-underscore sequences appear in API field names.
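The CI check can be a few lines. The following Python sketch reports every line of a raw file that contains a literal backslash followed by an underscore; the function name find_escaped_underscores is hypothetical, and the check deliberately flags all such sequences rather than trying to recognize API field names.

```python
import re

# A literal backslash immediately followed by an underscore.
ESCAPED_UNDERSCORE = re.compile(r"\\_")


def find_escaped_underscores(text):
    """Return (line_number, line) pairs containing a backslash-underscore."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if ESCAPED_UNDERSCORE.search(line)
    ]
```

A CI job would read the served auth.md, call this function, and fail if the returned list is not empty.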
11. CLI profile docs reference a usage scope not in the v2 scopes catalog (minor)
Location: /docs/cli/profiles vs /docs/api-reference/v2/authentication
Problem: The CLI profiles page suggests an example scope set that includes usage described as "access through the account token", but the v2 authentication page's scopes table (the canonical source) doesn't list a usage scope.
Consequence: Users copying the suggested profile scopes will see token-creation errors or be confused about what permissions they actually have. Agents reconciling the two pages will hit a dead reference.
The fix: Either add usage to the v2 scopes table with a real description, or correct the CLI profiles example to use scope names that exist. Cross-link both pages so scope lists stay in sync.
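Keeping the two lists in sync can be enforced with a set comparison at docs build time. This Python sketch assumes both scope lists can be extracted as plain strings; the function name unknown_scopes is hypothetical, and no scope names beyond usage are taken from the docs.

```python
def unknown_scopes(profile_scopes, catalog_scopes):
    """Return scopes used in a CLI profile example but absent from the catalog."""
    return sorted(set(profile_scopes) - set(catalog_scopes))
```

Any non-empty result, such as usage today, points to an example that references a scope the canonical table does not define.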
12. Zendesk "Legacy" integration has no migration path (minor)
Location: /docs/integrations/overview, /docs/integrations/zendesk
Problem: The integrations catalog lists both "Zendesk" (one-click OAuth) and "Zendesk (Legacy)" (API-key-based). The new Zendesk guide says "If you already have the older API-key-based Zendesk integration installed, see Zendesk (Legacy)" but provides no migration steps from Legacy → new, no statement of whether Legacy is deprecated, and no compatibility notes (e.g., do conversations carry over, are tickets re-mapped).
Consequence: Existing Legacy customers don't know whether to migrate, how, or what breaks if they do. New customers reading both cards aren't told which to pick beyond "for new installs, follow this guide".
The fix: Add a short migration section to the Zendesk page covering: who should migrate, what data carries over, downtime expectations, and a clear deprecation date (or explicit "Legacy is supported indefinitely") for the API-key flow.
13. Zendesk prerequisite excludes lower plans with no fallback guidance (minor)
Location: /docs/integrations/zendesk
Problem: The prerequisite states "Zendesk Suite Professional plan or above, with Messaging and Sunshine Conversations access". Customers on Team or Foundational plans get no guidance — no mention of whether the Legacy API-key integration works for them, no alternative chat-channel suggestion.
Consequence: A meaningful slice of Zendesk customers (Team/Foundational) hit a hard wall on the integration page and don't know there's a Legacy path that may still work.
The fix: Add an inline note: "On a lower Zendesk plan? The Legacy integration supports …" or explicitly state that only Professional+ is supported and link to the upgrade path.
14. Web embed snippet appends script to <head> while example sits before </body> (minor)
Location: /docs/setup/integrating-with-your-website
Problem: The provided embed example uses a script tag with s.async=1 that the docs say is injected into <head> even though the snippet is shown sitting before </body>. The visible code and the described DOM behavior don't match.
Consequence: Developers debugging load order, CSP, or async behavior will be misled about where the script actually lands. Platform-specific tutorials (WordPress, Shopify, Wix, Webflow, Squarespace) inherit the same ambiguity.
The fix: Either show a single canonical placement and remove conflicting guidance, or split into "place in <head>" vs "place before </body>" variants with explicit behavioral differences.
What they do well
- Both a top-level llms.txt and a /docs/llms.txt exist, plus a public OpenAPI spec at /api/v2/openapi.json and a .well-known/sitegpt-agent.json manifest. The agent-discoverability scaffolding is present even if some of it points the wrong way.
- The Zendesk integration page is unusually candid about edge cases (third-party channels can't render forms, "Solved → Closed" trigger pitfalls, 20-field Zendesk cap). That's high-quality production-grade documentation.
- A first-party CLI with profiles, a --json mode, and a hosted MCP server with OAuth is a serious commitment to agent-driven workflows.
Top 3 recommendations
- Reconcile the data-source catalog end to end. Resolve the SharePoint/Confluence/GitBook claim against the actual product, and unify the auto-sync plan-gating story across training-your-chatbot and retraining-and-updating-your-ai-chatbot.
- Publish a v0 → v2 migration guide and document v2 rate limits. The recommended API currently has no published throttling contract and no upgrade path from the legacy version, which blocks serious adoption.
- Auto-generate the changelog index and fix auth.md escaping. Both are mechanical bugs that quietly destroy trust: the changelog hides three months of shipped work, and the agent-auth doc serves corrupted field names to the AI agents it was written for.