Trainly Documentation Audit
The docs describe a confident observability platform on the surface, but the SDK, REST API, marketing site, and concepts page contradict each other on nearly every shared primitive — names, parameters, hostnames, and even the product's feature set.
1. Two different dashboard hosts and two different API hosts, with no explanation (critical)
Location: /quickstart, /authentication-guide, /react-sdk, /api-reference/introduction
Problem: The docs simultaneously direct developers to two unrelated domains. Quickstart Steps 2 and 4 send users to https://trainlyai.com/dashboard, while the Authentication guide calls the "Trainly Dashboard" https://app.trainly.dev. The React SDK uses baseUrl="https://api.trainly.dev" in its example, but the official API reference lists the base URL as https://api.trainlyai.com. No page explains the relationship between the trainlyai.com and trainly.dev domains, which one is canonical, or whether one is deprecated.
Consequence: A developer cannot reliably know which host to point credentials at. API keys generated on one dashboard may not work against the other base URL, traces may silently go to the wrong tenant, and CORS/CSP allowlists set from copy-pasted examples will be wrong.
The fix: Pick one canonical host per surface (dashboard + API) and replace every reference. If both genuinely exist (e.g., legacy vs. new), publish a "Domains" page that maps them explicitly and mark one as deprecated with a removal date.
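Once a canonical set is chosen, a small lint over the docs source can keep stray hosts from creeping back in. The following is a minimal sketch, not an existing Trainly tool: the function name is hypothetical, and the choice of the trainlyai.com family as canonical is an assumption made only so the example runs, since the audit cannot tell which domain is authoritative.

```python
import re

# Assumption: these are the hosts chosen as canonical. The audit does not
# establish which domain is authoritative; swap in the real choice.
CANONICAL_HOSTS = {"trainlyai.com", "api.trainlyai.com", "docs.trainlyai.com"}

# Matches the host part of any http(s) URL.
URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)")


def find_noncanonical_hosts(text: str) -> list:
    """Return the sorted, de-duplicated hosts in `text` that are not canonical."""
    hosts = set()
    for match in URL_HOST.finditer(text):
        # Drop a trailing "." picked up when a URL ends a sentence.
        hosts.add(match.group(1).rstrip(".").lower())
    return sorted(hosts - CANONICAL_HOSTS)
```

Run against each docs page in CI, a non-empty result would fail the build and name the offending hosts.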
2. Three different shapes for the same "rollback a version" call (critical)
Location: /concepts, /python-sdk, /api-reference/versions-api
Problem: The same operation is documented three incompatible ways on three pages of the same site:
- Concepts: client.versions.rollback(version_id="ver_...")
- Python SDK: versions.rollback(version="v2.0.0") (a semver string under a different kwarg)
- REST API: POST .../{version_id}, where {version_id} is ver_f5g6h7i8
Publish has the same problem: REST takes description; the Python SDK passes metadata={"changelog": ...}.
Consequence: Whichever page the developer opens first, the other two will produce a runtime error: a TypeError on the kwarg, an HTTP 400 on the path param, or a silent no-op if the SDK swallows it. Version rollback is exactly the operation you reach for in an incident, which is the worst possible time to hit a stack trace.
The fix: Pick one canonical kwarg (version_id accepting the ver_ ID) and one canonical publish field (description). Update Concepts, Python SDK, and REST to match, and add an explicit "By ID, not semver" note since the prefix convention is easy to get wrong.
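The "By ID, not semver" note could also be enforced in code, so that a semver label fails with an explanatory message rather than an opaque 400. This is a minimal sketch under stated assumptions: the helper name is hypothetical, and the ID pattern is inferred from the single example ver_f5g6h7i8, not from any published format.

```python
import re

# Assumption: version IDs are "ver_" followed by lowercase letters and digits,
# inferred from the single example ver_f5g6h7i8 in the REST reference.
VERSION_ID = re.compile(r"ver_[a-z0-9]+")
SEMVER_LIKE = re.compile(r"v?\d+\.\d+\.\d+")


def require_version_id(value: str) -> str:
    """Return `value` if it looks like a version ID, else raise ValueError."""
    if VERSION_ID.fullmatch(value):
        return value
    if SEMVER_LIKE.fullmatch(value):
        # The most likely mistake, given the Python SDK page's example.
        raise ValueError(
            f"{value!r} looks like a semver label; rollback expects a ver_ ID"
        )
    raise ValueError(f"{value!r} is not a valid version ID")
```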
3. "Gate" is a primary product pillar on the marketing site and a ghost in the docs (critical)
Location: trainlyai.com (marketing) vs. all of docs.trainlyai.com
Problem: The marketing homepage advertises three pillars — "Trace / Score / Gate — Auto-retry failed AI steps with the failure context" — and its hero code sample uses @observe(project="research-agent", gate=True). The gate=True parameter and the entire "Gate" auto-retry behavior do not appear anywhere in the docs. The @observe parameter table in /python-sdk lists model, tags, expected_output, trace_id, metadata, version, custom_attributes, span_name, capture_exceptions, session_id — but not gate, and not project.
Consequence: Developers who arrive from the marketing site and copy the hero snippet will either hit an unknown-kwarg error or, worse, enable an auto-retry behavior they cannot configure, observe, or disable because it has zero documentation. The headline feature is undocumented.
The fix: Either ship a /gate (or /auto-retry) docs page describing the parameter, semantics, failure modes, retry budget, and idempotency expectations — or remove gate=True from the marketing example until it ships in the docs.
4. Marketing-site SDK surface does not match the documented SDK (critical)
Location: trainlyai.com hero sample vs. /python-sdk
Problem: The marketing code does from trainly import observe and uses @observe(project="research-agent", gate=True). The Python SDK page documents @observe with no project argument at all — projects are implied by the API key. Neither project= nor gate= is in the parameter table.
Consequence: The "five lines to get started" promise breaks the moment a developer copies the homepage example into a real project. They'll get a TypeError and have to reverse-engineer the real signature from the docs.
The fix: Make the marketing example call the documented signature verbatim, or extend the documented signature to actually accept project= and gate=. The homepage is the most-copied snippet you have — it must compile against the shipped SDK.
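A check like the one below could run in CI to confirm that the homepage snippet's keyword arguments are accepted by the shipped decorator. It is a sketch only: the observe function here is a stand-in built from the /python-sdk parameter table (assuming every parameter is an optional keyword argument), and unknown_kwargs is a hypothetical helper, not part of the SDK.

```python
import inspect


# Stand-in with the parameters listed in the /python-sdk table.
# Assumption: all are optional keyword arguments; the real decorator may differ.
def observe(*, model=None, tags=None, expected_output=None, trace_id=None,
            metadata=None, version=None, custom_attributes=None,
            span_name=None, capture_exceptions=None, session_id=None):
    def decorator(func):
        return func
    return decorator


def unknown_kwargs(func, kwargs: dict) -> list:
    """Return the keyword names in `kwargs` that `func` does not accept."""
    params = inspect.signature(func).parameters
    # A **kwargs catch-all means every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(name for name in kwargs if name not in params)
```

Against this stand-in, the hero snippet's arguments (project and gate) are both reported as unknown, which is the TypeError a developer would hit.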
5. Token-usage shape differs between Python and React SDKs (significant)
Location: /python-sdk, /react-sdk
Problem: Python log() documents token usage as {prompt_tokens, completion_tokens, total_tokens}. The React SDK's setTokenUsage takes {prompt, completion, total}. The same data has two sets of field names, and no page documents the mapping between them.
Consequence: Teams running an isomorphic stack (Next.js + Python backend) will end up with inconsistent token attribution — one half logs to prompt_tokens, the other to prompt, and aggregate cost dashboards will under-report unless every consumer special-cases both shapes.
The fix: Pick one naming (prompt_tokens is the OpenAI convention and the more obvious choice) and align both SDKs. If you can't break the React shape, document the mapping explicitly on both pages.
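Until the SDKs are aligned, a consumer can normalize both shapes to one. The sketch below uses only the field names quoted in this finding; the function name is hypothetical, and picking the Python-style keys as the target follows the recommendation above, not any documented Trainly behavior.

```python
# Mapping from the React SDK's setTokenUsage keys to the Python log() keys,
# both as quoted in this finding.
REACT_TO_PYTHON = {
    "prompt": "prompt_tokens",
    "completion": "completion_tokens",
    "total": "total_tokens",
}


def normalize_token_usage(usage: dict) -> dict:
    """Return `usage` with React-style keys renamed to the Python-style keys."""
    normalized = {}
    for key, value in usage.items():
        target = REACT_TO_PYTHON.get(key, key)
        # Refuse payloads that carry both shapes with different numbers.
        if target in normalized and normalized[target] != value:
            raise ValueError(f"conflicting values for {target!r}")
        normalized[target] = value
    return normalized
```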
6. Span kind enum contradicts itself across pages (significant)
Location: /concepts, /python-sdk
Problem: Concepts lists the allowed span kinds as chain, retrieval, tool, agent, llm, embedding. The Python SDK example uses kind="retriever" (the agent noun, not the "retrieval" form listed in Concepts), a value that does not appear in that list.
Consequence: Whichever value the developer picks, one page tells them they're wrong. If the server validates strictly, half of all retrieval spans will be silently dropped or 400'd; if it accepts both, dashboards that filter by kind="retrieval" will miss spans tagged retriever and vice versa.
The fix: Pick one spelling (recommend retrieval to match OpenTelemetry semconv) and update both pages. Add the canonical enum to a single source-of-truth table referenced from both Concepts and the SDK reference.
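A single source-of-truth enum might look like the sketch below. The six values are the ones listed on the Concepts page; treating "retriever" as a deprecated alias (instead of rejecting it) is an assumption, and the class and function names are hypothetical.

```python
from enum import Enum


class SpanKind(str, Enum):
    """The six span kinds listed on the Concepts page."""
    CHAIN = "chain"
    RETRIEVAL = "retrieval"
    TOOL = "tool"
    AGENT = "agent"
    LLM = "llm"
    EMBEDDING = "embedding"


# Assumption: "retriever" is kept as a deprecated alias rather than rejected.
ALIASES = {"retriever": SpanKind.RETRIEVAL}


def parse_span_kind(raw: str) -> SpanKind:
    """Map a raw kind string to the canonical enum, raising on unknown values."""
    key = raw.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    return SpanKind(key)  # raises ValueError for anything not in the enum
```

Normalizing at ingestion means dashboards only ever need to filter on one spelling.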
7. Test-suite field names disagree between REST and Python SDK (significant)
Location: /api-reference/testing-api, /python-sdk
Problem: The REST Testing API uses query and expected_answer for a test case. The Python SDK uses input= and expected_output= for the same concept (and expected_output is also the Python @observe parameter name for ground truth). So the same record has two different field names depending on which surface you touch.
Consequence: Round-tripping a test case between the SDK and the REST API requires manual remapping. Worse, expected_output (SDK) vs. expected_answer (REST) and query (REST) vs. input (SDK) is the kind of cross-surface drift that coding agents will not notice, so the adapters they generate will send the wrong field names.
The fix: Align the REST payload to input and expected_output. The SDK names match the rest of the platform's vocabulary (expected_output is already the @observe kwarg).
8. Analytics endpoints have no documented filter parameters (significant)
Location: /api-reference/analytics-api
Problem: The Metrics, Costs, and Performance endpoints have no documented query parameters, yet the example responses contain period_start: "2026-04-01" and period_end: "2026-04-07". The Python SDK calls get_metrics_summary(start_date="2026-04-01", end_date="2026-04-07") — proving the parameters exist server-side but aren't documented in the REST reference.
Consequence: Anyone integrating analytics over raw HTTP (BI tools, internal dashboards, non-Python services) has no way to know what window they're querying or how to choose one. They'll either always get a default window or have to guess parameter names from the SDK source.
The fix: Document start_date, end_date, and any other filters (tags, project_id, model) for every analytics endpoint, including formats, defaults, and max window size.
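To illustrate what "formats, defaults, and max window size" would let a raw-HTTP client do, here is a minimal sketch that validates a window and builds the query string. It assumes the REST parameters share the SDK's start_date and end_date names and ISO-8601 date format (the examples suggest this but the reference does not confirm it); the 90-day cap and the function name are invented for illustration.

```python
from datetime import date
from urllib.parse import urlencode

# Assumption: a 90-day cap. The audit asks for a documented maximum,
# but the real limit is unknown.
MAX_WINDOW_DAYS = 90


def metrics_query(start_date: str, end_date: str) -> str:
    """Validate an ISO-8601 date window and return it as a URL query string."""
    start = date.fromisoformat(start_date)  # raises ValueError on a bad format
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("end_date is before start_date")
    if (end - start).days > MAX_WINDOW_DAYS:
        raise ValueError(f"window exceeds {MAX_WINDOW_DAYS} days")
    return urlencode({"start_date": start.isoformat(), "end_date": end.isoformat()})
```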
9. Traces resource split between /traces (write) and /analytics/traces (read) (significant)
Location: /api-reference/traces-api
Problem: "Log a Trace" lives at POST /traces, but "List Traces" lives at GET /analytics/traces. There is no DELETE or PATCH /traces/{id} documented anywhere, so there is no documented way to remove or correct a trace once logged.
Consequence: REST clients expect the conventional /traces collection to support both read and write. Splitting them across two prefixes breaks SDK code generators and confuses anyone scanning the API surface. The missing delete path also has compliance implications — if a user sends PII to a trace by mistake, there's no documented way to remove it.
The fix: Move list/get under /traces (keep /analytics/traces as a deprecated alias if you must). Document a DELETE /traces/{id} or POST /traces/{id}/redact endpoint, even if it's just for compliance-driven deletes.
10. Error reference is anaemic for a platform that ingests production traces (significant)
Location: /api-reference/introduction
Problem: Only six 4xx/5xx codes are documented. Retry-After is mentioned but no value or guidance is given. The error body schema has no request_id and no details field. There's no documentation for 502 / 503 / 504 transient classes, which are the codes a trace-ingestion client will see most often during incidents.
Consequence: When ingestion fails — and it will, since the whole point of the product is to be in the hot path — clients have nothing to log against support, no documented retry semantics, and no documented way to distinguish "retry this" from "drop this." Support tickets become unactionable.
The fix: Publish the full error code matrix, document Retry-After semantics (header value, default backoff, max retries), add request_id to every error response, and dedicate a section to transient 5xx behavior with sample retry pseudocode.
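The retry guidance requested above could be as short as the following sketch. Nothing here is documented Trainly behavior: the set of retryable statuses, the retry budget, and the backoff constants are all assumptions, and both function names are hypothetical. The only external fact relied on is that an HTTP Retry-After header may carry either a number of seconds or an HTTP-date.

```python
import random
from typing import Optional

# Assumption: 429 plus the 502/503/504 transient classes are retryable,
# and every other 4xx/5xx should be dropped or surfaced to the caller.
RETRYABLE_STATUSES = {429, 502, 503, 504}


def should_retry(status: int, attempt: int, max_retries: int = 5) -> bool:
    """True if `status` is transient and the retry budget is not exhausted."""
    return status in RETRYABLE_STATUSES and attempt < max_retries


def backoff_seconds(attempt: int, retry_after: Optional[str] = None,
                    base: float = 0.5, cap: float = 30.0) -> float:
    """Honor a numeric Retry-After, else capped exponential backoff with full jitter."""
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled in this sketch
    # attempt is zero-based: the upper bound doubles each attempt until it hits cap.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

A client loop would call should_retry after each failed response and sleep for backoff_seconds before the next attempt, logging the request_id once the API returns one.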
11. Scoring API has no enumerated list of built-in scorer slugs (significant)
Location: /api-reference/scoring-api
Problem: Examples only show correctness, faithfulness, toxicity, helpfulness. No page enumerates the complete list of built-in scorer_slug values. The score-with-judge endpoint accepts an optional trace_id but never explains what happens to a judge result that isn't linked to a trace — does it persist? Is it retrievable? Does the call become fire-and-forget?
Consequence: Developers cannot know what scorers exist without trial and error against the API. The optional trace_id ambiguity means LLM-judge calls have undefined data-retention semantics — exactly the question a security/compliance reviewer will ask first.
The fix: Publish a "Built-in scorers" reference page listing every slug, what it measures, and its scale. For score-with-judge, document both branches: trace_id present → attached and queryable; absent → ephemeral / persisted-under-X.
12. GitHub link from llms.txt and the Community footer points to an empty org (significant)
Location: /llms.txt, footer links
Problem: llms.txt lists https://github.com/trainly under "Optional." That org exists but has zero public repositories: "This organization has no public repositories." No alternative GitHub link (e.g., trainly-ai) is mentioned anywhere in the docs.
Consequence: Agents indexing the docs will follow the GitHub link to learn about SDK source, examples, and issues, and will hit a dead end. Developers looking for the SDK source, sample apps, or an issue tracker have nowhere to go from the docs.
The fix: Point to the actual GitHub org (or a trainly-ai namespace), or remove the link until repositories are public. If the SDKs are closed-source, say so on the SDK pages and link to a private issue tracker.
13. Four "pre-built" React components are listed but have no reference docs (significant)
Location: /react-sdk
Problem: The React SDK page lists four pre-built components — TrainlyChat, TrainlyUpload, TrainlyStatus, TrainlyFileManager — but no reference page exists for any of them. llms.txt confirms only 12 docs pages total and none cover these components.
Consequence: Developers cannot import these components without reading source (which isn't on the GitHub org — see #12). Props, accessibility behavior, styling hooks, and data flow are all undocumented. Agents auto-completing JSX have no schema to draw from.
The fix: Ship a reference page per component with props tables, examples, and any required TrainlyProvider context, or remove the list until those pages exist.
14. No changelog, no versioning, no status page (significant)
Location: /changelog, /pricing, /faq, /status — all 404
Problem: /changelog, /pricing, /faq, and /status all return 404. No page documents the current trainly (Python) or @trainly/react SDK version number. There is no migration guide and no security/compliance/data-residency page.
Consequence: Customers can't pin SDK versions (no version number is visible), can't see what changed between releases, can't tell if a current outage is on Trainly's side, and can't answer procurement questions about compliance. For an observability product sold into production AI systems, the absence of a status page is conspicuous.
The fix: Publish a /changelog per SDK and per API surface, a /status (or link to an external status host), and a security/data-handling page. Show the current SDK version on the Python SDK and React SDK pages.
15. No documented way for an agent to discover the API programmatically (significant)
Location: /llms.txt, /api-reference/*
Problem: llms.txt exists, which is good, but there is no machine-readable OpenAPI/Swagger spec linked from any page. Every endpoint is prose with hand-written examples. Combined with #2, #5, #6, and #7 (Python vs. React vs. REST disagreement), an AI coding agent has no canonical source of truth to fall back on.
Consequence: Agents (Claude Code, Cursor, Copilot) cannot reliably generate correct calls — they will average the contradictions across pages and produce hybrid signatures that compile against none of the surfaces. Human developers can use judgment; agents fail silently.
The fix: Publish an OpenAPI 3.1 spec at a stable URL (e.g., /api-reference/openapi.json), reference it from llms.txt, and generate the prose reference from it so future drift is impossible.
16. Session context manager yields inconsistent objects across pages (minor)
Location: /concepts, /python-sdk
Problem: Concepts shows agent_session() as session_id (yields the ID string directly). The Python SDK shows agent_session() as session: (yields a context object). These cannot both be true unless the object stringifies to its ID, which the docs don't say.
Consequence: Code copied from Concepts that does print(f"session: {session_id}") may print <Session object at 0x...> instead of the ID, and code that does session.add_event(...) from the SDK page will raise AttributeError if Concepts' shape is canonical.
The fix: Pick one (a context object with an .id attribute is the more useful shape) and update the other page. Add an explicit "what does the context manager yield" line.
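One shape that would satisfy both pages is a context object that stringifies to its ID. The sketch below is illustrative only: the Session class, its add_event method body, and the "sess_" prefix are hypothetical stand-ins, not the SDK's actual implementation.

```python
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical session object; not the SDK's actual class."""
    id: str
    events: list = field(default_factory=list)

    def add_event(self, name: str) -> None:
        self.events.append(name)

    def __str__(self) -> str:
        # Stringifying to the ID keeps the f-string usage from Concepts working.
        return self.id


@contextmanager
def agent_session():
    """Yield a Session whose str() is its ID."""
    # Assumption: the "sess_" prefix is illustrative; the docs do not specify one.
    yield Session(id=f"sess_{uuid.uuid4().hex[:8]}")
```

With this shape, both `with agent_session() as session_id: print(f"session: {session_id}")` and `session.add_event(...)` behave as their respective pages imply.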
17. Trace ID prefix convention inconsistent in examples (minor)
Location: /concepts, /python-sdk
Problem: Concepts shows trace IDs as tr_.... The Python SDK example uses trace_id="trace_abc123". If the platform validates the prefix, one of these is wrong; if it doesn't, the inconsistency confuses developers writing their own ID generators.
Consequence: Custom trace_id values copied from the SDK page may be rejected if the server enforces a tr_ prefix, or may be accepted but unsearchable if dashboards filter on the prefix.
The fix: Document the canonical prefix (tr_) and either reject non-prefixed IDs server-side with a clear error, or accept them and say so.
What they do well
- llms.txt exists and enumerates the doc pages; many competitors don't have one at all.
- The five-primitive model (traces, spans, sessions, scores, versions) on the Concepts page is a sensible mental model, even where the parameter names drift downstream.
- The @observe decorator is genuinely a low-friction entry point, and the Quickstart sticks to the promised five lines.
Top 3 recommendations
- Pick one canonical hostname per surface (dashboard + API) and one canonical signature per operation (rollback, publish, token usage, test cases). The platform currently ships three product personalities (trainly.dev, trainlyai.com, and the marketing pillar set), and most of the contradictions in this audit trace back to that.
- Publish an OpenAPI spec and align the SDKs to it. Generate the prose API reference from the spec. This eliminates the REST↔SDK drift documented in findings 2, 7, 8, and 15 and gives agents a real source of truth.
- Document "Gate" or remove it from the marketing example. A headline pillar that doesn't appear in any docs page is the single most jarring inconsistency on the site, and the homepage code snippet is the first thing every new developer will copy.