Ollama Documentation Audit
The docs are well-scaffolded (Mintlify, llms.txt, OpenAPI spec, dedicated capability pages), but the substance underneath is uneven: the CLI reference is effectively one heading, most endpoint pages emit only a one-line stub into the agent-facing index, integration pages recommend models that don't appear to exist in the public library, two URL prefixes coexist for the same API surface, and a Python example contains JavaScript syntax.
1. Two URL prefixes for the same API reference — /api/* vs /api-reference/* (critical)
Location: docs.ollama.com sidebar nav on /api/chat; llms-full.txt indexed sources
Problem: The sidebar on /api/chat lists endpoints under /api/ (Generate, Chat, Embed, Tags, Ps, Create, Copy, Pull, Push, Delete) but inconsistently routes "Show model details" to /api-reference/show-model-details and "Get version" to /api-reference/get-version. Direct probes confirm: GET /api/version → 404, GET /api-reference/get-version → 200; GET /api/show → 404, GET /api-reference/show-model-details → 200; GET /api/copy → 200, GET /api-reference/copy → 404. The llms-full.txt index mixes both prefixes as canonical Source URLs.
Consequence: Anyone constructing a URL by analogy (humans guessing /api/version or agents indexing Source: lines from llms-full.txt) hits 404s on the two off-pattern routes. Coding agents trained on the llms.txt index will guess wrong predictably on these specific endpoints.
The fix: Pick one prefix, redirect the other. Most likely: redirect /api-reference/get-version → /api/version and /api-reference/show-model-details → /api/show, then rewrite the sidebar and Source headers in the corresponding .mdx files so llms-full.txt emits a single canonical URL per endpoint.
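A minimal sketch of how the mixed prefixes could be detected automatically, assuming llms-full.txt marks each page with a `Source: <url>` line as quoted in this audit. The function name and the sample text are illustrative, not part of any Ollama tooling.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Matches the "Source: <url>" lines that llms-full.txt is assumed to emit per page.
SOURCE_LINE = re.compile(r"^Source:\s*(\S+)\s*$", re.MULTILINE)


def group_sources_by_prefix(index_text: str) -> dict[str, list[str]]:
    """Group every Source URL path by its first path segment."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in SOURCE_LINE.findall(index_text):
        path = urlparse(url).path
        segments = [s for s in path.split("/") if s]
        prefix = "/" + segments[0] if segments else "/"
        groups[prefix].append(path)
    return dict(groups)


# Hypothetical two-entry excerpt using the two prefixes named above.
sample = """\
# Chat
Source: https://docs.ollama.com/api/chat

# Get version
Source: https://docs.ollama.com/api-reference/get-version
"""

groups = group_sources_by_prefix(sample)
api_prefixes = sorted(p for p in groups if p.startswith("/api"))
print(api_prefixes)  # more than one entry means the API reference is split
```

Run against the full index, a check like this would fail a build as soon as a second API prefix appears, which is cheaper than finding the 404s by hand.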
2. 11 endpoint pages emit a one-line stub into llms-full.txt (critical)
Location: llms-full.txt entries for /api/generate, /api/embed, /api/copy, /api/create, /api/delete, /api/ps, /api/pull, /api/push, /api/tags, /api-reference/get-version, /api-reference/show-model-details
Problem: Each of these pages, as emitted in llms-full.txt, contains only:
Source: https://docs.ollama.com/api/<name>
/openapi.yaml <method> /api/<name>
[optional one-sentence summary]
/api/generate (the flagship completion endpoint) is one line: "Generates a response for the provided prompt". By contrast, the live rendered page for /api/chat does ship a request schema, a response schema, and a curl example — but that content does not surface into the llms.txt index, where /api/chat also appears as a stub. So there are two distinct problems: (a) for the 11 endpoints above, no per-endpoint examples exist for an agent to lift; (b) for /api/chat, the rendered page is fine but the agent-facing index strips it. The live rendered /api/generate page was not probed in this audit, so it is possible the live HTML carries more detail than llms-full.txt — but that itself would be the bug.
Consequence: Agents extracting examples from llms.txt won't find any per-endpoint example for these 11 endpoints — they have to fall back to assembling a request from /openapi.yaml, /api/streaming, /api/usage, and /api/errors separately. A developer who lands on /api/generate from a search engine sees a near-empty page and has to do the same dig.
The fix: Either (a) extend each endpoint page so it ships, like /api/chat does, with one full request example, one full response example, a streaming-behavior pointer if applicable, and a link to the relevant error codes — and make sure that body content gets emitted into llms-full.txt; or (b) confirm whether /api/chat's richer body is missing from llms-full.txt because of an extraction quirk and fix the extractor. Don't ship endpoint pages whose entire agent-visible content is a Source URL.
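One way to keep stubs from shipping is a small check over llms-full.txt. The sketch below assumes each page entry starts with a top-level `# ` heading followed by a `Source:` line, as in the excerpts quoted in this audit. The threshold and the sample entries are hypothetical, and a `# ` comment inside a code sample would be misread as a new entry, so treat it as a starting point.

```python
import re


def find_stub_entries(index_text: str, min_body_lines: int = 3) -> list[str]:
    """Return Source URLs whose entry has fewer than min_body_lines non-blank body lines."""
    stubs = []
    # Split before every top-level "# " heading, assumed to open a page entry.
    for entry in re.split(r"(?m)^(?=# )", index_text):
        lines = [ln.strip() for ln in entry.splitlines() if ln.strip()]
        source_idx = next(
            (i for i, ln in enumerate(lines) if ln.startswith("Source: ")), None
        )
        if source_idx is None:
            continue  # not a page entry
        body = lines[source_idx + 1:]
        if len(body) < min_body_lines:
            stubs.append(lines[source_idx].removeprefix("Source: "))
    return stubs


# Hypothetical excerpt: one stub entry and one entry with real body content.
sample = """\
# Generate
Source: https://docs.ollama.com/api/generate
Generates a response for the provided prompt

# Example guide
Source: https://docs.example.com/guide
First line of body content.
Second line of body content.
Third line of body content.
"""

stubs = find_stub_entries(sample)
print(stubs)
```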
3. The CLI reference doesn't reference the CLI (critical)
Location: /cli
Problem: The entire /cli page in llms-full.txt is:
# CLI Reference
Source: https://docs.ollama.com/cli
### Run a model
There is no command list, no flag reference, no examples for ollama list, ollama show, ollama serve, ollama --version, ollama help, ollama pull, ollama rm, etc. — all of which are documented in the GitHub README and used in examples elsewhere on the docs (e.g., the Modelfile page references ollama show --modelfile <model> with no dedicated CLI page entry). The page also carries a hand-curated "Supported integrations" list of 5 items (see #7) but no actual command surface.
Consequence: A developer looking up CLI flags lands on an effectively empty reference and has to leave docs.ollama.com to find the README. Agents that respect doc boundaries will conclude the CLI is undocumented and either hallucinate flags or refuse to suggest commands.
The fix: Generate the CLI reference from the binary's --help output (or from the README's CLI section), grouped by command, with one-line descriptions and a representative example per command. Treat /cli as a primary reference page on par with /api/chat, not a stub.
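As a rough sketch of the generation step, the snippet below turns help text into a sorted command list. It assumes the binary prints a Cobra-style "Available Commands:" section; the sample help text and its descriptions are illustrative placeholders, not captured `ollama --help` output.

```python
def parse_commands(help_text: str) -> dict[str, str]:
    """Extract {command: description} from an 'Available Commands:' section."""
    commands: dict[str, str] = {}
    in_section = False
    for line in help_text.splitlines():
        if line.strip() == "Available Commands:":
            in_section = True
            continue
        if in_section:
            if not line.strip():
                break  # a blank line ends the section
            name, _, description = line.strip().partition(" ")
            commands[name] = description.strip()
    return commands


def render_reference(commands: dict[str, str]) -> str:
    """Render one reference line per command, sorted by name."""
    return "\n".join(
        f"ollama {name}: {desc}" for name, desc in sorted(commands.items())
    )


# Illustrative help text; the real output should be captured from the binary.
sample_help = """\
Usage:
  ollama [command]

Available Commands:
  serve       Placeholder description for serve
  pull        Placeholder description for pull

Flags:
  -h, --help   help for ollama
"""

commands = parse_commands(sample_help)
print(render_reference(commands))
```

Per-command flags would need the same treatment applied to each subcommand's own help output.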
4. Recommended-model lists across integration pages reference models not in the public library (critical)
Location: /integrations/claude-code, /integrations/copilot-cli, /integrations/openclaw, /integrations/hermes, /integrations/pool, /integrations/nemoclaw, /api/anthropic-compatibility
Problem: These pages confidently list "Recommended Models" like:
- kimi-k2.5:cloud, kimi-k2.6:cloud
- glm-5:cloud, glm-5.1:cloud, glm-4.7, glm-4.7-flash
- minimax-m2.7:cloud, minimax-m2.1
- qwen3.5, qwen3.5:cloud, qwen3.5:27b, qwen3.6
- gemma4
- nemotron-3-super:cloud, nemotron-3-nano:30b
Spot-checks against the public library at ollama.com/library show released names such as qwen3, qwen3-coder, gemma3, gpt-oss, deepseek-r1, and embeddinggemma. The .5, .6, -flash, and :cloud-suffixed names used in the integration docs are not currently visible there. The pages are also internally inconsistent: the Claude Code narrative says "models such as qwen3.5, glm-5:cloud, kimi-k2.5:cloud" while the recommended list on the same page names both qwen3.5:cloud and qwen3.5 (no :cloud).
Consequence: Copy-paste from the integration pages produces ollama pull / ollama run commands that fail with model-not-found. A developer following the Claude Code guide will set up the Anthropic-compatible env vars, then hit a wall when they try to actually run any of the listed models. Coding agents reading these pages as authoritative will hallucinate model coordinates the runtime can't resolve.
The fix: Drive the "Recommended Models" lists from the model library (the same source-of-truth as ollama.com/library) instead of hand-authoring per integration. At minimum, run a CI check that every model coordinate mentioned under /integrations/** resolves to a real entry in the library before publishing.
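The CI check could be as small as the sketch below. It assumes a library snapshot is available as a mapping from model name to its published tags; how that snapshot is obtained is left open, and the snapshot contents here are hypothetical.

```python
def unresolved_models(mentioned: list[str], library: dict[str, set[str]]) -> list[str]:
    """Return model coordinates that do not resolve against the library snapshot."""
    missing = []
    for coordinate in mentioned:
        name, _, tag = coordinate.partition(":")
        tags = library.get(name)
        # Unknown model name, or an explicit tag the model does not publish.
        if tags is None or (tag and tag not in tags):
            missing.append(coordinate)
    return missing


# Hypothetical snapshot; a real check would load this from the library source of truth.
library = {"qwen3": {"latest", "8b"}, "gemma3": {"latest"}}
mentioned = ["qwen3", "qwen3:8b", "qwen3.5:cloud", "gemma4"]

missing = unresolved_models(mentioned, library)
print(missing)
```

A non-empty result would fail the build, so a page cannot publish a coordinate the runtime cannot resolve.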
5. Python tool-calling example contains JavaScript object syntax (significant)
Location: /capabilities/tool-calling, "Handling streamed chunks" section
Problem: The Python streaming snippet ends with:
new_messages = [{ role: 'assistant', thinking: thinking, content: content }]
This parses as Python but fails at runtime: role, thinking, and content in key position are bare identifiers, so Python evaluates them as variables and the lookup of role raises NameError. The keys must be strings: {"role": "assistant", "thinking": thinking, "content": content}. This looks like the JavaScript example was duplicated into the Python tab without translation.
Consequence: A developer copy-pasting the Python tool-calling example gets a NameError: name 'role' is not defined at runtime. Agents extracting this snippet for a Python codebase will produce broken code without warning.
The fix: Quote the dict keys: {"role": "assistant", "thinking": thinking, "content": content}. Audit the rest of the tool-calling Python tab for the same JS-leak (anywhere there's { key: value } without quotes).
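The audit step can be partly automated with the standard-library `ast` module. The sketch below flags dict literals whose keys are bare names, which is the pattern a JavaScript object literal produces when pasted into Python. The function name is illustrative, and because variable keys are sometimes intentional, a hit is a prompt for review rather than proof of a bug.

```python
import ast


def bare_name_dict_keys(source: str) -> list[tuple[int, str]]:
    """Return (line, name) for every dict-literal key that is a bare identifier."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            for key in node.keys:  # key is None for ** unpacking
                if isinstance(key, ast.Name):
                    findings.append((key.lineno, key.id))
    return findings


broken = "new_messages = [{ role: 'assistant', thinking: thinking, content: content }]"
fixed = 'new_messages = [{"role": "assistant", "thinking": thinking, "content": content}]'

broken_findings = bare_name_dict_keys(broken)
fixed_findings = bare_name_dict_keys(fixed)
print(broken_findings)  # [(1, 'role'), (1, 'thinking'), (1, 'content')]
print(fixed_findings)   # []
```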
6. Context-length defaults contradict between /context-length and /faq (significant)
Location: /context-length and /faq
Problem: /context-length documents VRAM-tiered defaults:
- < 24 GiB VRAM: 4k context
- 24-48 GiB VRAM: 32k context
- >= 48 GiB VRAM: 256k context
/faq says: "By default, Ollama uses a context window size of 4096 tokens. This can be overridden with the OLLAMA_CONTEXT_LENGTH environment variable."
Neither page acknowledges the other. The FAQ implies a single 4k default everywhere; the context-length page implies the default scales with VRAM up to 256k. There is no explanation of how OLLAMA_CONTEXT_LENGTH interacts with the VRAM tiering, or which one wins.
Consequence: A developer on a 48 GiB+ machine reading /faq will think they are getting 4k unless they set the env var, and silently leave a lot of context on the table — or, conversely, assume 256k is in effect when their workload was actually capped at 4k by some other code path. The 64-fold gap between 4k and 256k makes silent misalignment expensive.
The fix: Pick one canonical statement. If the VRAM tiering is correct, rewrite the FAQ to describe the tiers and clarify how OLLAMA_CONTEXT_LENGTH overrides them; if the FAQ is correct (single 4k default), remove the tiered table from /context-length or relabel it as "recommended" rather than "defaults".
7. CLI Reference's "Supported integrations" list is out of sync with the integrations index (significant)
Location: /cli ("Supported integrations" list) vs /integrations/index
Problem: /cli lists 5 supported integrations: OpenCode, Claude Code, Codex, VS Code, Droid. The /integrations/index page documents 19 (Coding Agents: Claude Code, Codex, Copilot CLI, OpenCode, Droid, Goose, Pi, Pool; Assistants: OpenClaw, Hermes Agent; IDEs: VS Code, Cline, Roo Code, JetBrains, Xcode, Zed; plus Onyx, n8n, marimo). Across pages, ollama launch <name> examples exist for at least 11 integrations (claude, codex, copilot, opencode, droid, goose, pi, pool, openclaw, hermes, vscode), so the CLI page is missing 6 launchable integrations. The OpenClaw launch alias clawdbot is mentioned in passing but not surfaced as an alias on either /cli or /integrations/openclaw.
Consequence: A developer reading the CLI reference believes only 5 integrations work with ollama launch. They won't try ollama launch goose or ollama launch openclaw even though those are documented elsewhere. Agents grepping the CLI page for capability discovery will under-report what launch supports.
The fix: Either generate the CLI page's "Supported integrations" list from the same source as /integrations/index, or drop the list from /cli and link to the index. Don't maintain two hand-curated lists. While you're there, document the openclaw/clawdbot alias mapping in one canonical place.
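The gap described above is a plain set difference, which is also the shape a drift check would take. In this sketch the launch names come from the examples listed in this section; mapping the /cli display names (for example "Claude Code" to claude) is a normalization assumed here, not something the docs define.

```python
# Launch names with `ollama launch <name>` examples somewhere in the docs.
LAUNCH_NAMES = {
    "claude", "codex", "copilot", "opencode", "droid", "goose",
    "pi", "pool", "openclaw", "hermes", "vscode",
}

# The five integrations listed on /cli, normalized to launch names (assumed mapping).
CLI_PAGE_NAMES = {"opencode", "claude", "codex", "vscode", "droid"}


def missing_from_cli_page(launchable: set[str], cli_listed: set[str]) -> list[str]:
    """Return launchable integrations that the CLI page does not list."""
    return sorted(launchable - cli_listed)


missing = missing_from_cli_page(LAUNCH_NAMES, CLI_PAGE_NAMES)
print(missing)  # ['copilot', 'goose', 'hermes', 'openclaw', 'pi', 'pool']
```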
8. OpenAI-compat page disclaims logprobs while the chat endpoint advertises them (significant)
Location: /api/openai-compatibility ("Supported features" for /v1/chat/completions) vs /api/chat body schema
Problem: The OpenAI-compat page lists [ ] Logprobs (unchecked, i.e. unsupported) for /v1/chat/completions. The native /api/chat body schema, however, advertises logprobs (boolean) and top_logprobs (integer) as accepted parameters and includes a logprobs[] field in the response. The two pages don't reconcile whether logprobs are unsupported across the board, supported only on the native endpoint, or supported only via OpenAI compat with caveats.
Consequence: Developers building on the OpenAI SDK assume logprobs aren't available, while developers calling /api/chat directly assume they are — and one of those groups is wrong, but the docs don't say which. Agents emitting code through whichever entrypoint is convenient will silently ship requests with logprobs: true to an endpoint that may or may not honor it.
The fix: State plainly: "logprobs is supported on /api/chat natively but not on the /v1/chat/completions OpenAI-compatible endpoint" (or whichever direction is true). Cross-link the two pages so the constraint is visible from both sides.
9. Authentication page never closes the loop on bearer-token usage (significant)
Location: /api/authentication
Problem: The page tells you authentication is required for cloud models, publishing, and private model downloads, and that there are two methods (sign-in and API keys). The OpenAPI spec defines a bearerAuth security scheme (http/bearer/API Key). But the prose page in the scraped content stops at "To sign in to ollama.com from your local installation of Ollama, run:" and never shows: how to send the API key to ollama.com endpoints (header name, format), how to send it to the web-search API (POST https://ollama.com/api/web_search), whether keys expire, whether they can be scoped, or how to rotate them. The web-search page says "create an API key" but also doesn't show the header.
Consequence: A developer trying to call the cloud API or web search programmatically has to guess Authorization: Bearer <key> from the OpenAPI security scheme. Agents producing client code from prose alone will either hallucinate a header or fail to add one. Key lifecycle (expiry, scope, revocation) is undocumented entirely.
The fix: Add a complete "Using API keys" subsection to /api/authentication showing the exact Authorization: Bearer … header, an end-to-end curl example against an authenticated endpoint, and a paragraph on key lifecycle (expiration, revocation, scope). Cross-link from /capabilities/web-search and the /cloud page.
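For illustration, the kind of example the page could carry is sketched below using only the Python standard library. The `Authorization: Bearer` header is inferred from the bearerAuth scheme in the OpenAPI spec, and the `query` body field is an assumption; neither is confirmed by the prose docs, which is the gap this section describes. The request is built but not sent.

```python
import json
import urllib.request


def build_web_search_request(api_key: str, query: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to the web-search endpoint."""
    # The "query" field name is an assumption; check the OpenAPI spec before relying on it.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        "https://ollama.com/api/web_search",
        data=body,
        method="POST",
        headers={
            # Header format inferred from the bearerAuth security scheme.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_web_search_request("YOUR_API_KEY", "what is ollama?")
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`, with a real key read from the environment rather than hard-coded.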
10. Windows installer page mislabels the MLX archive as CUDA (significant)
Location: /windows, "Standalone CLI" section
Problem: The page lists optional archives and labels:
- AMD GPU: ollama-windows-amd64-rocm.zip
- MLX (CUDA): ollama-windows-amd64-mlx.zip
MLX is Apple's array framework and is primarily associated with Apple silicon, so the label "MLX (CUDA)" on the Windows installer page needs an explanation that the page does not give: it is unclear whether the archive is an MLX build that targets CUDA or a mislabeled CUDA runtime. The base ollama-windows-amd64.zip is described as containing "GPU library dependencies for Nvidia," so the role of a supplemental CUDA archive on Windows is otherwise unspecified.
Consequence: A developer who actually wants Nvidia/CUDA support on Windows will download ollama-windows-amd64-mlx.zip thinking the parenthetical "(CUDA)" applies, then hit obscure load errors with no fallback path documented on the page. This is a wrong-file-download bug, not a label nit.
The fix: If ollama-windows-amd64-mlx.zip is meant for CUDA, rename the package; if it's misfiled here, remove the row. Either way, "MLX (CUDA)" cannot stand as a label. State explicitly which archive contains the CUDA runtime (or confirm it's bundled in the base zip) so Nvidia users have an unambiguous path.
11. Modelfile parameter table has missing descriptions (significant)
Location: /modelfile, "Valid Parameters and Values" table
Problem: The parameter table on the Modelfile reference page is the canonical list of generation parameters, but several rows have no description column:
- top_k | (Default: 40) | int (no description)
- top_p | (Default: 0.9) | float (no description)
- min_p | Alternative to the top_p, ... | float (description is truncated mid-sentence)
Surrounding rows (temperature, repeat_penalty, num_ctx, num_predict, seed, stop, repeat_last_n) all carry a complete description. The gaps look like row authoring errors, not intentional omissions.
Consequence: Developers tuning generation behavior have no in-docs explanation for the three most common sampling knobs. Agents reading this table to advise on sampling settings see "(Default: 40)" with no semantics — so they fall back to general LLM lore that may not match Ollama's implementation.
The fix: Fill in the missing descriptions (one sentence each) and finish the truncated min_p line. Add a CI lint that rejects parameter-table rows where the description column is empty.
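A lint along these lines could look like the sketch below. It assumes rows are pipe-separated with the description in the second column, as in the rows quoted above; it treats a description that is only a "(Default: ...)" note, or that ends in an ellipsis, as incomplete. The `example_param` row is hypothetical.

```python
import re

# A "(Default: ...)" note alone does not count as a description.
DEFAULT_NOTE = re.compile(r"\(Default:[^)]*\)")


def incomplete_rows(rows: list[str]) -> list[str]:
    """Return parameter names whose description is empty or visibly truncated."""
    flagged = []
    for row in rows:
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        if len(cells) < 2:
            continue  # not a table row
        name, description = cells[0], cells[1]
        text = DEFAULT_NOTE.sub("", description).strip()
        if not text or text.endswith("..."):
            flagged.append(name)
    return flagged


rows = [
    "example_param | A complete one-sentence description. (Default: 1) | int",
    "top_k | (Default: 40) | int",
    "top_p | (Default: 0.9) | float",
    "min_p | Alternative to the top_p, ... | float",
]

flagged = incomplete_rows(rows)
print(flagged)  # ['top_k', 'top_p', 'min_p']
```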
12. /api/introduction is itself a stub (significant)
Location: /api/introduction
Problem: The page that the sidebar treats as the canonical API entry point reads, in full:
Ollama's API allows you to run and interact with models programatically.
Get started
If you're just getting started, follow the quickstart documentation to get up and running with Ollama's API.
Base URL
After installation, Ollama's API is served by default at:
There's no listing of endpoints, no mention of authentication, no link to the OpenAPI spec, no mention of streaming format, and no overview of which endpoints are local vs cloud. The "programatically" typo is a tell that the page hasn't been edited recently. The page is also where the off-site community-libraries link lives, which means the only on-domain pointer to community SDKs sits inside an otherwise-empty intro.
Consequence: A developer arriving from /api or the homepage Card has no map of the API surface. Agents indexing the introduction for context summarization get a single sentence of signal.
The fix: Rewrite /api/introduction to be a real overview: base URL, authentication pointer, list of endpoint groups (chat, generate, embed, model management, version), streaming format pointer, error pointer, and a link to the OpenAPI spec. Fix the typo while you're there.
13. .mdx links in the FAQ produce soft 404s (minor)
Location: /faq page rendered HTML
Problem: The FAQ page contains links to ./troubleshooting.mdx and ./gpu.mdx. Probing those URLs returns the docs chrome (sidebar/header/search) with no main content body — a soft 404. The published Mintlify routes are /troubleshooting and /gpu (no extension), which 200 normally. The author wrote the link against the source filename instead of the routed URL. Same family of bug appears in the GPU page's Vulkan note ([FAQ](faq#how-do-i-configure-ollama-server) — no leading slash, fragile to deeper URLs), though that one wasn't probed live.
Consequence: Developers clicking "see troubleshooting" or "see gpu" from the FAQ land on a blank-looking page with no error indication and may conclude the section doesn't exist. The page renders as a normal docs shell so it doesn't trigger a 404 page or fallback search.
The fix: Strip the .mdx extension from in-page links. Add a CI lint rule that rejects internal .mdx link targets and bare relative paths in MDX source.
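The lint rule can be sketched with a regular expression over markdown link targets. The version below skips external URLs and same-page anchors, then flags targets that end in .mdx or lack a leading slash. The function name and sample text are illustrative, and JSX-style href attributes would need a second pattern.

```python
import re

# Captures the target of a markdown link: [text](target)
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SCHEME = re.compile(r"^[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def bad_internal_links(mdx_source: str) -> list[str]:
    """Return internal link targets that use a .mdx extension or a bare relative path."""
    bad = []
    for target in LINK.findall(mdx_source):
        if SCHEME.match(target) or target.startswith("#"):
            continue  # external URL or same-page anchor
        path = target.split("#", 1)[0]
        if path.endswith(".mdx") or not path.startswith("/"):
            bad.append(target)
    return bad


sample = (
    "See [troubleshooting](./troubleshooting.mdx) and [GPU](./gpu.mdx). "
    "Also the [FAQ](faq#how-do-i-configure-ollama-server), "
    "the [GPU page](/gpu), and [GitHub](https://github.com/ollama/ollama)."
)

bad = bad_internal_links(sample)
print(bad)
```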
14. NemoClaw integration page is orphaned (minor)
Location: /integrations/nemoclaw exists in llms.txt; not referenced from /integrations/index
Problem: The NemoClaw integration page is published and indexed by llms.txt, but the integrations overview at /integrations/index has no link to it under any of its sections (Coding Agents, Assistants, IDEs, Chat & RAG, Automation, Notebooks). The only way to discover the page is via search or the llms.txt index.
Consequence: A developer browsing the integrations index will not find NemoClaw. Search-driven discovery still works, but the page is effectively hidden from human navigation.
The fix: Add NemoClaw under "Assistants" (or wherever NVIDIA's stack belongs) on /integrations/index. Lint against pages that exist on disk but aren't reachable from the index.
15. Quantization list is implausibly short (minor)
Location: /import, "Supported Quantizations" section
Problem: The page lists exactly three supported quantizations:
- q8_0
- (K-means) q4_K_S, q4_K_M
This is conspicuously narrow for a runtime that's commonly used with q4_0, q5_K_M, q6_K, and similar GGUF quantizations referenced throughout the broader Ollama ecosystem. Either the runtime accepts more than this list and the docs are incomplete, or the runtime really is restricted to three levels and the page should say so explicitly with a rationale.
Consequence: Developers running ollama create -q <level> against a level not on this list get a runtime error with no in-docs justification. Agents recommending quantization levels will either over-recommend (suggesting GGUF levels Ollama may not accept) or under-recommend (sticking only to the three listed).
The fix: Either expand the list to match what ollama create -q actually accepts (verify against the binary), or add a one-line note explaining that the listed levels are the only ones supported by the -q flag and link to import-as-GGUF for everything else.
16. No /api landing page despite the homepage card linking there (minor)
Location: /index homepage card; sidebar nav
Problem: The homepage Card component links "API reference" to href="/api", but the actual API introduction lives at /api/introduction (which is itself a stub — see #12). There is no canonical /api landing page documented in llms.txt, and visiting /api directly does not resolve to a content page.
Consequence: The homepage's own primary CTA for the API reference is a soft-broken link (or a redirect that masks the inconsistency). Anyone who manually types docs.ollama.com/api expecting a hub gets nothing useful.
The fix: Either make /api the canonical introduction page (and redirect /api/introduction to it) or change the homepage card href to /api/introduction. Pick one and align the sidebar.
17. AMD Windows/Linux GPU support asymmetry is unexplained (minor)
Location: /gpu, AMD Radeon section
Problem: The Linux ROCm support table includes recent AMD silicon (RX 9070 XT, RX 9060 XT, Ryzen AI Max+ 395, MI350X) while the Windows table tops out at the RX 7900 XTX with no RX 9000-series entries. There's no prose explaining why — no "Windows ROCm v6.1 doesn't yet ship driver support for RDNA4" note, no pointer to Vulkan as the workaround, no ETA. The Vulkan section exists separately but doesn't cross-reference the Windows AMD gap.
Consequence: A developer with a recent AMD card on Windows reads the support table, finds their card missing, and has no signposted fallback. They don't know whether to wait for ROCm, switch to Vulkan (currently experimental), or assume their card simply isn't supported.
The fix: Add a one-paragraph note above the Windows AMD table explaining the version gap (Windows uses ROCm v6.1, Linux uses v7) and pointing users with newer AMD cards to the Vulkan section as the experimental fallback. Cross-link both directions.
18. No on-site community SDK page — the only list is on a GitHub README anchor (minor)
Location: /api/introduction external link to github.com/ollama/ollama#libraries-1
Problem: The introduction page sends developers off-site to the GitHub README's #libraries-1 anchor for community-maintained SDKs. There is no page under docs.ollama.com that enumerates community libraries (Go, Rust, .NET, Java, etc.). Combined with the llms.txt-stub problem in #2, this means agents respecting the docs domain see neither per-endpoint examples nor a community SDK landscape — both pieces of "how to actually call this API" sit outside the indexable surface.
Consequence: A developer searching the docs for a "Go client" or "Rust client" finds nothing on-site, then has to leave the docs entirely. Agents that respect doc boundaries (only fetch from docs.ollama.com) will conclude no client exists in their language.
The fix: Add an /sdks or /libraries page that enumerates official + community libraries with links and language tags, keep it as the single source of truth, and have the README link back to it instead of vice-versa. This also closes the offline-mirror gap.
What they do well
- Real machine-readable surface: a published `llms.txt`, `llms-full.txt`, and an OpenAPI 3.1 spec at `/openapi.yaml` (covers more than most peers).
- Capability pages (`/capabilities/structured-outputs`, `/capabilities/thinking`, `/capabilities/tool-calling`, `/capabilities/web-search`) are written as task-oriented narrative with multi-language tabs, exactly the format agents and developers can lift from.
- Hardware support page (`/gpu`) is unusually concrete on Nvidia: explicit compute-capability tables, GPU-selection env vars, and a known-issue workaround for the Linux suspend/resume UVM bug.
Top 3 recommendations
- Fill in the empty reference pages. `/cli` is one heading, `/api/introduction` is one sentence, and 11 endpoint pages emit a one-line stub into llms-full.txt. These are the pages a developer (or agent) lands on first, and every one of them should ship a real surface (commands + flags, endpoint groups, request/response examples) instead of an auto-rendered OpenAPI form or a "Source:" line.
- Drive integration "Recommended Models" lists from the actual model library. A CI check that every model coordinate mentioned in `/integrations/**` and `/api/anthropic-compatibility` resolves to a published library entry would catch the qwen3.5/glm-5/kimi-k2.5/minimax-m2.7/gemma4/nemotron-3 references in one pass.
- Unify URL schemes and close the auth/discovery loops. Pick `/api/*` or `/api-reference/*` and redirect the other; document the bearer-token header, key lifecycle, and an end-to-end authenticated example in `/api/authentication`; add a `/sdks` page so community libraries live inside the docs; fix the `.mdx` link leakage and the orphaned NemoClaw page so navigation matches reality.