
Ollama Complete Reference — Local Documentation for mindX

> Self-contained reference for all Ollama capabilities.
> No external docs needed — resilient offline operation.
> Source: docs.ollama.com (fetched 2026-04-11) + mindX integration specifics.

Back to mindX Documentation Hub

Operational Standards

mindX operates from two inference pillars — both are operational standards, not fallbacks:

| Pillar | Source | Speed | Model Scale | Availability | Cost |
|---|---|---|---|---|---|
| CPU inference | localhost:11434 | ~8 tok/s | 0.6B–1.7B | Always (no network) | Zero |
| Cloud inference | ollama.com via OllamaCloudTool | ~65 tok/s | 3B–1T | 24/7/365 (free tier) | Zero |

CPU provides autonomy — mindX reasons even offline, even when every API key is exhausted. Cloud provides scale — 120B+ parameter models on NVIDIA GPUs, 8.2x faster than local CPU. Together they form the resilience guarantee: mindX never stops inferring.

The 5-step resolution chain in _resolve_inference_model() tries the best available source first and walks down until inference is guaranteed. CPU is the failsafe; Cloud is the guarantee. Both are always ready.


Quick Navigation

API Reference

| Endpoint | Local | Cloud | Doc |
|---|---|---|---|
| POST /api/generate | localhost:11434 | ollama.com | generate.md |
| POST /api/chat | localhost:11434 | ollama.com | chat.md |
| POST /api/embed | localhost:11434 | ollama.com | embeddings.md |
| Model management | localhost:11434 | — | models.md |
| GET /api/ps, GET /api/version | localhost:11434 | — | running.md |

All endpoints documented with every parameter, response field, and curl/Python/JavaScript examples. See the Ollama OpenAPI spec for the authoritative schema.
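The endpoint table above can be exercised from Python with nothing more than a request body. A minimal sketch, assuming only the documented /api/chat request shape; the pillar names and base-URL constants are this page's terminology, not a specific mindX module:

```python
# Sketch: build an /api/chat request body that works against either pillar.
# The request shape is the documented Ollama API; CPU_BASE/CLOUD_BASE mirror
# this page's dual-pillar terminology.

CPU_BASE = "http://localhost:11434"
CLOUD_BASE = "https://ollama.com"

def chat_request(pillar: str, model: str, prompt: str, stream: bool = False):
    """Return (url, json_body) for POST /api/chat on the chosen pillar."""
    base = CPU_BASE if pillar == "cpu" else CLOUD_BASE
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # OllamaAPI currently uses stream=False
    }
    return f"{base}/api/chat", body

url, body = chat_request("cpu", "deepseek-r1:1.5b", "Describe mindX in one sentence.")
# POST `body` as JSON to `url` with any HTTP client (mindX uses aiohttp).
```

The same body works against both pillars; only the base URL (and, for the direct cloud API, a Bearer token) changes.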

Features

Each feature doc includes curl, Python SDK, JavaScript SDK, and mindX-specific code examples. All features work identically on both CPU and Cloud pillars.

  • Streaming — Real-time token-by-token output via /api/chat and /api/generate; extends OllamaAPI which currently uses stream=False
  • Thinking — Chain-of-thought reasoning with the think parameter; supported models include DeepSeek R1 (local) and GPT-OSS (cloud, levels: "low"/"medium"/"high")
  • Structured Outputs — JSON schema-constrained generation via the format parameter; works with Pydantic and Zod; used by BDI reasoning for structured state extraction
  • Vision — Image understanding with multimodal models; cloud models gemma4, kimi-k2.5 support vision
  • Embeddings — Vector embeddings for RAGE semantic search and pgvector storage; mindX uses mxbai-embed-large and nomic-embed-text
  • Tool Calling — Function calling / tool use; single, parallel, and agent loop patterns; bridges to mindX BaseTool via OllamaCloudTool
  • Web Search — Grounded generation via Ollama web search API; requires OLLAMA_API_KEY; available as OllamaCloudTool.execute(operation="web_search")
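As an illustration of the structured-outputs feature listed above, a minimal sketch of attaching a JSON schema through the documented format parameter. The belief_schema and the function name are invented for illustration, not taken from the BDI code:

```python
# Sketch: constrain /api/chat output with the documented `format` parameter.
# The schema below is illustrative; mindX's BDI state extraction would supply
# its own schema.

def structured_chat_body(model: str, prompt: str, schema: dict) -> dict:
    """Build an /api/chat body whose `format` field carries a JSON schema."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "format": schema,   # Ollama constrains generation to this schema
        "stream": False,
    }

belief_schema = {
    "type": "object",
    "properties": {
        "belief": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["belief", "confidence"],
}

body = structured_chat_body("deepseek-r1:1.5b", "State one belief about mindX.", belief_schema)
# The response's message.content is then schema-conforming JSON:
#   import json; belief = json.loads(response["message"]["content"])
```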
Cloud & Infrastructure

  • Ollama Cloud — Free/Pro/Max tiers, API keys, cloud models, local offload vs direct API; the cloud operational pillar
  • Cloud Model Search — Programmatic discovery via /api/tags and OllamaCloudModelDiscovery class; feeds into Modelfile schema and Chimaiera alignment; the -cloud suffix distinction
  • Cloud Rate Limiting — CloudRateLimiter embedded in OllamaCloudTool with adaptive pacing (3s–30s); uses actual token counts from the Ollama API; integrates with rate_limiter.py
  • OpenAI Compatibility — Drop-in replacement at /v1/chat/completions; works with OpenAI Python SDK and OpenAI JS SDK; base URL localhost:11434/v1/ for CPU pillar, ollama.com/v1/ for cloud pillar
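The OpenAI-compatibility bullet above boils down to a base-URL switch. A minimal sketch, with the OpenAI SDK call shown only in comments; the helper name is an assumption, not a mindX API:

```python
# Sketch: select the OpenAI-compatible base URL per pillar. The /v1 paths
# are the documented compatibility endpoints; `pillar` is this page's term.

def openai_base_url(pillar: str) -> str:
    if pillar == "cpu":
        return "http://localhost:11434/v1"
    if pillar == "cloud":
        return "https://ollama.com/v1"
    raise ValueError(f"unknown pillar: {pillar}")

# With the OpenAI Python SDK this would be used as (not executed here):
#   from openai import OpenAI
#   client = OpenAI(base_url=openai_base_url("cpu"), api_key="ollama")
#   client.chat.completions.create(model="deepseek-r1:1.5b", messages=[...])
```

The local endpoint ignores the api_key value, but the OpenAI SDK requires one to be set, hence the conventional dummy value.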
SDKs

  • Python SDK — ollama library (PyPI); sync, async, cloud client, auto-parsed tool schemas; mindX uses aiohttp directly via OllamaAPI for maximum control
  • JavaScript SDK — ollama library (npm); browser, Node.js, cloud, abort; used by the mindX frontend
Setup & Operations

  • Getting Started — Installation on Linux, macOS, Windows; first model pull; mindX quick setup for the CPU pillar
  • GPU Support — NVIDIA (CC 5.0+), AMD ROCm, Apple Metal, Vulkan (experimental); the 10.0.0.155 GPU server when online
  • Docker — CPU, NVIDIA, AMD, Vulkan containers; Docker Hub; Compose with mindX
  • Modelfile — Custom model creation; canonical schema for model collection, rating, and agent-model alignment toward Chimaiera
  • FAQ & Troubleshooting — Context window, keep_alive, Flash Attention, KV cache quantization, concurrency, VPS production notes
mindX Integration

  • OllamaCloudTool — First-class BaseTool for the cloud pillar. Any agent can call execute(operation="chat", model="deepseek-v3.2", message="..."). Dual access (local proxy + direct API), embedded CloudRateLimiter, 18dp precision metrics, conversation history, branch-ready. Registered in augmentic_tools_registry.json with access_control: [""]. Wired into _resolve_inference_model() as Step 5 (guarantee).
  • Architecture — Integration layer diagram; OllamaAPI, OllamaChatManager, LLMFactory, InferenceDiscovery; 5-step resilience chain; cloud offload
  • Configuration — MINDX_LLM__OLLAMA__BASE_URL (CPU pillar), OLLAMA_API_KEY (cloud pillar), models/ollama.yaml, BANKON vault, llm_factory_config.json
  • Precision Metrics — llm/precision_metrics.py: 18-decimal-place scientific tracking; Decimal accumulation; actual counts only; separate cloud file at data/metrics/cloud_precision_metrics.json
  • Capability Examples — Working Python code for all 10 capabilities: streaming, thinking, structured outputs, vision, embeddings, tool calling, web search, cloud, model management, rate-limited cloud client
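The execute(operation=...) pattern above can be sketched with a tiny stand-in class. This is a hypothetical stub of the dispatch shape only, not the real OllamaCloudTool (which lives in tools/cloud/ollama_cloud_tool.py and does network I/O):

```python
# Hypothetical stub of the execute(operation=...) dispatch pattern — NOT the
# real OllamaCloudTool. It only illustrates the call shape agents use.

class CloudToolStub:
    """Minimal shape of an operation-dispatching tool."""

    def execute(self, operation: str, **kwargs):
        handler = getattr(self, f"_op_{operation}", None)
        if handler is None:
            return {"status": "error", "error": f"unknown operation: {operation}"}
        return handler(**kwargs)

    def _op_chat(self, model: str, message: str):
        # The real tool would try the local proxy, then the direct cloud API.
        return {"status": "ok", "model": model, "echo": message}

tool = CloudToolStub()
result = tool.execute(operation="chat", model="deepseek-v3.2", message="ping")
```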
Test & Benchmarking

  • scripts/test_cloud_all_models.py — Primary benchmark: single prompt to every model, precision metrics (18dp Decimal), actual eval_count/eval_duration from the Ollama API; see Latest Benchmark and How Cloud Works Without an API Key
  • scripts/test_cloud_inference.py — Original multi-source benchmark (local + cloud + vLLM)
  • scripts/test_ollama_connection.py — Connection test using OllamaAPI
Existing mindX Ollama Docs (pre-2026-04-11)

  • docs/ollama_api_integration.md — Original API compliance notes (timeouts, keep_alive, token counting)
  • docs/ollama_integration.md — Custom client (OllamaAPI) vs official library (Python SDK)
  • docs/ollama_model_capability_tool.md — Model discovery and capability registration
  • docs/OLLAMA_VLLM_CLOUD_RESEARCH.md — Cloud + vLLM research (2026-04-10); established the dual-pillar strategy
  • llm/RESILIENCE.md — Graded inference hierarchy: Primary → Secondary → Failsafe (CPU) → Guarantee (Cloud)

Latest Benchmark (2026-04-11)

Prompt: "You are mindX. In one sentence, describe what you are."
Script: test_cloud_all_models.py | Results: data/cloud_test_results.json

| Model | Pillar | eval | prompt | total | tok/s | wall_ms | total_ms |
|---|---|---|---|---|---|---|---|
| gpt-oss:120b-cloud | Cloud | 67 | 81 | 148 | 65.52 | 1,214 | 1,022 |
| deepseek-r1:1.5b | CPU | 79 | 17 | 96 | 8.00 | 16,294 | 16,291 |
| deepseek-coder:latest | CPU | 72 | 83 | 155 | 7.29 | 22,569 | 22,565 |

Aggregate (all values ACTUAL from the Ollama API, 18dp precision):

  • Total tokens: 399 (218 eval + 181 prompt) = 399000000000000000000 sub-tokens
  • Aggregate throughput: 11.03 tok/s (11.033658223708593293 at 18dp)
  • Cloud vs CPU speedup: 8.2x (65.52 vs 8.00 tok/s) — 120B cloud GPU vs 1.5B local CPU
Cloud Timing Note

Cloud-proxied models (gpt-oss:120b-cloud) return eval_duration_ns: 0 — the local offload proxy does not expose per-stage timing from the remote GPU. The total_duration_ns is used for the tok/s calculation instead. CPU pillar models return all duration fields. See test_cloud_all_models.py line 114 for the fallback logic.
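The fallback this note describes can be sketched as a pure function. The field names match the Ollama API response; the function name and the CPU-row durations below are illustrative (chosen to match the benchmark's ~8 tok/s):

```python
# Sketch of the timing fallback: cloud-proxied responses report
# eval_duration of 0, so tok/s falls back to total_duration. Field names
# match the Ollama API; the function name is illustrative.
from decimal import Decimal

NS_PER_SEC = Decimal(1_000_000_000)

def tokens_per_sec(eval_count: int, eval_duration_ns: int, total_duration_ns: int) -> Decimal:
    duration = eval_duration_ns if eval_duration_ns > 0 else total_duration_ns
    if duration <= 0:
        return Decimal(0)
    return Decimal(eval_count) * NS_PER_SEC / Decimal(duration)

# CPU pillar: all duration fields present, eval_duration is used.
print(tokens_per_sec(79, 9_875_000_000, 16_291_000_000))  # 8 tok/s
# Cloud proxy: eval_duration is 0, so total_duration is used instead.
print(tokens_per_sec(67, 0, 1_022_000_000))
```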

36 Additional Cloud Models Available

The cloud catalog lists 36 models at ollama.com/api/tags. To test them:

```shell
# Option A: pull with the -cloud suffix for the free-tier proxy (metadata only, no weights)
ollama pull deepseek-v3.2-cloud
python3 scripts/test_cloud_all_models.py --local

# Option B: set an API key for direct cloud access to all 36 models
export OLLAMA_API_KEY=your_key
python3 scripts/test_cloud_all_models.py
```

How Cloud Works Without an API Key

test_cloud_inference.py and OllamaCloudTool return cloud model responses without OLLAMA_API_KEY because of Ollama's local offload architecture.

The -cloud Suffix

Model names with -cloud appended (e.g., gpt-oss:120b-cloud) are metadata-only pulls that proxy inference to ollama.com. Without the suffix (e.g., gpt-oss:120b), ollama pull downloads the full model weights (gigabytes) for CPU pillar execution.

The cloud catalog returns names without -cloud. Append it for the free-tier local proxy:

```shell
ollama pull gpt-oss:120b-cloud      # metadata only → inference proxied to cloud
ollama pull deepseek-v3.2-cloud     # metadata only → inference proxied to cloud
```

versus a full local pull:

```shell
ollama pull gemma3:4b               # downloads 3.3GB weights for local CPU execution
```

The Mechanism

  • ollama pull gpt-oss:120b-cloud downloads metadata (not weights) to the local daemon
  • ollama run gpt-oss:120b-cloud sends the request to localhost:11434 like any local model
  • The local Ollama daemon detects the -cloud tag and transparently proxies to ollama.com
  • Authentication is handled by the daemon using credentials from ollama signin (stored at ~/.ollama/id_ed25519; see FAQ)
  • OllamaCloudTool calls localhost:11434/api/chat — no Bearer token needed because the local daemon is the auth proxy

```
Agent → OllamaCloudTool.execute(operation="chat") → _try_local_proxy()
      → localhost:11434/api/chat (model-cloud) → local Ollama daemon
                                                    ↓ (transparent proxy)
                                               ollama.com (auth via ed25519 key)
                                                    ↓
                                               Cloud GPU inference
                                                    ↓
Agent ← result (eval_count, tokens_per_sec, 18dp) ← ollama.com
```

Three Access Paths

| Path | URL | Auth | Pull | When | Tool Method |
|---|---|---|---|---|---|
| Local offload | localhost:11434 | None (daemon) | ollama pull model-cloud | Free tier, no key | _try_local_proxy() |
| Direct API | ollama.com/api/chat | Bearer $OLLAMA_API_KEY | None needed | Key set | _try_direct_cloud() |
| Local execution | localhost:11434 | None | ollama pull model (full weights) | Always, even offline | OllamaAPI |

OllamaCloudTool in auto mode tries the local proxy first, then the direct cloud API — matching the dual-pillar design.
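The auto-mode ordering can be sketched as two small helpers; the names are illustrative stand-ins for logic that lives inside OllamaCloudTool:

```python
# Sketch of the auto-mode path ordering. Helper names are illustrative;
# the real logic lives in OllamaCloudTool.

def cloud_name(model: str) -> str:
    """Append -cloud for the free-tier local offload proxy."""
    return model if model.endswith("-cloud") else f"{model}-cloud"

def access_paths(api_key_set: bool) -> list:
    """Order of attempts in auto mode: local proxy first, then direct API."""
    paths = ["_try_local_proxy"]
    if api_key_set:
        paths.append("_try_direct_cloud")
    return paths

print(cloud_name("gpt-oss:120b"))       # gpt-oss:120b-cloud
print(access_paths(api_key_set=True))   # local proxy first, direct cloud second
```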

Why /api/tags Works Without Auth

The model listing endpoint at https://ollama.com/api/tags is publicly accessible — it lists available cloud models for discovery. This is how test_cloud_all_models.py, OllamaCloudTool.list_models, and OllamaCloudModelDiscovery discover available models without authentication.
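Discovery against this endpoint reduces to parsing a models array. A minimal sketch with an invented sample payload; the {"models": [{"name": ...}, ...]} shape is the documented response format:

```python
# Sketch: extract model names from an /api/tags response. The sample payload
# below is invented for illustration, not live catalog data.
import json

def model_names(tags_json: str) -> list:
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

sample = json.dumps({"models": [{"name": "gpt-oss:120b"}, {"name": "deepseek-v3.2"}]})
print(model_names(sample))  # ['gpt-oss:120b', 'deepseek-v3.2']

# Live fetch (not executed here) needs no auth:
#   import urllib.request
#   raw = urllib.request.urlopen("https://ollama.com/api/tags").read()
```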

Free Tier Limits

| Limit | Value | Reset | Tracked By |
|---|---|---|---|
| Session | Light usage | Every 5 hours | CloudRateLimiter in OllamaCloudTool |
| Weekly | Light usage | Every 7 days | CloudQuotaTracker |
| Concurrent cloud models | 1 | — | Ollama server-side |

See Cloud Rate Limiting for the adaptive pacing strategy (3s–30s based on quota utilization) that maximizes throughput within these limits using actual token counts.
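The adaptive pacing can be sketched as a clamp-and-interpolate function. Linear interpolation between the 3 s floor and 30 s ceiling is an assumption here; the real CloudRateLimiter may pace differently:

```python
# Minimal sketch: scale the inter-request delay from the 3 s floor to the
# 30 s ceiling as quota utilization rises. Linear interpolation is an
# assumption; the real CloudRateLimiter strategy may differ.

MIN_DELAY_S = 3.0
MAX_DELAY_S = 30.0

def adaptive_delay(utilization: float) -> float:
    """utilization: fraction of the quota window consumed (0.0-1.0)."""
    u = min(max(utilization, 0.0), 1.0)  # clamp out-of-range inputs
    return MIN_DELAY_S + u * (MAX_DELAY_S - MIN_DELAY_S)

print(adaptive_delay(0.0))   # 3.0  (idle quota: fastest pacing)
print(adaptive_delay(1.0))   # 30.0 (quota nearly spent: back off hard)
```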


Modelfile as Canonical Schema

The Ollama Modelfile is mindX's canonical schema for model collection and rating across both pillars:

| Instruction | Maps To | mindX Component |
|---|---|---|
| FROM | Base architecture/weights | models/ollama.yaml models[].name |
| PARAMETER | Operational characteristics | models/ollama.yaml model_selection |
| TEMPLATE | Communication protocol | Go template syntax |
| SYSTEM | Cognitive identity | Agent system prompts in BDIAgent |
| Capabilities | Dynamic from /api/show | OllamaCloudModelDiscovery |

This feeds into:

  • HierarchicalModelScorer — learned task_scores from precision metrics feedback
  • OllamaCloudModelDiscovery — dynamic capability detection across both CPU and cloud models
  • InferenceDiscovery — provider routing with cloud guarantee fallback
  • Agent-model alignment toward Chimaiera (the ROI moment when model composition outperforms single-model inference)

See the Modelfile Reference for the full instruction set and the Chimaiera alignment section.


Precision Metrics

Token tracking at 18 decimal places using Python Decimal. No floating-point drift. No estimation. Applied identically to both CPU and Cloud pillars.

| What | Before | After | Module |
|---|---|---|---|
| Token counts | word_count × 1.3 | eval_count from Ollama API | precision_metrics.py |
| Timing | float milliseconds | int nanoseconds (Ollama native) | OllamaResponseMetrics |
| Accumulation | float (compounding drift) | Decimal (28-digit significand) | PrecisionAccumulator |
| Sub-token unit | none | 1 token = 10^18 sub-tokens (wei equivalent) | SUBTOKEN_FACTOR |
| Cloud tok/s | not tracked | eval_count / total_duration_ns (cloud proxy returns eval_duration: 0) | OllamaCloudTool |

Local metrics: data/metrics/precision_metrics.json (via OllamaAPI)
Cloud metrics: data/metrics/cloud_precision_metrics.json (via OllamaCloudTool)

Full docs: Precision Metrics.
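The sub-token accounting above can be sketched with Python's Decimal. The class name is illustrative (the real accumulator lives in llm/precision_metrics.py); the eval counts reuse the benchmark above:

```python
# Sketch of drift-free accumulation: integer sub-tokens (1 token = 10**18
# sub-tokens, the wei-style unit above) summed via Decimal. The class name
# is illustrative; the real accumulator is in llm/precision_metrics.py.
from decimal import Decimal, getcontext

getcontext().prec = 28          # 28-digit significand, as noted above
SUBTOKEN_FACTOR = 10 ** 18

class Accumulator:
    def __init__(self):
        self.subtokens = Decimal(0)

    def add_tokens(self, eval_count: int):
        # Only actual eval_count values from the Ollama API — no estimates.
        self.subtokens += Decimal(eval_count) * SUBTOKEN_FACTOR

    @property
    def tokens(self) -> Decimal:
        return self.subtokens / SUBTOKEN_FACTOR

acc = Accumulator()
for n in (67, 79, 72):          # eval counts from the benchmark above
    acc.add_tokens(n)
print(acc.subtokens)            # 218000000000000000000
print(acc.tokens)               # 218
```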


Resilience Design

The 5-step resolution chain in _resolve_inference_model() ensures mindX always has inference when any network path is available:

    Step 1: InferenceDiscovery → best provider (Gemini, Mistral, Groq, etc.)
              ↓ all keys exhausted or rate limited
    Step 2: OllamaChatManager → local model selection (HierarchicalModelScorer)
              ↓ connection stale or failed
    Step 3: Re-init OllamaChatManager → retry with fresh connection
              ↓ still failing
    Step 4: Direct HTTP → localhost:11434/api/tags (zero dependencies)
              ↓ local Ollama completely down
    Step 5: OllamaCloudTool → ollama.com GPU inference ← GUARANTEE (24/7/365)
              ↓ cloud also unreachable (network down)
         → None → fallback_decide() rule-based heuristics → 2-min backoff
    
| Tier | Role | Provider | Speed | mindX Component |
|---|---|---|---|---|
| Primary | Best quality | Gemini, Mistral | Varies | LLMFactory |
| Secondary | Speed/cost | Groq, Together | Fast | LLMFactory |
| Failsafe | CPU pillar | Ollama local (localhost:11434) | ~8 tok/s | OllamaChatManager |
| Guarantee | Cloud pillar | Ollama Cloud (ollama.com) | ~65 tok/s | OllamaCloudTool |
| Last resort | No inference | — | — | fallback_decide() rule-based |

Cloud is the guarantee, not the default. The _cloud_inference_active flag in mindXagent.py routes one chat through OllamaCloudTool, then resets so the next cycle tries local first. This preserves CPU pillar autonomy while ensuring the cloud pillar catches every gap.

InferenceDiscovery.get_provider_for_task() routes tasks through the same hierarchy: preferred provider → ollama_local → ollama_cloud → any available → None.

Implementation: _resolve_inference_model() (5 steps) → InferenceDiscovery (provider probing + cloud fallback) → OllamaCloudTool (cloud guarantee) → RESILIENCE.md (graded hierarchy docs) → chat_with_ollama() (cloud routing when active).
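The walk-down above can be sketched as a generic first-success resolver. The step functions are stand-ins for the real resolvers named in the chain, wired here to mimic a run where only the cloud guarantee answers:

```python
# Sketch of the 5-step walk-down: try each source in order, return the first
# answer, and fall back to rule-based heuristics when everything fails.
# The step callables are stand-ins, not the real mindX resolvers.

def _stale_connection():
    raise ConnectionError("local Ollama down")

def resolve(steps, fallback):
    """Return (source_name, result) from the first step that succeeds."""
    for name, step in steps:
        try:
            result = step()
            if result is not None:
                return name, result
        except Exception:
            continue  # walk down to the next source
    return "fallback", fallback()

# Stand-ins: primary providers exhausted, local Ollama down, cloud answers.
steps = [
    ("inference_discovery", lambda: None),            # all API keys exhausted
    ("ollama_chat_manager", _stale_connection),       # local connection failed
    ("ollama_cloud_tool", lambda: "cloud response"),  # Step 5: guarantee
]
source, result = resolve(steps, fallback=lambda: "rule-based heuristic")
print(source, result)   # ollama_cloud_tool cloud response
```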


mindX File Map

Core Ollama Integration

| File | Role | Doc | Pillar |
|---|---|---|---|
| tools/cloud/ollama_cloud_tool.py | OllamaCloudTool — cloud inference for any agent | This page | Cloud |
| api/ollama/ollama_url.py | HTTP API client, rate limiter, precision metrics, failover | Architecture | CPU |
| agents/core/ollama_chat_manager.py | Connection manager, model discovery, conversation history | Architecture | CPU |
| agents/core/mindXagent.py | 5-step resolution chain, cloud routing, autonomous loop | Architecture | Both |
| llm/ollama_handler.py | LLMFactory handler interface | Architecture | CPU |
| llm/llm_factory.py | Master factory, provider selection | Configuration | Both |
| llm/rate_limiter.py | Token-bucket rate limiting | Cloud Rate Limiting | Both |
| llm/precision_metrics.py | 18dp scientific token tracking | Precision Metrics | Both |
| llm/inference_discovery.py | Boot-time probe, task routing, cloud guarantee | Architecture | Both |
| models/ollama.yaml | Model registry, task scores, cloud config | Configuration | Both |
| api/ollama/ollama_admin_routes.py | Admin endpoints (status, test, models) | FAQ | CPU |
| agents/core/model_scorer.py | HierarchicalModelScorer | Modelfile Schema | Both |
| agents/core/inference_optimizer.py | Sliding-scale frequency optimization | Architecture | CPU |
| agents/hostinger_vps_agent.py | VPS management: 3 MCP channels (SSH + Hostinger API + Backend) | NAV.md | Both |

Test Scripts

| File | Purpose | Pillar |
|---|---|---|
| scripts/test_cloud_all_models.py | Primary: every model, precision metrics, 18dp Decimal | Both |
| scripts/test_cloud_inference.py | Original: local + cloud + vLLM comparison | Both |
| scripts/test_ollama_connection.py | Connection test via OllamaAPI | CPU |
| data/cloud_test_results.json | Latest benchmark results (JSON, 18dp) | Both |

External References

| Resource | URL | Relevance |
|---|---|---|
| Ollama Homepage | ollama.com | Both pillars |
| Ollama Docs | docs.ollama.com | API reference source |
| Ollama API (OpenAPI) | docs.ollama.com/openapi.yaml | API docs source |
| Ollama GitHub | github.com/ollama/ollama | Setup |
| Python SDK | github.com/ollama/ollama-python | SDK docs |
| JavaScript SDK | github.com/ollama/ollama-js | SDK docs |
| Cloud Models | ollama.com/search?c=cloud | Cloud pillar catalog |
| Thinking Models | ollama.com/search?c=thinking | Thinking feature |
| Vision Models | ollama.com/search?c=vision | Vision feature |
| Tool Models | ollama.com/search?c=tools | Tool Calling feature |
| Model Library | ollama.com/library | Modelfile reference |
| API Keys | ollama.com/settings/keys | Cloud auth |
| Discord | discord.gg/ollama | Community |
| Docker Hub | hub.docker.com/r/ollama/ollama | Docker setup |
| OllamaFreeAPI | github.com/mfoud444/ollamafreeapi | Community gateway |
| mindX Production | mindx.pythai.net | Live CPU pillar |
| mindX Thesis | docs/THESIS.md | Darwin-Godel Machine synthesis |
| mindX Manifesto | docs/MANIFESTO.md | Chimaiera roadmap |
| RAGE | docs/AGINT.md | Embeddings architecture — RAGE wipes the floor with RAG |
| Attribution | docs/ATTRIBUTION.md | Open source stack: Ollama, vLLM, SwarmClaw, pgvector |

Version Info

  • Ollama docs: Fetched 2026-04-11 from docs.ollama.com
  • Operational standards: CPU (OllamaAPI + OllamaChatManager) + Cloud (OllamaCloudTool)
  • Resilience: 5-step chain in _resolve_inference_model() with cloud guarantee
  • Precision: 18dp Decimal via precision_metrics.py, actual counts from Ollama API
  • Production: mindx.pythai.net (4GB VPS, CPU pillar, dual-URL failover)
  • Benchmark: 2026-04-11 — 3 models, 399 tokens, cloud 8.2x faster than CPU
  • 28 files, ~6,000 lines — self-contained for resilient offline operation
