ollama/INDEX.md · 30.6 KB

Ollama Complete Reference — Local Documentation for mindX

Self-contained reference for all Ollama capabilities.
No external docs needed — resilient offline operation.
Source: docs.ollama.com (fetched 2026-04-11) + mindX integration specifics.
>
Back to mindX Documentation Hub

Operational Standards

mindX operates from two inference pillars — both are operational standards, not fallbacks:

PillarSourceSpeedModel ScaleAvailabilityCost
CPU inferencelocalhost:11434~8 tok/s0.6B–1.7BAlways (no network)Zero
Cloud inferenceollama.com via OllamaCloudTool~65 tok/s3B–1T24/7/365 (free tier)Zero

CPU provides autonomy — mindX reasons even offline, even when every API key is exhausted. Cloud provides scale — 120B+ parameter models on NVIDIA GPUs, 8.2x faster than local CPU. Together they form the resilience guarantee: mindX never stops inferring.

The 5-step resolution chain in _resolve_inference_model() tries the best available source first and walks down to guarantee. CPU is the failsafe. Cloud is the guarantee. Both are always ready.


Quick Navigation

API Reference

EndpointLocalCloudDoc
POST /api/generatelocalhost:11434ollama.comgenerate.md
POST /api/chatlocalhost:11434ollama.comchat.md
POST /api/embedlocalhost:11434ollama.comembeddings.md
Model managementlocalhost:11434models.md
GET /api/ps, /api/versionlocalhost:11434running.md

All endpoints documented with every parameter, response field, and curl/Python/JavaScript examples. See the Ollama OpenAPI spec for the authoritative schema.

Features

Each feature doc includes curl, Python SDK, JavaScript SDK, and mindX-specific code examples. All features work identically on both CPU and Cloud pillars.

Cloud & Infrastructure

SDKs

Setup & Operations

mindX Integration

Test & Benchmarking

Existing mindX Ollama Docs (pre-2026-04-11)


Latest Benchmark (2026-04-11)

Prompt: "You are mindX. In one sentence, describe what you are." Script: test_cloud_all_models.py | Results: data/cloud_test_results.json

ModelPillarevalprompttotaltok/swall_mstotal_ms
gpt-oss:120b-cloudCloud678114865.521,2141,022
deepseek-r1:1.5bCPU7917968.0016,29416,291
deepseek-coder:latestCPU72831557.2922,56922,565

Aggregate (all values ACTUAL from Ollama API, 18dp precision):

Cloud Timing Note

Cloud-proxied models (gpt-oss:120b-cloud) return eval_duration_ns: 0 — the local offload proxy does not expose per-stage timing from the remote GPU. The total_duration_ns is used for tok/s calculation instead. CPU pillar models return all duration fields. See test_cloud_all_models.py line 114 for the fallback logic.

36 Additional Cloud Models Available

The cloud catalog lists 36 models at ollama.com/api/tags. To test them:

# Option A: Pull with -cloud suffix for free-tier proxy (metadata only, no weights)
ollama pull deepseek-v3.2-cloud
python3 scripts/test_cloud_all_models.py --local

Option B: Set API key for direct cloud access to all 36 models

export OLLAMA_API_KEY=your_key python3 scripts/test_cloud_all_models.py

How Cloud Works Without an API Key

test_cloud_inference.py and OllamaCloudTool return cloud model responses without OLLAMA_API_KEY because of Ollama's local offload architecture:

The -cloud Suffix

Model names with -cloud appended (e.g., gpt-oss:120b-cloud) are metadata-only pulls that proxy inference to ollama.com. Without the suffix (e.g., gpt-oss:120b), ollama pull downloads the full model weights (gigabytes) for CPU pillar execution.

The cloud catalog returns names without -cloud. Append it for free-tier local proxy:

ollama pull gpt-oss:120b-cloud      # metadata only → inference proxied to cloud
ollama pull deepseek-v3.2-cloud     # metadata only → inference proxied to cloud

vs

ollama pull gemma3:4b # downloads 3.3GB weights for local CPU execution

The Mechanism

  1. ollama pull gpt-oss:120b-cloud downloads metadata (not weights) to the local daemon
  2. ollama run gpt-oss:120b-cloud sends the request to localhost:11434 like any local model
  3. The local Ollama daemon detects the -cloud tag and transparently proxies to ollama.com
  4. Authentication is handled by the daemon using credentials from ollama signin (stored at ~/.ollama/id_ed25519; see FAQ)
  5. OllamaCloudTool calls localhost:11434/api/chatno Bearer token needed because the local daemon is the auth proxy
Agent → OllamaCloudTool.execute(operation="chat") → _try_local_proxy()
    → localhost:11434/api/chat (model-cloud) → local Ollama daemon
                                                  ↓ (transparent proxy)
                                             ollama.com (auth via ed25519 key)
                                                  ↓
                                             Cloud GPU inference
                                                  ↓
Agent ← result (eval_count, tokens_per_sec, 18dp) ← ollama.com

Three Access Paths

PathURLAuthPullWhenTool Method
Local offloadlocalhost:11434None (daemon)ollama pull model-cloudFree tier, no key_try_local_proxy()
Direct APIollama.com/api/chatBearer $OLLAMA_API_KEYNone neededKey set_try_direct_cloud()
Local executionlocalhost:11434Noneollama pull model (full weights)Always offlineOllamaAPI

OllamaCloudTool in auto mode tries local proxy first, then direct cloud — matching the dual-pillar design.

Why /api/tags Works Without Auth

The model listing endpoint at https://ollama.com/api/tags is publicly accessible — it lists available cloud models for discovery. This is how test_cloud_all_models.py, OllamaCloudTool.list_models, and OllamaCloudModelDiscovery discover available models without authentication.

Free Tier Limits

LimitValueResetTracked By
SessionLight usageEvery 5 hoursCloudRateLimiter in OllamaCloudTool
WeeklyLight usageEvery 7 daysCloudQuotaTracker
Concurrent cloud models1Ollama server-side

See Cloud Rate Limiting for the adaptive pacing strategy (3s–30s based on quota utilization) that maximizes throughput within these limits using actual token counts.


Modelfile as Canonical Schema

The Ollama Modelfile is mindX's canonical schema for model collection and rating across both pillars:

InstructionMaps TomindX Component
FROMBase architecture/weightsmodels/ollama.yaml models[].name
PARAMETEROperational characteristicsmodels/ollama.yaml model_selection
TEMPLATECommunication protocolGo template syntax
SYSTEMCognitive identityAgent system prompts in BDIAgent
CapabilitiesDynamic from /api/showOllamaCloudModelDiscovery

This feeds into:

  1. HierarchicalModelScorer — learned task_scores from precision metrics feedback
  2. OllamaCloudModelDiscovery — dynamic capability detection across both CPU and cloud models
  3. InferenceDiscovery — provider routing with cloud guarantee fallback
  4. Agent-model alignment toward Chimaiera (the ROI moment when model composition outperforms single-model inference)

See Modelfile Reference for the full instruction set and Chimaiera alignment section.


Precision Metrics

Token tracking at 18 decimal places using Python Decimal. No floating-point drift. No estimation. Applied identically to both CPU and Cloud pillars.

WhatBeforeAfterModule
Token countsword_count 1.3eval_count from Ollama APIprecision_metrics.py
Timingfloat millisecondsint nanoseconds (Ollama native)OllamaResponseMetrics
Accumulationfloat (compounding drift)Decimal (28-digit significand)PrecisionAccumulator
Sub-token unitnone1 token = 10^18 sub-tokens (wei equivalent)SUBTOKEN_FACTOR
Cloud tok/snot trackedeval_count / total_duration_ns (cloud proxy returns eval_duration: 0)OllamaCloudTool

Local metrics: data/metrics/precision_metrics.json (via OllamaAPI) Cloud metrics: data/metrics/cloud_precision_metrics.json (via OllamaCloudTool)

Full docs: Precision Metrics.


Resilience Design

The 5-step resolution chain in _resolve_inference_model() ensures mindX always has inference when any network path is available:

Step 1: InferenceDiscovery → best provider (Gemini, Mistral, Groq, etc.)
          ↓ all keys exhausted or rate limited
Step 2: OllamaChatManager → local model selection (HierarchicalModelScorer)
          ↓ connection stale or failed
Step 3: Re-init OllamaChatManager → retry with fresh connection
          ↓ still failing
Step 4: Direct HTTP → localhost:11434/api/tags (zero dependencies)
          ↓ local Ollama completely down
Step 5: OllamaCloudTool → ollama.com GPU inference ← GUARANTEE (24/7/365)
          ↓ cloud also unreachable (network down)
     → None → fallback_decide() rule-based heuristics → 2-min backoff
TierRoleProviderSpeedmindX Component
PrimaryBest qualityGemini, MistralVariesLLMFactory
SecondarySpeed/costGroq, TogetherFastLLMFactory
FailsafeCPU pillarOllama local (localhost:11434)~8 tok/sOllamaChatManager
GuaranteeCloud pillarOllama Cloud (ollama.com)~65 tok/sOllamaCloudTool
Last resortNo inferencefallback_decide() rule-based

Cloud is guarantee, not default. The _cloud_inference_active flag in mindXagent.py routes one chat through OllamaCloudTool, then resets so the next cycle tries local first. This preserves CPU pillar autonomy while ensuring the cloud pillar catches every gap.

InferenceDiscovery.get_provider_for_task() routes tasks through the same hierarchy: preferred provider → ollama_localollama_cloud → any available → None.

Implementation: _resolve_inference_model() (5 steps) → InferenceDiscovery (provider probing + cloud fallback) → OllamaCloudTool (cloud guarantee) → RESILIENCE.md (graded hierarchy docs) → chat_with_ollama() (cloud routing when active).


mindX File Map

Core Ollama Integration

FileRoleDocPillar
tools/cloud/ollama_cloud_tool.pyOllamaCloudTool — cloud inference for any agentThis pageCloud
api/ollama/ollama_url.pyHTTP API client, rate limiter, precision metrics, failoverArchitectureCPU
agents/core/ollama_chat_manager.pyConnection manager, model discovery, conversation historyArchitectureCPU
agents/core/mindXagent.py5-step resolution chain, cloud routing, autonomous loopArchitectureBoth
llm/ollama_handler.pyLLMFactory handler interfaceArchitectureCPU
llm/llm_factory.pyMaster factory, provider selectionConfigurationBoth
llm/rate_limiter.pyToken-bucket rate limitingCloud Rate LimitingBoth
llm/precision_metrics.py18dp scientific token trackingPrecision MetricsBoth
llm/inference_discovery.pyBoot-time probe, task routing, cloud guaranteeArchitectureBoth
models/ollama.yamlModel registry, task scores, cloud configConfigurationBoth
api/ollama/ollama_admin_routes.pyAdmin endpoints (status, test, models)FAQCPU
agents/core/model_scorer.pyHierarchicalModelScorerModelfile SchemaBoth
agents/core/inference_optimizer.pySliding-scale frequency optimizationArchitectureCPU
agents/hostinger_vps_agent.pyVPS management: 3 MCP channels (SSH + Hostinger API + Backend)NAV.mdBoth

Test Scripts

FilePurposePillar
scripts/test_cloud_all_models.pyPrimary: every model, precision metrics, 18dp DecimalBoth
scripts/test_cloud_inference.pyOriginal: local + cloud + vLLM comparisonBoth
scripts/test_ollama_connection.pyConnection test via OllamaAPICPU
data/cloud_test_results.jsonLatest benchmark results (JSON, 18dp)Both

External References

ResourceURLRelevance
Ollama Homepageollama.comBoth pillars
Ollama Docsdocs.ollama.comAPI reference source
Ollama API (OpenAPI)docs.ollama.com/openapi.yamlAPI docs source
Ollama GitHubgithub.com/ollama/ollamaSetup
Python SDKgithub.com/ollama/ollama-pythonSDK docs
JavaScript SDKgithub.com/ollama/ollama-jsSDK docs
Cloud Modelsollama.com/search?c=cloudCloud pillar catalog
Thinking Modelsollama.com/search?c=thinkingThinking feature
Vision Modelsollama.com/search?c=visionVision feature
Tool Modelsollama.com/search?c=toolsTool Calling feature
Model Libraryollama.com/libraryModelfile reference
API Keysollama.com/settings/keysCloud auth
Discorddiscord.gg/ollamaCommunity
Docker Hubhub.docker.com/r/ollama/ollamaDocker setup
OllamaFreeAPIgithub.com/mfoud444/ollamafreeapiCommunity gateway
mindX Productionmindx.pythai.netLive CPU pillar
mindX Thesisdocs/THESIS.mdDarwin-Godel Machine synthesis
mindX Manifestodocs/MANIFESTO.mdChimaiera roadmap
RAGEdocs/AGINT.mdEmbeddings architecture — RAGE wipes the floor with RAG
Attributiondocs/ATTRIBUTION.mdOpen source stack: Ollama, vLLM, SwarmClaw, pgvector

Version Info


Referenced in this document
AGINTATTRIBUTIONMANIFESTONAVOLLAMA_VLLM_CLOUD_RESEARCHTHESISollama_api_integrationollama_integrationollama_model_capability_tool

All DocumentsDocument IndexThe Book of mindXImprovement JournalAPI Reference