How mindX uses Ollama — from production deployment at mindx.pythai.net to local development.
┌─────────────────────────────────────────────────────────────┐
│ mindX Agent Layer │
│ MindXAgent · BlueprintAgent · AuthorAgent · CEOAgent │
│ │
│ ┌────────────────────┐ ┌──────────────────────────────┐ │
│ │ OllamaChatManager │ │ InferenceDiscovery │ │
│ │ (agents/core/) │ │ (llm/inference_discovery.py) │ │
│ │ │ │ │ │
│ │ • Model discovery │ │ • Probes all sources at boot │ │
│ │ • Best model select│ │ • Validates before each cycle│ │
│ │ • Chat history │ │ • Feeds HierarchicalScorer │ │
│ │ • Auto-retry │ │ │ │
│ └────────┬───────────┘ └──────────────┬───────────────┘ │
│ │ │ │
│ ┌────────┴──────────────────────────────┴───────────────┐ │
│ │ OllamaAPI (api/ollama/ollama_url.py) │ │
│ │ │ │
│ │ • /api/generate and /api/chat endpoints │ │
│ │ • Token-bucket rate limiter (1000 RPM local) │ │
│ │ • Dual-URL failover (primary → fallback) │ │
│ │ • 120s timeout, keep_alive, format, think support │ │
│ │ • Actual token counting from API response │ │
│ └───────────┬────────────────────────┬──────────────────┘ │
│ │ │ │
│ ┌───────────┴───────┐ ┌────────────┴──────────────────┐ │
│ │ OllamaHandler │ │ LLMFactory │ │
│ │ (llm/ollama_ │ │ (llm/llm_factory.py) │ │
│ │ handler.py) │ │ │ │
│ │ │ │ • Provider preference order │ │
│ │ • LLMHandlerIface │ │ • DualLayerRateLimiter │ │
│ │ • /api/generate │ │ • Handler caching │ │
│ │ • Returns None on │ │ • Ollama = last resort fallback│ │
│ │ failure (→ next) │ │ • Default: phi3:mini │ │
│ └───────────────────┘ └───────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│ │
┌───────────┘ └──────────┐
▼ ▼
┌───────────────────┐ ┌──────────────────┐
│ Primary: GPU │ │ Cloud: ollama.com│
│ 10.0.0.155:18080 │ │ (OLLAMA_API_KEY) │
│ (when available) │ │ Free/Pro/Max tier│
└───────────────────┘ └──────────────────┘
│ (unreachable?)
▼
┌───────────────────┐
│ Fallback: CPU │
│ localhost:11434 │
│ (always available)│
└───────────────────┘
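The primary → fallback routing in the diagram can be sketched as a first-reachable probe over an ordered URL list. This is a hedged sketch, not mindX's actual code: `pick_reachable_url` and the injected `probe` callable are illustrative names, and in practice the probe would be an HTTP GET against `/api/tags`.

```python
def pick_reachable_url(urls, probe):
    """Return the first URL whose probe succeeds, in priority order
    (primary GPU box first, then the always-on localhost CPU daemon)."""
    for url in urls:
        try:
            if probe(url):
                return url
        except OSError:
            # Unreachable host: fall through to the next candidate.
            continue
    return None

# Example priority order from the diagram above.
CANDIDATES = ["http://10.0.0.155:18080", "http://localhost:11434"]
```

Injecting the probe keeps the failover logic testable offline; the same function covers the cloud endpoint if it is appended to the candidate list.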
Key files:
  api/ollama/ollama_url.py
  agents/core/ollama_chat_manager.py
  llm/ollama_handler.py
  llm/llm_factory.py
  llm/rate_limiter.py
  llm/inference_discovery.py
  models/ollama.yaml
  api/ollama/ollama_admin_routes.py
  api/ollama/ollama_model_capability_tool.py

Base URL resolution order:
ENV: MINDX_LLM__OLLAMA__BASE_URL
→ explicit base_url parameter
→ models/ollama.yaml base_url
→ data/config/*.json settings
→ localhost:11434 (default)
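One way to read the chain above as code: the first non-empty source wins, in the order shown (env var, explicit parameter, YAML config, JSON config, localhost default). This is a sketch; the function name, signature, and config-dict shapes are assumptions, not mindX's actual API.

```python
import os

DEFAULT_BASE_URL = "http://localhost:11434"

def resolve_base_url(explicit=None, yaml_cfg=None, json_cfg=None):
    """Walk the documented precedence chain; first non-empty source wins."""
    candidates = (
        os.environ.get("MINDX_LLM__OLLAMA__BASE_URL"),
        explicit,
        (yaml_cfg or {}).get("base_url"),   # models/ollama.yaml
        (json_cfg or {}).get("base_url"),   # data/config/*.json
    )
    for url in candidates:
        if url:
            return url
    return DEFAULT_BASE_URL
```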
_resolve_inference_model() — 5-step chain:
Step 1: InferenceDiscovery → best provider (Gemini, Mistral, Groq, etc.)
Step 2: OllamaChatManager → local model selection
Step 3: Re-init OllamaChatManager → retry with fresh connection
Step 4: Direct HTTP → localhost:11434/api/tags (zero dependencies)
Step 5: OllamaCloudTool → ollama.com GPU inference ← GUARANTEE (24/7/365)
All 5 steps fail → None → fallback_decide() → 2-min backoff
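The 5-step chain above can be sketched as a resolver loop in which the first step that yields a model wins. A hedged sketch under assumptions: `resolve_inference_model` and the `(name, resolver)` pairs are illustrative stand-ins for the real step implementations.

```python
def resolve_inference_model(steps):
    """Walk the resolver chain in order; the first step that yields a
    model wins. Exceptions count as failures so one broken provider
    cannot stall the loop."""
    for name, resolver in steps:
        try:
            model = resolver()
        except Exception:
            model = None
        if model is not None:
            return name, model
    # All steps failed: the caller falls back to fallback_decide()
    # and backs off before retrying.
    return None, None
```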
mindX never has an inference gap when ollama.com is reachable. Cloud is the guarantee, not the default: the _cloud_inference_active flag in mindXagent.py resets after one use, so the next cycle tries local first.
Cloud models accessed through the local daemon use the -cloud tag suffix. This is a metadata-only pull; inference is proxied to ollama.com GPU servers. See How Cloud Works Without an API Key and the latest benchmark.
ollama pull gpt-oss:120b-cloud → metadata only, inference on cloud GPU (65 tok/s)
ollama pull deepseek-r1:1.5b → full weights, inference on local CPU (8 tok/s)
Test script: scripts/test_cloud_all_models.py
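Because -cloud is just a tag suffix, telling cloud-proxied models apart from fully local ones needs nothing more than the model list from /api/tags. A minimal sketch, assuming the response shape of the public Ollama API; the function name is illustrative.

```python
def split_cloud_models(tags_response):
    """Partition model names from an /api/tags response into
    cloud-proxied (tag ends in '-cloud') and fully local lists."""
    names = [m["name"] for m in tags_response.get("models", [])]
    cloud = [n for n in names if n.endswith("-cloud")]
    local = [n for n in names if not n.endswith("-cloud")]
    return cloud, local
```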
Local model defaults:
  • qwen3:1.7b as the autonomous default (~2 GB RAM)
  • qwen3:0.6b for lightweight tasks (~1 GB)
  • Embeddings: mxbai-embed-large (0.7 GB), nomic-embed-text (0.3 GB)
  • keep_alive: 5m to free model memory between cycles
  • stream: False hardcoded

Cloud integration:
  • OllamaCloudTool: cloud inference as a first-class BaseTool
  • Wired into _resolve_inference_model() as Step 5 (the guarantee)
  • CloudRateLimiter
  • Metrics: data/metrics/cloud_precision_metrics.json

agents/hostinger_vps_agent.py manages the production VPS through three MCP channels:
  • SSH as root@168.231.126.58 (key: ~/.ssh/id_rsa)
  • Hostinger API (HOSTINGER_API_KEY)

full_health_check() queries all three in parallel. register_mcp_context() publishes tool definitions for agent discovery. See the .agent definition.
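The parallel health check described above might look like the following. This is a hedged sketch using asyncio.gather, not the agent's actual implementation; the probe coroutines passed in are stand-ins for the real SSH and API checks.

```python
import asyncio

async def full_health_check(probes):
    """Run every channel probe concurrently; a raised exception marks
    that channel unhealthy instead of aborting the whole check."""
    results = await asyncio.gather(
        *(probe() for probe in probes.values()), return_exceptions=True
    )
    return {
        name: not isinstance(result, Exception)
        for name, result in zip(probes, results)
    }
```

`return_exceptions=True` is what lets one dead channel report as unhealthy while the others still return their status.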