mindX Ollama Configuration Guide

Complete configuration reference for Ollama in mindX.

Environment Variables

Primary (in `.env` or systemd)

# Ollama server URL (overrides all other config)
MINDX_LLM__OLLAMA__BASE_URL=http://10.0.0.155:18080
# Cloud API key (store in BANKON vault for production)
OLLAMA_API_KEY=your_key_here
# Logging
MINDX_LOGGING_LEVEL=INFO
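
The precedence described above (the environment variable overrides all other config) can be sketched as follows. This is a minimal illustration, not mindX's actual loader: `resolve_base_url` and its parameters are hypothetical names, and the default of `http://localhost:11434` is an assumption based on the stock Ollama port.

```python
import os

# Hypothetical helper illustrating "env var overrides all other config".
# The function name and parameters are assumptions, not mindX's real API.
ENV_VAR = "MINDX_LLM__OLLAMA__BASE_URL"


def resolve_base_url(config_url=None, default="http://localhost:11434", env=None):
    """Return the Ollama base URL, preferring the environment variable."""
    env = os.environ if env is None else env
    override = env.get(ENV_VAR, "").strip()
    if override:
        # Environment wins over any file-based configuration.
        return override.rstrip("/")
    if config_url:
        # Next, a URL read from a config file such as models/ollama.yaml.
        return config_url.rstrip("/")
    return default
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.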

Ollama Server Configuration

# Set in systemd: /etc/systemd/system/ollama.service.d/override.conf
OLLAMA_HOST=0.0.0.0:11434       # Listen on all interfaces
OLLAMA_KEEP_ALIVE=5m             # Default model retention
OLLAMA_CONTEXT_LENGTH=4096       # Default context
OLLAMA_MAX_LOADED_MODELS=1       # 4GB VPS = 1 model at a time
OLLAMA_NUM_PARALLEL=1            # Single request at a time
OLLAMA_MAX_QUEUE=64              # Queue limit
OLLAMA_FLASH_ATTENTION=1         # Reduce memory
OLLAMA_KV_CACHE_TYPE=q8_0        # Halve context memory
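
In a systemd drop-in, each of these variables goes on its own `Environment=` line under `[Service]`, and trailing comments are not allowed on those lines. The sketch below renders the settings listed above into that form; `render_override` is a hypothetical helper name, and the values simply mirror this guide.

```python
# Sketch: render the server settings above as a systemd drop-in file.
# render_override is a hypothetical helper; values are copied from this guide.
OLLAMA_ENV = {
    "OLLAMA_HOST": "0.0.0.0:11434",
    "OLLAMA_KEEP_ALIVE": "5m",
    "OLLAMA_CONTEXT_LENGTH": "4096",
    "OLLAMA_MAX_LOADED_MODELS": "1",
    "OLLAMA_NUM_PARALLEL": "1",
    "OLLAMA_MAX_QUEUE": "64",
    "OLLAMA_FLASH_ATTENTION": "1",
    "OLLAMA_KV_CACHE_TYPE": "q8_0",
}


def render_override(env):
    """Return override.conf text with one Environment= line per variable."""
    lines = ["[Service]"]
    lines += [f'Environment="{key}={value}"' for key, value in env.items()]
    return "\n".join(lines) + "\n"
```

After writing the result to `/etc/systemd/system/ollama.service.d/override.conf`, run `systemctl daemon-reload` and `systemctl restart ollama` for the changes to take effect.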

Model Registry: models/ollama.yaml

provider: ollama
display_name: Ollama (Local GPU)
enabled: true
base_url: http://10.0.0.155:18080
fallback_url: http://localhost:11434
timeout: 120.0
connect_timeout: 10.0
sock_read_timeout: 60.0
rate_limits:
  requests_per_minute: 1000  # Local = high limits
  tokens_per_minute: 10000000
default_model: qwen3:1.7b
keep_alive: 10m
models:
  - name: qwen3:1.7b
    task_scores:
      reasoning: 0.75
      code_generation: 0.78
      simple_chat: 0.88

features:
  streaming: true
  embeddings: true
  function_calling: false  # Update when tool models are available locally
  vision: false            # Update when vision models are available locally
  tool_use: false
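
One way a router might consume the `task_scores` in the registry above is to pick the highest-scoring model for a task and fall back to `default_model` when no model lists that task. How mindX actually uses these scores is not stated here, so treat `pick_model` as a hypothetical sketch. The registry is inlined as a dict to keep the example stdlib-only.

```python
# Registry fragment copied from models/ollama.yaml as a plain dict.
# Parsing the YAML file itself is left out to avoid a third-party dependency.
REGISTRY = {
    "default_model": "qwen3:1.7b",
    "models": [
        {
            "name": "qwen3:1.7b",
            "task_scores": {
                "reasoning": 0.75,
                "code_generation": 0.78,
                "simple_chat": 0.88,
            },
        },
    ],
}


def pick_model(registry, task):
    """Return the model with the highest score for `task`.

    Falls back to default_model when no model lists a score for the task.
    """
    scored = [
        (model["task_scores"][task], model["name"])
        for model in registry.get("models", [])
        if task in model.get("task_scores", {})
    ]
    if not scored:
        return registry["default_model"]
    # max() compares scores first; ties are broken by model name.
    return max(scored)[1]
```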

LLM Factory Config: data/config/llm_factory_config.json

{
  "default_provider_preference_order": ["gemini", "openai", "anthropic", "ollama"],
  "ollama_settings_for_factory": {
    "base_url_override": null,
    "api_key_override": null
  },
  "rate_limit_profiles": {
    "ollama_local": {"rpm": 1000, "rph": 60000},
    "ollama_cloud": {"rpm": 10, "rph": 150}
  }
}
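
The sketch below shows one plausible reading of this file: walk `default_provider_preference_order` until an available provider is found, and derive a steady request spacing from a rate-limit profile. It assumes `rpm` and `rph` mean requests per minute and requests per hour; the function names are hypothetical and not part of the mindX factory.

```python
import json

# Same structure as data/config/llm_factory_config.json, inlined for the sketch.
CONFIG = json.loads("""
{
  "default_provider_preference_order": ["gemini", "openai", "anthropic", "ollama"],
  "rate_limit_profiles": {
    "ollama_local": {"rpm": 1000, "rph": 60000},
    "ollama_cloud": {"rpm": 10, "rph": 150}
  }
}
""")


def first_available(config, available):
    """Return the first provider in preference order that is in `available`."""
    for provider in config["default_provider_preference_order"]:
        if provider in available:
            return provider
    return None


def min_interval_seconds(config, profile):
    """Smallest steady spacing between requests that respects both limits."""
    limits = config["rate_limit_profiles"][profile]
    return max(60.0 / limits["rpm"], 3600.0 / limits["rph"])
```

Under this reading the hourly cap is the binding limit for `ollama_cloud`: 150 requests per hour works out to one request every 24 seconds on average, even though 10 RPM would allow short bursts.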

BANKON Vault (Production)

# Store cloud API key in vault (not .env)
python manage_credentials.py store ollama_cloud_api_key "KEY"
python manage_credentials.py list

Connection Testing

# Script test
python scripts/test_ollama_connection.py
# Admin API
curl http://localhost:8000/api/admin/ollama/status
curl http://localhost:8000/api/admin/ollama/test
curl http://localhost:8000/api/admin/ollama/models
# Direct Ollama
curl http://localhost:11434/api/tags
curl http://localhost:11434/api/ps
curl http://localhost:11434/api/version
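
The direct Ollama checks above can also be run from Python with only the standard library. This is a minimal sketch and not the contents of `scripts/test_ollama_connection.py`; the helper names are hypothetical. It relies on the Ollama responses for `/api/tags` and `/api/ps` carrying a `models` list whose entries have a `name` field, and on `/api/version` returning a `version` field.

```python
import json
import urllib.request


def model_names(payload):
    """Extract model names from a parsed /api/tags or /api/ps response."""
    return [model["name"] for model in payload.get("models", [])]


def fetch_json(base_url, path, timeout=10.0):
    """GET base_url + path and decode the JSON body."""
    url = base_url.rstrip("/") + path
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def report(base_url="http://localhost:11434"):
    """Print the server version, installed models, and loaded models."""
    print("version:", fetch_json(base_url, "/api/version").get("version"))
    print("installed:", model_names(fetch_json(base_url, "/api/tags")))
    print("loaded:", model_names(fetch_json(base_url, "/api/ps")))
```

Calling `report()` against a running server prints one line per endpoint; a connection error raised by `urllib` indicates the server is unreachable at that URL.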

Models Currently Installed (VPS as of 2026-04-11)

| Model | Size | Purpose |
|---|---|---|
| qwen3:1.7b | 1.4GB | Autonomous default |
| qwen3.5:2b | 2.7GB | Newer, may be tight on RAM |
| qwen3:0.6b | 0.5GB | Lightweight tasks |
| mxbai-embed-large | 0.7GB | Embeddings for RAGE |
| nomic-embed-text | 0.3GB | Embeddings (alternative) |
| deepseek-r1:1.5b | ~1.0GB | Reasoning (GPU server) |

Adding Cloud as Inference Source

To add Ollama cloud alongside local inference:

1. Store API key: `python manage_credentials.py store ollama_cloud_api_key "KEY"`
2. Add to InferenceDiscovery: register https://ollama.com as a provider
3. Configure rate limiting: 10 RPM for cloud (see cloud/rate_limiting.md)
4. Set routing rules: heavy tasks → cloud, light tasks → local
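
The routing rule in the last step can be sketched as below. This guide does not define what counts as a heavy task, so the token threshold is purely an illustrative assumption, as are the `route` function and its return shape. The bearer-token header is how Ollama's cloud API is expected to authenticate; verify this against the current Ollama documentation before relying on it.

```python
# Hypothetical routing sketch: "heavy" vs "light" is not defined in this guide,
# so the token threshold below is an illustrative assumption.
LOCAL_URL = "http://localhost:11434"
CLOUD_URL = "https://ollama.com"
HEAVY_TOKEN_THRESHOLD = 2000  # assumption; tune for your workload


def route(estimated_tokens, cloud_key=None):
    """Pick an inference source: heavy tasks go to cloud, light tasks stay local.

    Without a cloud API key the request always stays local.
    """
    if cloud_key and estimated_tokens >= HEAVY_TOKEN_THRESHOLD:
        return {
            "source": "cloud",
            "base_url": CLOUD_URL,
            "headers": {"Authorization": f"Bearer {cloud_key}"},
        }
    return {"source": "local", "base_url": LOCAL_URL, "headers": {}}
```

Keeping the decision in one pure function makes it straightforward to swap the token threshold for another signal, such as the task type, without touching the callers.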
