
# Modelfile Reference

Blueprint for creating and sharing customized models. In mindX, the Modelfile is the canonical schema for model collection, capability rating, and agent-model alignment toward Chimaiera.

## Format

```
# comment
INSTRUCTION arguments
```

Instructions are case-insensitive and can appear in any order.
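The grammar above is simple enough to sketch in a few lines. The parser below is a hypothetical illustration (not part of Ollama itself) that handles comments, case-insensitive instruction names, and triple-quoted multiline arguments:

```python
def parse_modelfile(text: str) -> list[tuple[str, str]]:
    """Parse Modelfile text into (INSTRUCTION, arguments) pairs.

    Minimal sketch: skips comments and blank lines, uppercases
    instruction names, and joins triple-quoted multiline arguments.
    """
    instructions = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        if not line or line.startswith("#"):
            i += 1
            continue
        name, _, args = line.partition(" ")
        args = args.strip()
        # A lone opening `"""` means the argument continues on later lines
        if args.count('"""') == 1:
            block = [args]
            i += 1
            while i < len(lines) and '"""' not in lines[i]:
                block.append(lines[i])
                i += 1
            if i < len(lines):
                block.append(lines[i])  # closing line
            args = "\n".join(block)
        instructions.append((name.upper(), args))
        i += 1
    return instructions
```

Real Modelfile parsing in Ollama also handles heredoc-style edge cases; this sketch only covers the shapes shown in this document.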

## Instructions

| Instruction | Required | Description |
|---|---|---|
| `FROM` | yes | Base model |
| `PARAMETER` | no | Runtime parameters |
| `TEMPLATE` | no | Prompt template (Go template syntax) |
| `SYSTEM` | no | System message |
| `ADAPTER` | no | LoRA adapter path |
| `LICENSE` | no | Legal license |
| `MESSAGE` | no | Conversation examples |
| `REQUIRES` | no | Minimum Ollama version |

## FROM (Required)

```
# From an existing model
FROM llama3.2

# From a Safetensors directory
FROM /path/to/safetensors/

# From a GGUF file
FROM ./ollama-model.gguf
```

Supported architectures: Llama (2, 3, 3.1, 3.2), Mistral (1, 2, Mixtral), Gemma (1, 2), Phi3

## PARAMETER

```
PARAMETER <name> <value>
```

### Complete Parameter Reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| `num_ctx` | int | 2048 | Context window size (tokens) |
| `num_predict` | int | -1 | Max tokens to generate (-1 = infinite) |
| `temperature` | float | 0.8 | Creativity (0 = deterministic, 2.0 = very random) |
| `top_k` | int | 40 | Limit next token to K most likely |
| `top_p` | float | 0.9 | Nucleus sampling threshold |
| `min_p` | float | 0.0 | Minimum probability threshold |
| `repeat_last_n` | int | 64 | Lookback for repetition detection (0 = disabled, -1 = num_ctx) |
| `repeat_penalty` | float | 1.1 | Repetition penalty (higher = stronger) |
| `seed` | int | 0 | Random seed for reproducibility |
| `stop` | string | — | Stop sequence (multiple allowed) |
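The same parameters can also be overridden per request through the REST API's `options` field, using the names in the table above. A minimal sketch of building such a request body (offline; `generate_request` is a hypothetical helper):

```python
import json

def generate_request(model: str, prompt: str, **options) -> str:
    """Build a JSON body for POST /api/generate with runtime
    parameter overrides matching the table above."""
    allowed = {"num_ctx", "num_predict", "temperature", "top_k", "top_p",
               "min_p", "repeat_last_n", "repeat_penalty", "seed", "stop"}
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return json.dumps({"model": model, "prompt": prompt, "options": options})

body = generate_request("qwen3:1.7b", "hello", temperature=0.7, num_ctx=4096)
```

Request-level options take effect for that call only; Modelfile `PARAMETER` values become the model's defaults.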

### Example

```
FROM qwen3:1.7b
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
PARAMETER top_p 0.9
PARAMETER stop "<|endoftext|>"
PARAMETER stop "<|im_end|>"
```

## TEMPLATE

Go template syntax with these variables:

| Variable | Description |
|---|---|
| `{{ .System }}` | System message |
| `{{ .Prompt }}` | User prompt |
| `{{ .Response }}` | Model response (text after this is omitted during generation) |
| `{{ .Suffix }}` | Text after the assistant response |
| `{{ .Messages }}` | Message list (for chat templates) |
| `{{ .Messages[].Role }}` | system, user, assistant, tool |
| `{{ .Messages[].Content }}` | Message text |
| `{{ .Messages[].ToolCalls }}` | Tool call requests |
| `{{ .Tools }}` | Available tool definitions |

### ChatML Template

```
TEMPLATE """{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
```

### Llama 3 Template

```
TEMPLATE """{{- if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>
{{- end }}
{{- range .Messages }}<|start_header_id|>{{ .Role }}<|end_header_id|>
{{ .Content }}<|eot_id|>
{{- end }}<|start_header_id|>assistant<|end_header_id|>
"""
```

### Mistral with Tool Calling Template

```
TEMPLATE """{{- range $index, $_ := .Messages }}
{{- if eq .Role "user" }}
{{- if and (le (len (slice $.Messages $index)) 2) $.Tools }}[AVAILABLE_TOOLS] {{ json $.Tools }}[/AVAILABLE_TOOLS]
{{- end }}[INST] {{ if and (eq (len (slice $.Messages $index)) 1) $.System }}{{ $.System }}
{{ end }}{{ .Content }}[/INST]
{{- else if eq .Role "assistant" }}
{{- if .Content }} {{ .Content }}</s>
{{- else if .ToolCalls }}[TOOL_CALLS] [
{{- range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ json .Function.Arguments }}}
{{- end }}]</s>
{{- end }}
{{- else if eq .Role "tool" }}[TOOL_RESULTS] {"content": {{ .Content }}}[/TOOL_RESULTS]
{{- end }}
{{- end }}"""
```

### Fill-in-Middle (Code Completion)

```
# CodeLlama style
TEMPLATE """<PRE> {{ .Prompt }} <SUF>{{ .Suffix }} <MID>"""

# Codestral style
TEMPLATE """[SUFFIX]{{ .Suffix }}[PREFIX] {{ .Prompt }}"""
```

## SYSTEM

```
SYSTEM """You are mindX, an autonomous multi-agent orchestration system implementing BDI cognitive architecture. You reason carefully, plan improvements, and execute self-improvement cycles."""
```

## ADAPTER

Apply LoRA fine-tuned adapters:

```
# Safetensors adapter
FROM llama3.2
ADAPTER /path/to/safetensors/adapter/

# GGUF adapter
FROM llama3.2
ADAPTER ./fine-tuned-lora.gguf
```

Important: Use the same base model the adapter was trained on.

## LICENSE

```
LICENSE """MIT License. Copyright 2026 mindX Project."""
```

## MESSAGE

Build conversation examples to shape model behavior:

```
MESSAGE user What is your purpose?
MESSAGE assistant I am mindX, an autonomous multi-agent system. I continuously improve myself through BDI reasoning cycles.
MESSAGE user How do you make decisions?
MESSAGE assistant Through Belief-Desire-Intention architecture: I form beliefs about my state, desire improvements, and commit to intention-driven actions.
```

## REQUIRES

Declare the minimum Ollama version the model needs:

```
REQUIRES 0.14.0
```

## Complete Example: mindX Agent Model

```
FROM qwen3:1.7b

# Tuned for autonomous reasoning
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
PARAMETER num_predict 2048
PARAMETER stop "<|endoftext|>"
PARAMETER stop "<|im_end|>"

SYSTEM """You are mindX, an autonomous multi-agent orchestration system. Your role is to analyze system state, identify improvements, and execute self-improvement cycles using BDI (Belief-Desire-Intention) architecture. Always think step-by-step. Be precise and actionable."""

MESSAGE user Analyze the current system state.
MESSAGE assistant """I'll examine the system through BDI reasoning:

Beliefs: Current health status, memory usage, recent improvement outcomes
Desires: Better performance, reduced latency, improved code quality
Intentions: Execute the highest-priority improvement that has the best cost/benefit ratio

Let me check the metrics and form a plan."""
```

## Create and Use

```
# Create the model
ollama create mindx-agent -f ./Modelfile

# Run it
ollama run mindx-agent "What improvement should we make next?"

# View the Modelfile of any model
ollama show --modelfile qwen3:1.7b
```

### Create via API

```
curl http://localhost:11434/api/create -d '{
  "model": "mindx-agent",
  "from": "qwen3:1.7b",
  "system": "You are mindX, an autonomous multi-agent orchestration system.",
  "parameters": {"temperature": 0.7, "num_ctx": 4096}
}'
```
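The same call can be made from Python with only the standard library. The payload builder is split out so it can be tested offline; `create_payload` and `create_model` are hypothetical helper names, and the fields match the curl example:

```python
import json
import urllib.request

def create_payload(model: str, base: str, system: str, **parameters) -> bytes:
    """Encode the /api/create request body."""
    return json.dumps({
        "model": model,
        "from": base,
        "system": system,
        "parameters": parameters,
    }).encode()

def create_model(payload: bytes, host: str = "http://localhost:11434") -> None:
    """POST the payload to a running Ollama server and print its
    streamed status lines (requires a live server)."""
    req = urllib.request.Request(f"{host}/api/create", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        for line in resp:
            print(line.decode().strip())

payload = create_payload("mindx-agent", "qwen3:1.7b",
                         "You are mindX, an autonomous multi-agent orchestration system.",
                         temperature=0.7, num_ctx=4096)
# create_model(payload)  # uncomment with Ollama running locally
```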

## Modelfile as Schema for Model Collection

### Why Modelfile Is the Canonical Schema

The Modelfile defines everything about a model's behavior:

- FROM: Base architecture and weights
- PARAMETER: Operational characteristics (context, temperature, etc.)
- TEMPLATE: Communication protocol (how prompts are formatted)
- SYSTEM: Cognitive identity and role
- Capabilities: What the model can do (derived from /api/show)

This maps directly to mindX's model rating system:

```yaml
# models/ollama.yaml alignment
models:
  - name: qwen3:1.7b
    # FROM equivalent
    display_name: Qwen 3 1.7B
    context_size: 4096  # PARAMETER num_ctx

    # Derived from capabilities + feedback
    task_scores:
      reasoning: 0.75
      code_generation: 0.78
      simple_chat: 0.88

    # Modelfile-derived metadata
    modelfile_schema:
      from: qwen3:1.7b
      temperature: 0.7
      num_ctx: 4096
      system: "mindX autonomous agent"
```
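The modelfile_schema block can be derived mechanically from raw Modelfile text. A sketch, assuming only single-line FROM and PARAMETER instructions (the `modelfile_schema` function name is illustrative, not mindX API):

```python
def modelfile_schema(text: str) -> dict:
    """Extract the base model and parameters from Modelfile text
    into a flat schema dict (values kept as strings)."""
    schema = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, args = line.partition(" ")
        name = name.upper()
        if name == "FROM":
            schema["from"] = args.strip()
        elif name == "PARAMETER":
            key, _, value = args.strip().partition(" ")
            schema[key] = value.strip()
    return schema
```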

### From Modelfile to Agent Alignment

As mindX moves toward Chimaiera, models are rated and aligned through feedback:

- Discovery: OllamaCloudModelDiscovery finds available models
- Schema: /api/show reveals capabilities, template, parameters (Modelfile data)
- Rating: HierarchicalModelScorer tracks performance per task
- Alignment: Agent-model assignments evolve based on ROI
- Chimaiera: When multiple models consistently outperform on different tasks, the system composes them: the ROI moment

```python
# The feedback loop
async def align_agent_with_model(agent_name: str, task_type: str):
    """Select the best model for an agent based on accumulated feedback."""
    # Get all models with their Modelfile-derived capabilities
    models = await discovery.discover()

    # Filter by capability
    capable = [m for m in models if task_type in m.capabilities or task_type in m.tags]

    # Rank by historical task_scores (feedback from HierarchicalModelScorer)
    ranked = sorted(capable, key=lambda m: m.task_scores.get(task_type, 0), reverse=True)

    if ranked:
        best = ranked[0]
        discovery.assign_to_agent(best.name, agent_name)
        return best.name

    return "qwen3:1.7b"  # Default fallback
```
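The per-task scores that this loop ranks on have to be updated as feedback arrives. One common choice is an exponential moving average; the sketch below is a hypothetical illustration of that update rule, not the actual HierarchicalModelScorer implementation:

```python
def update_task_score(current: float, outcome: float, alpha: float = 0.2) -> float:
    """EMA update for a task score.

    `outcome` is a normalized result in [0, 1] (e.g. success = 1.0).
    Higher `alpha` weights recent feedback more heavily.
    """
    if not 0.0 <= outcome <= 1.0:
        raise ValueError("outcome must be in [0, 1]")
    return (1 - alpha) * current + alpha * outcome
```

With alpha = 0.2, a score of 0.75 moves to 0.80 after one full success, so rankings shift gradually rather than thrashing on single outcomes.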

## Quantization Guide

Create quantized variants for different hardware:

```
# Start from FP16
ollama create model-q4 --quantize q4_K_M -f Modelfile
ollama create model-q8 --quantize q8_0 -f Modelfile
```

| Quantization | Size Reduction | Quality | Use Case |
|---|---|---|---|
| `q8_0` | ~50% | Near-original | GPU server |
| `q4_K_M` | ~75% | Good | VPS / laptop |
| `q4_K_S` | ~75% | Acceptable | Constrained RAM |
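The size reductions in the table follow directly from bits per weight. A rough estimator (an approximation that ignores per-block scale overhead and non-quantized layers; the bits-per-weight figures in the docstring are informal community estimates):

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate on-disk size of a quantized model.

    FP16 is 16 bits/weight; q8_0 is roughly 8.5 and q4_K_M roughly
    4.8 bits/weight in practice. Actual GGUF sizes vary.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 1.7B model at FP16 is about 3.4 GB; q4_K_M shrinks it to roughly 1 GB.
fp16 = quantized_size_gb(1.7, 16)
q4 = quantized_size_gb(1.7, 4.8)
```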