Conversational completions with message history, tool calling, thinking, and vision.
POST http://localhost:11434/api/chat
POST https://ollama.com/api/chat # Cloud (requires OLLAMA_API_KEY)
Request fields:
- model: model name (required)
- messages: chat history; each message has role, content, and optionally images and tool_calls
- tools: tool definitions the model may call
- format: "json" or a JSON schema for structured output
- stream: stream the response as newline-delimited JSON chunks (default true)
- think: true/false, or "high"/"medium"/"low" on supported models
- keep_alive: how long the model stays loaded after the request (default "5m")
- options: model parameters such as temperature
- logprobs, top_logprobs: token log probabilities

Message fields: role ("system", "user", "assistant", or "tool"), content, images, tool_calls.

A tool definition in tools:

{
"type": "function",
"function": {
"name": "get_temperature",
"description": "Get the current temperature for a city",
"parameters": {
"type": "object",
"required": ["city"],
"properties": {
"city": {"type": "string", "description": "City name"}
}
}
}
}

A tool call as it appears in an assistant message's tool_calls list:

{
"function": {
"name": "get_temperature",
"arguments": {"city": "New York"}
}
}
Response fields: model, created_at, message ({role, content, thinking, tool_calls, images}), done, done_reason, total_duration, load_duration, prompt_eval_count, prompt_eval_duration, eval_count, eval_duration.

Basic request:

curl http://localhost:11434/api/chat -d '{
"model": "qwen3:1.7b",
"messages": [
{"role": "user", "content": "Why is the sky blue?"}
],
"stream": false
}'
curl http://localhost:11434/api/chat -d '{
"model": "qwen3:1.7b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is Rayleigh scattering?"},
{"role": "assistant", "content": "Rayleigh scattering is the scattering of light by particles smaller than the wavelength of radiation."},
{"role": "user", "content": "How does that make the sky blue?"}
],
"stream": false
}'
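
The server is stateless between calls: the client resends the full message list each time, as the multi-turn example above does. A minimal Python sketch of a history helper that builds the request body (ChatHistory and payload are illustrative names, not part of any Ollama client library):

```python
class ChatHistory:
    """Accumulates messages to resend with every /api/chat request."""

    def __init__(self, system_prompt=None):
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def add_user(self, content):
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content):
        self.messages.append({"role": "assistant", "content": content})

    def payload(self, model, stream=False):
        # The full history goes in every request body.
        return {"model": model, "messages": self.messages, "stream": stream}


history = ChatHistory(system_prompt="You are a helpful assistant.")
history.add_user("What is Rayleigh scattering?")
history.add_assistant("Rayleigh scattering is the scattering of light by small particles.")
history.add_user("How does that make the sky blue?")
body = history.payload("qwen3:1.7b")
```

The resulting body dict can be serialized with json.dumps and POSTed to /api/chat.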
curl http://localhost:11434/api/chat -d '{
"model": "qwen3:1.7b",
"messages": [{"role": "user", "content": "Tell me about Canada."}],
"stream": false,
"format": {
"type": "object",
"properties": {
"name": {"type": "string"},
"capital": {"type": "string"},
"languages": {"type": "array", "items": {"type": "string"}}
},
"required": ["name", "capital", "languages"]
}
}'
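
With a JSON-schema format, the model's reply arrives as a JSON string inside message.content, which the client parses itself. A sketch of parsing and spot-checking it against the schema's required keys (the response dict here is hardcoded with illustrative values, not a live API call):

```python
import json

schema_required = ["name", "capital", "languages"]

# A response shaped like the non-streaming /api/chat reply (illustrative values).
response = {
    "model": "qwen3:1.7b",
    "message": {
        "role": "assistant",
        "content": '{"name": "Canada", "capital": "Ottawa", "languages": ["English", "French"]}',
    },
    "done": True,
}

country = json.loads(response["message"]["content"])
missing = [key for key in schema_required if key not in country]
assert not missing, f"model omitted required keys: {missing}"
```

A full validator (e.g. the jsonschema package) would also check types; the required-keys check above is the minimal sanity pass.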
# Step 1: Model requests tool call
curl http://localhost:11434/api/chat -d '{
"model": "qwen3:1.7b",
"messages": [{"role": "user", "content": "What is the temperature in New York?"}],
"stream": false,
"tools": [{
"type": "function",
"function": {
"name": "get_temperature",
"description": "Get the current temperature for a city",
"parameters": {
"type": "object",
"required": ["city"],
"properties": {
"city": {"type": "string", "description": "City name"}
}
}
}
}]
}'
# Step 2: Send tool result back
curl http://localhost:11434/api/chat -d '{
"model": "qwen3:1.7b",
"messages": [
{"role": "user", "content": "What is the temperature in New York?"},
{"role": "assistant", "tool_calls": [{"type": "function", "function": {"index": 0, "name": "get_temperature", "arguments": {"city": "New York"}}}]},
{"role": "tool", "tool_name": "get_temperature", "content": "22C"}
],
"stream": false
}'
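
The two curl steps above can be sketched as one dispatch routine: inspect message.tool_calls, run the matching local function, and append a "tool" message before re-sending. The assistant turn below is hardcoded in the shape of the step-1 response so the sketch is self-contained; in practice it comes from the API:

```python
def get_temperature(city):
    # Stand-in for a real weather lookup.
    return "22C"

TOOLS = {"get_temperature": get_temperature}

def handle_tool_calls(messages, assistant_message):
    """Append the assistant turn plus one tool-result message per call."""
    messages.append(assistant_message)
    for call in assistant_message.get("tool_calls", []):
        fn = call["function"]
        result = TOOLS[fn["name"]](**fn["arguments"])
        messages.append({"role": "tool", "tool_name": fn["name"], "content": result})
    return messages

# Illustrative assistant turn shaped like the step-1 response.
assistant_turn = {
    "role": "assistant",
    "tool_calls": [
        {"type": "function",
         "function": {"index": 0, "name": "get_temperature",
                      "arguments": {"city": "New York"}}}
    ],
}
messages = [{"role": "user", "content": "What is the temperature in New York?"}]
messages = handle_tool_calls(messages, assistant_turn)
# messages is now ready to send back as the step-2 request body.
```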
curl http://localhost:11434/api/chat -d '{
"model": "deepseek-r1:1.5b",
"messages": [{"role": "user", "content": "How many r letters in strawberry?"}],
"think": true,
"stream": false
}'
Response includes message.thinking (reasoning trace) and message.content (final answer).
curl http://localhost:11434/api/chat -d '{
"model": "gemma3",
"messages": [{
"role": "user",
"content": "What is in this image?",
"images": ["iVBORw0KGgoAAAANSUhEUg...base64..."]
}],
"stream": false
}'
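
Entries in images are base64-encoded image bytes, not file paths or data URLs. A short sketch of building a vision message from raw bytes (image_message is an illustrative helper; the fake PNG bytes stand in for a real file's contents):

```python
import base64

def image_message(prompt, image_bytes):
    """Build a user message carrying one base64-encoded image."""
    return {
        "role": "user",
        "content": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
    }

# PNG magic bytes stand in for a real image file's contents.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
msg = image_message("What is in this image?", fake_png)
```

With a real file, the bytes would come from open(path, "rb").read().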
# Preload (a request with no messages loads the model into memory)
curl http://localhost:11434/api/chat -d '{"model": "qwen3:1.7b"}'
# Unload immediately
curl http://localhost:11434/api/chat -d '{"model": "qwen3:1.7b", "keep_alive": 0}'
# Keep loaded forever
curl http://localhost:11434/api/chat -d '{"model": "qwen3:1.7b", "keep_alive": -1}'
With stream set to true (the default), each chunk is newline-delimited JSON:
{"model":"qwen3:1.7b","created_at":"...","message":{"role":"assistant","content":"The"},"done":false}
{"model":"qwen3:1.7b","created_at":"...","message":{"role":"assistant","content":" sky"},"done":false}
{"model":"qwen3:1.7b","created_at":"...","message":{"role":"assistant","content":""},"done":true,"done_reason":"stop","total_duration":...,"eval_count":42}
The final chunk (done: true) includes performance metrics.
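
A streamed reply is reassembled by concatenating message.content across chunks until done is true. A sketch using the chunk shapes above (the list of lines is hardcoded here; in practice they arrive incrementally over HTTP):

```python
import json

chunks = [
    '{"model":"qwen3:1.7b","message":{"role":"assistant","content":"The"},"done":false}',
    '{"model":"qwen3:1.7b","message":{"role":"assistant","content":" sky"},"done":false}',
    '{"model":"qwen3:1.7b","message":{"role":"assistant","content":""},'
    '"done":true,"done_reason":"stop","eval_count":42}',
]

text, final = "", None
for line in chunks:  # in practice: iterate over response lines as they arrive
    chunk = json.loads(line)
    text += chunk["message"]["content"]
    if chunk["done"]:
        final = chunk  # carries done_reason and the performance metrics
```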
# Via OllamaAPI (api/ollama/ollama_url.py) — uses /api/chat when use_chat=True
result = await ollama_api.generate_text(
prompt="What is the weather?",
model="qwen3:1.7b",
use_chat=True,
messages=[
{"role": "system", "content": "You are mindX, an autonomous AI."},
{"role": "user", "content": "What is the weather?"}
]
)
# Via OllamaChatManager (agents/core/ollama_chat_manager.py)
response = await chat_manager.chat(
message="Analyze the latest improvement cycle",
model="qwen3:1.7b",
system_prompt="You are mindX's autonomous improvement agent."
)