ollama/api/generate.md

API Reference: Generate — POST /api/generate

Generate text completions. For conversations, prefer /api/chat.

Endpoint

POST http://localhost:11434/api/generate
POST https://ollama.com/api/generate  # Cloud (requires OLLAMA_API_KEY)

Request Parameters

| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| model | string | — | yes | Model name (e.g., qwen3:1.7b) |
| prompt | string | — | no | Text prompt for generation |
| suffix | string | — | no | Fill-in-the-middle: text after the model response |
| images | string[] | — | no | Base64-encoded images for vision models |
| format | string \| object | — | no | "json" or a JSON schema object |
| system | string | — | no | System prompt override |
| stream | boolean | true | no | Stream partial responses |
| think | boolean \| string | — | no | Enable thinking trace. true/false or "high"/"medium"/"low" (GPT-OSS) |
| raw | boolean | — | no | Skip prompt templating |
| keep_alive | string \| number | "5m" | no | How long to keep model loaded. "10m", 3600, 0 (unload), -1 (forever) |
| options | ModelOptions | — | no | Runtime generation controls |
| logprobs | boolean | — | no | Return token log probabilities |
| top_logprobs | integer | — | no | Number of likely tokens per position |

ModelOptions

| Parameter | Type | Default | Description |
|---|---|---|---|
| seed | integer | 0 | Random seed for reproducibility |
| temperature | float | 0.8 | Randomness (0.0 = deterministic, 2.0 = very random) |
| top_k | integer | 40 | Limit next token to K most likely |
| top_p | float | 0.9 | Nucleus sampling threshold |
| min_p | float | 0.0 | Minimum probability threshold |
| stop | string \| string[] | — | Stop sequences |
| num_ctx | integer | 2048 | Context window size in tokens |
| num_predict | integer | -1 | Max tokens to generate (-1 = unlimited) |

Response Fields

| Field | Type | Description |
|---|---|---|
| model | string | Model name |
| created_at | string | ISO 8601 timestamp |
| response | string | Generated text |
| thinking | string | Reasoning trace (when think enabled) |
| done | boolean | Generation complete |
| done_reason | string | Why generation stopped ("stop", "length") |
| total_duration | integer | Total time in nanoseconds |
| load_duration | integer | Model loading time (ns) |
| prompt_eval_count | integer | Input token count |
| prompt_eval_duration | integer | Prompt evaluation time (ns) |
| eval_count | integer | Output token count |
| eval_duration | integer | Token generation time (ns) |
| logprobs | Logprob[] | Token probability data (when enabled) |

Performance Calculation

tokens_per_second = eval_count / eval_duration * 1e9
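Since eval_duration is reported in nanoseconds, multiplying by 1e9 converts the ratio to tokens per second. A minimal Python sketch, using illustrative values rather than a live response:

```python
# Compute throughput from the nanosecond timing fields of a completed
# (non-streaming) /api/generate response. Values below are illustrative.
response = {
    "eval_count": 290,            # output tokens generated
    "eval_duration": 4709213000,  # generation time in nanoseconds
}

tokens_per_second = response["eval_count"] / response["eval_duration"] * 1e9
print(round(tokens_per_second, 2))  # ≈ 61.58
```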

Examples

Basic Generation

curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:1.7b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

Streaming (default)

curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:1.7b",
  "prompt": "Why is the sky blue?"
}'

Returns newline-delimited JSON chunks
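Each streamed chunk is a JSON object on its own line, with the final chunk marked done: true and carrying the timing fields. A sketch of client-side assembly, using illustrative chunk data instead of a live server:

```python
import json

# Newline-delimited JSON as the streaming endpoint emits it
# (abbreviated, illustrative chunks).
ndjson = (
    '{"model":"qwen3:1.7b","response":"The sky","done":false}\n'
    '{"model":"qwen3:1.7b","response":" is blue","done":false}\n'
    '{"model":"qwen3:1.7b","response":".","done":true,"eval_count":3}\n'
)

text = ""
for line in ndjson.splitlines():
    chunk = json.loads(line)
    text += chunk.get("response", "")
    if chunk["done"]:
        break  # final chunk also carries timing/counter fields

print(text)  # The sky is blue.
```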

With Options

curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:1.7b",
  "prompt": "Write a haiku about AI",
  "stream": false,
  "options": {
    "temperature": 0.3,
    "top_p": 0.9,
    "seed": 42,
    "num_ctx": 4096
  }
}'

Structured Output (JSON Schema)

curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:1.7b",
  "prompt": "What are the populations of the US and Canada?",
  "stream": false,
  "format": {
    "type": "object",
    "properties": {
      "countries": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "country": {"type": "string"},
            "population": {"type": "integer"}
          },
          "required": ["country", "population"]
        }
      }
    },
    "required": ["countries"]
  }
}'
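Even when a schema is supplied via format, the response field arrives as a string of JSON and must still be parsed by the client. A minimal sketch with an illustrative payload:

```python
import json

# A non-streaming response whose "response" field conforms to the
# requested schema (payload below is illustrative).
api_response = {
    "response": '{"countries": [{"country": "US", "population": 340000000}]}',
    "done": True,
}

data = json.loads(api_response["response"])
print(data["countries"][0]["country"])  # US
```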

With Thinking

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:1.5b",
  "prompt": "How many r letters in strawberry?",
  "think": true,
  "stream": false
}'

With Images (Vision)

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "What is in this picture?",
  "images": ["iVBORw0KGgoAAAANSUhEUg..."],
  "stream": false
}'

Fill-in-the-Middle (Code Completion)

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:1.5b",
  "prompt": "def compute_gcd(a, b):",
  "suffix": "    return result",
  "stream": false
}'

Preload Model (Empty Request)

curl http://localhost:11434/api/generate -d '{"model": "qwen3:1.7b"}'

Unload Model

curl http://localhost:11434/api/generate -d '{"model": "qwen3:1.7b", "keep_alive": 0}'

Keep Model Loaded Indefinitely

curl http://localhost:11434/api/generate -d '{"model": "qwen3:1.7b", "keep_alive": -1}'

mindX Integration

# Via OllamaAPI (api/ollama/ollama_url.py)
result = await ollama_api.generate_text(
    prompt="Why is the sky blue?",
    model="qwen3:1.7b",
    max_tokens=200,
    temperature=0.7,
    keep_alive="10m"
)

With structured output

result = await ollama_api.generate_text(
    prompt="Describe the weather",
    model="qwen3:1.7b",
    format={"type": "object", "properties": {"temp": {"type": "integer"}}},
)

With thinking

result = await ollama_api.generate_text(
    prompt="Solve this step by step: 17 × 23",
    model="deepseek-r1:1.5b",
    think=True,
)
