API Reference: Generate — POST /api/generate

Generate text completions. For conversations, prefer /api/chat.

Endpoint

POST http://localhost:11434/api/generate
POST https://ollama.com/api/generate  # Cloud (requires OLLAMA_API_KEY)
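
The two endpoints differ only in host and authentication. A minimal Python sketch of building the request for either one follows. It assumes the cloud endpoint accepts the key as an `Authorization: Bearer` header, which this page does not state, and the helper names `build_generate_request` and `generate` are hypothetical.

```python
import json
import urllib.request

LOCAL_URL = "http://localhost:11434/api/generate"
CLOUD_URL = "https://ollama.com/api/generate"


def build_generate_request(payload, api_key=None):
    """Build (but do not send) a POST request for /api/generate.

    With an API key the request targets the cloud endpoint and carries a
    bearer token (assumed header scheme). Without one it targets localhost.
    """
    url = CLOUD_URL if api_key else LOCAL_URL
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if api_key:
        request.add_header("Authorization", f"Bearer {api_key}")
    return request


def generate(payload, api_key=None, timeout=120):
    """Send the request and decode a non-streaming JSON response."""
    request = build_generate_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For a cloud call, the key would typically be read from the environment, for example `generate(payload, api_key=os.environ.get("OLLAMA_API_KEY"))`, with `"stream": false` set in the payload so the body is a single JSON object.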

Request Parameters

| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| model | string | — | yes | Model name (e.g., qwen3:1.7b) |
| prompt | string | — | no | Text prompt for generation |
| suffix | string | — | no | Fill-in-the-middle: text after the model response |
| images | string[] | — | no | Base64-encoded images for vision models |
| format | string or object | — | no | "json" or a JSON schema object |
| system | string | — | no | System prompt override |
| stream | boolean | true | no | Stream partial responses |
| think | boolean or string | — | no | Enable thinking trace. true/false or "high"/"medium"/"low" (GPT-OSS) |
| raw | boolean | — | no | Skip prompt templating |
| keep_alive | string or number | "5m" | no | How long to keep model loaded. "10m", 3600, 0 (unload), -1 (forever) |
| options | ModelOptions | — | no | Runtime generation controls |
| logprobs | boolean | — | no | Return token log probabilities |
| top_logprobs | integer | — | no | Number of likely tokens per position |

ModelOptions

| Parameter | Type | Default | Description |
|---|---|---|---|
| seed | integer | 0 | Random seed for reproducibility |
| temperature | float | 0.8 | Randomness (0.0 = deterministic, 2.0 = very random) |
| top_k | integer | 40 | Limit next token to K most likely |
| top_p | float | 0.9 | Nucleus sampling threshold |
| min_p | float | 0.0 | Minimum probability threshold |
| stop | string or string[] | — | Stop sequences |
| num_ctx | integer | 2048 | Context window size in tokens |
| num_predict | integer | -1 | Max tokens to generate (-1 = unlimited) |

Response Fields

| Field | Type | Description |
|---|---|---|
| model | string | Model name |
| created_at | string | ISO 8601 timestamp |
| response | string | Generated text |
| thinking | string | Reasoning trace (when think enabled) |
| done | boolean | Generation complete |
| done_reason | string | Why generation stopped ("stop", "length") |
| total_duration | integer | Total time in nanoseconds |
| load_duration | integer | Model loading time (ns) |
| prompt_eval_count | integer | Input token count |
| prompt_eval_duration | integer | Prompt evaluation time (ns) |
| eval_count | integer | Output token count |
| eval_duration | integer | Token generation time (ns) |
| logprobs | Logprob[] | Token probability data (when enabled) |

Performance Calculation

tokens_per_second = eval_count / eval_duration * 1e9
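
Because eval_duration is reported in nanoseconds, the ratio has to be scaled by 1e9 to get tokens per second. A small Python sketch of that calculation on a decoded response follows. The function name `tokens_per_second` and the sample values are illustrative, not measured output.

```python
def tokens_per_second(response):
    """Return output tokens per second, or None if timing data is missing.

    Expects a decoded response dict with "eval_count" (tokens) and
    "eval_duration" (nanoseconds), as listed under Response Fields.
    """
    eval_count = response.get("eval_count")
    eval_duration_ns = response.get("eval_duration")
    if not eval_count or not eval_duration_ns:
        # Missing or zero values would make the rate meaningless.
        return None
    return eval_count / eval_duration_ns * 1e9


# Illustrative values only: 120 tokens generated over 2.4 seconds.
sample = {"eval_count": 120, "eval_duration": 2_400_000_000}
rate = tokens_per_second(sample)
print(f"{rate:.1f} tokens/s")
```

Returning None when either field is absent or zero avoids a division by zero on intermediate streaming chunks, which may not carry timing data.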

Examples

Basic Generation

curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:1.7b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

Streaming (default)

curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:1.7b",
  "prompt": "Why is the sky blue?"
}'
# Returns newline-delimited JSON chunks
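
Each line of a streamed body is a standalone JSON object, so a client can decode lines one at a time and concatenate the response fragments. A minimal Python sketch follows. The helper name `collect_stream` is hypothetical, and it assumes each chunk carries a partial "response" string with "done" set to true on the last chunk.

```python
import json


def collect_stream(lines):
    """Join the "response" fragments from newline-delimited JSON chunks.

    `lines` is any iterable of str or bytes lines, such as an HTTP response
    body read line by line. Returns (full_text, final_chunk), where
    final_chunk is the chunk with "done" set, or None if none was seen.
    """
    parts = []
    final_chunk = None
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if not line:
            continue  # tolerate blank separator lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            final_chunk = chunk
            break
    return "".join(parts), final_chunk
```

The object returned by `urllib.request.urlopen` can be iterated line by line, so it can be passed to `collect_stream` directly. The final chunk is the natural place to read fields such as done_reason and eval_count.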

With Options

curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:1.7b",
  "prompt": "Write a haiku about AI",
  "stream": false,
  "options": {
    "temperature": 0.3,
    "top_p": 0.9,
    "seed": 42,
    "num_ctx": 4096
  }
}'

Structured Output (JSON Schema)

curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:1.7b",
  "prompt": "What are the populations of the US and Canada?",
  "stream": false,
  "format": {
    "type": "object",
    "properties": {
      "countries": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "country": {"type": "string"},
            "population": {"type": "integer"}
          },
          "required": ["country", "population"]
        }
      }
    },
    "required": ["countries"]
  }
}'
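
On the client side, the structured result still has to be decoded. The Python sketch below assumes the schema-conforming object arrives as JSON text inside the response field. That is an assumption, since Response Fields only describes response as generated text. The helper names are hypothetical and the sample populations are placeholders, not real figures.

```python
import json


def parse_structured(response):
    """Decode the JSON text carried in the "response" field (assumed layout)."""
    return json.loads(response["response"])


def total_population(response):
    """Sum the "population" values of the "countries" array from the schema above."""
    data = parse_structured(response)
    return sum(item["population"] for item in data["countries"])


# Placeholder reply shaped like the schema in the request; values are not real.
sample_reply = {
    "model": "qwen3:1.7b",
    "response": json.dumps(
        {
            "countries": [
                {"country": "US", "population": 1},
                {"country": "Canada", "population": 2},
            ]
        }
    ),
    "done": True,
}
print(parse_structured(sample_reply)["countries"][0]["country"])
```

`json.loads` raises `json.JSONDecodeError` on malformed text, so callers may want to catch it when generation stops early, for example when done_reason is "length".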

With Thinking

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:1.5b",
  "prompt": "How many r letters in strawberry?",
  "think": true,
  "stream": false
}'

With Images (Vision)

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "What is in this picture?",
  "images": ["iVBORw0KGgoAAAANSUhEUg..."],
  "stream": false
}'
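
The images array expects base64 text rather than raw bytes or file paths. A short Python sketch of preparing such a payload follows. The helper names `encode_image_bytes` and `build_vision_payload` are hypothetical, and the bytes in the demo are only the PNG file signature, not a complete image.

```python
import base64


def encode_image_bytes(data):
    """Return the standard base64 encoding of raw image bytes as ASCII text."""
    return base64.b64encode(data).decode("ascii")


def build_vision_payload(model, prompt, images):
    """Build a non-streaming request body with base64-encoded images.

    `images` is a list of raw bytes objects, one per image.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [encode_image_bytes(image) for image in images],
        "stream": False,
    }


# The 8-byte PNG signature, used here only to show the encoding step.
png_signature = b"\x89PNG\r\n\x1a\n"
payload = build_vision_payload("gemma3", "What is in this picture?", [png_signature])
print(payload["images"][0])
```

For a real file, the bytes would come from something like `pathlib.Path("photo.png").read_bytes()`, where the path is a placeholder.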

Fill-in-the-Middle (Code Completion)

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:1.5b",
  "prompt": "def compute_gcd(a, b):",
  "suffix": " return result",
  "stream": false
}'

Preload Model (Empty Request)

curl http://localhost:11434/api/generate -d '{"model": "qwen3:1.7b"}'

Unload Model

curl http://localhost:11434/api/generate -d '{"model": "qwen3:1.7b", "keep_alive": 0}'

Keep Model Loaded Indefinitely

curl http://localhost:11434/api/generate -d '{"model": "qwen3:1.7b", "keep_alive": -1}'

mindX Integration

# Via OllamaAPI (api/ollama/ollama_url.py)
result = await ollama_api.generate_text(
    prompt="Why is the sky blue?",
    model="qwen3:1.7b",
    max_tokens=200,
    temperature=0.7,
    keep_alive="10m"
)

# With structured output
result = await ollama_api.generate_text(
    prompt="Describe the weather",
    model="qwen3:1.7b",
    format={"type": "object", "properties": {"temp": {"type": "integer"}}},
)

# With thinking
result = await ollama_api.generate_text(
    prompt="Solve this step by step: 17 * 23",
    model="deepseek-r1:1.5b",
    think=True
)
