Thinking-capable models emit a separate `thinking` field that splits the reasoning trace from the final answer.
Use it to audit the model's steps, animate "thinking" in a UI, or hide the trace when you only need the final response.
Supported `think` values vary by model family:

- `think: true/false` (e.g. DeepSeek-R1, Qwen 3)
- `think: "low"/"medium"/"high"` (GPT-OSS; cannot fully disable)

Note: Thinking is enabled by default in the CLI and API for supported models.
```shell
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:1.5b",
  "messages": [{"role": "user", "content": "How many r letters in strawberry?"}],
  "think": true,
  "stream": false
}'
```
Response:

```json
{
  "message": {
    "role": "assistant",
    "thinking": "Let me count: s-t-r-a-w-b-e-r-r-y. The letter r appears at positions 3, 8, 9. So 3 times.",
    "content": "There are 3 letter r's in 'strawberry'."
  }
}
```
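The same call works from Python with only the standard library; a minimal sketch (the `split_reply` helper is illustrative, not part of any SDK), assuming a server on the default localhost:11434 port:

```python
import json
from urllib import request

# Same request body as the curl example above.
payload = {
    "model": "deepseek-r1:1.5b",
    "messages": [{"role": "user", "content": "How many r letters in strawberry?"}],
    "think": True,
    "stream": False,
}

def split_reply(reply: dict) -> tuple[str, str]:
    """Split an /api/chat reply into (thinking trace, final answer)."""
    msg = reply["message"]
    return msg.get("thinking", ""), msg.get("content", "")

# To call a running Ollama server, uncomment:
# req = request.Request("http://localhost:11434/api/chat",
#                       data=json.dumps(payload).encode("utf-8"),
#                       headers={"Content-Type": "application/json"})
# trace, answer = split_reply(json.load(request.urlopen(req)))
```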
```shell
curl http://localhost:11434/api/chat -d '{
  "model": "gpt-oss",
  "messages": [{"role": "user", "content": "What is 1+1?"}],
  "think": "low"
}'
```

Note: GPT-OSS ignores `true`/`false` — it must use `"low"`, `"medium"`, or `"high"`.
```python
from ollama import chat

# Non-streaming
response = chat(
    model='deepseek-r1:1.5b',
    messages=[{'role': 'user', 'content': 'How many r letters in strawberry?'}],
    think=True,
    stream=False,
)
print('Thinking:\n', response.message.thinking)
print('Answer:\n', response.message.content)
```
```python
# Streaming with thinking
stream = chat(
    model='qwen3',
    messages=[{'role': 'user', 'content': 'What is 17 x 23?'}],
    think=True,
    stream=True,
)

in_thinking = False
for chunk in stream:
    if chunk.message.thinking and not in_thinking:
        in_thinking = True
        print('Thinking:\n', end='')
    if chunk.message.thinking:
        print(chunk.message.thinking, end='')
    elif chunk.message.content:
        if in_thinking:
            print('\n\nAnswer:\n', end='')
            in_thinking = False
        print(chunk.message.content, end='')
```
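When only the final answer matters, the two channels can instead be accumulated and split after the stream ends. A sketch with stand-in chunk objects (`SimpleNamespace` here only mimics the SDK's chunk shape, which is an assumption):

```python
from types import SimpleNamespace

def collect(stream):
    """Accumulate a streamed response into (thinking trace, final answer)."""
    thinking, content = [], []
    for chunk in stream:
        if chunk.message.thinking:
            thinking.append(chunk.message.thinking)
        if chunk.message.content:
            content.append(chunk.message.content)
    return "".join(thinking), "".join(content)

# Fake chunks standing in for a real SDK stream:
fake = [
    SimpleNamespace(message=SimpleNamespace(thinking="17 x 23 ", content=None)),
    SimpleNamespace(message=SimpleNamespace(thinking="= 391.", content=None)),
    SimpleNamespace(message=SimpleNamespace(thinking=None, content="391")),
]
```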
```javascript
import ollama from 'ollama'

// Non-streaming
const response = await ollama.chat({
  model: 'deepseek-r1',
  messages: [{ role: 'user', content: 'How many r letters in strawberry?' }],
  think: true,
  stream: false,
})
console.log('Thinking:\n', response.message.thinking)
console.log('Answer:\n', response.message.content)
```
```shell
# Enable thinking for a single run
ollama run deepseek-r1 --think "Where should I visit in Lisbon?"

# Disable thinking
ollama run deepseek-r1 --think=false "Summarize this article"

# Hide the trace while still using a thinking model
ollama run deepseek-r1 --hidethinking "Is 9.9 bigger or 9.11?"

# Toggle inside an interactive session
/set think
/set nothink

# GPT-OSS levels
ollama run gpt-oss --think=low "Draft a headline"
ollama run gpt-oss --think=medium "Analyze this code"
ollama run gpt-oss --think=high "Prove this theorem"
```
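These flags map one-to-one onto the API's `think` parameter. A small validator (a hypothetical helper, not part of Ollama) that enforces the per-family rules described above:

```python
GPT_OSS_LEVELS = {"low", "medium", "high"}

def think_value(model: str, think):
    """Validate a `think` setting: gpt-oss takes a level string,
    other thinking models take a boolean."""
    if model.startswith("gpt-oss"):
        if think not in GPT_OSS_LEVELS:
            raise ValueError("gpt-oss requires 'low', 'medium', or 'high'")
        return think
    if not isinstance(think, bool):
        raise ValueError(f"{model} expects think=True/False")
    return think
```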
The existing `OllamaAPI.generate_text()` already supports `think` via kwargs:
```python
# Already works via api/ollama/ollama_url.py:
for key in ["format", "system", "template", "raw", "suffix", "images", "think"]:
    if key in kwargs:
        payload[key] = kwargs[key]
```

```python
result = await ollama_api.generate_text(
    prompt="Analyze the autonomous improvement cycle for inefficiencies",
    model="deepseek-r1:1.5b",
    think=True,  # Passes through to the request payload
)
```
For the generate endpoint, the thinking trace is included in the response text; for the chat endpoint, the trace arrives in the `message.thinking` field.
Different agents benefit from different thinking levels:

- `think=True` — full reasoning for improvement decisions
- `think="high"` (GPT-OSS cloud) — deep reasoning for evolution plans
- `think=False` — speed over reasoning
- `think=True` with `deepseek-r1:1.5b` — chain-of-thought
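One way to encode these per-agent defaults (a sketch; the agent names and the mapping table are illustrative assumptions, not existing project code):

```python
# Hypothetical per-agent settings; pairings follow the list above.
AGENT_THINKING = {
    "improvement": {"model": "deepseek-r1:1.5b", "think": True},
    "evolution":   {"model": "gpt-oss", "think": "high"},
    "fast_path":   {"model": "deepseek-r1:1.5b", "think": False},
}

def kwargs_for(agent: str) -> dict:
    """Look up the model and think setting to pass to generate_text()."""
    cfg = AGENT_THINKING[agent]
    return {"model": cfg["model"], "think": cfg["think"]}
```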