API Reference: Embeddings — POST /api/embed

Generate vector embeddings for RAGE/semantic search and pgvector storage.

Endpoint

POST http://localhost:11434/api/embed
POST https://ollama.com/api/embed  # Cloud (requires OLLAMA_API_KEY)

Request Parameters

| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | string | - | yes | Embedding model (e.g., mxbai-embed-large, nomic-embed-text) |
| input | string or string[] | - | yes | Text(s) to embed |
| truncate | boolean | true | no | Truncate inputs exceeding context. false = error on overflow |
| dimensions | integer | - | no | Desired embedding vector size |
| keep_alive | string | "5m" | no | Model memory duration |
| options | ModelOptions | - | no | Runtime options |
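As a rough illustration of how these parameters combine, the sketch below assembles a request body in Python. The helper name `build_embed_payload` is hypothetical and not part of any Ollama client; it assumes only the field names listed in this section and leaves out optional fields that are unset, so the server defaults apply.

```python
import json
from typing import Optional, Union


def build_embed_payload(
    model: str,
    texts: Union[str, list[str]],
    truncate: Optional[bool] = None,
    dimensions: Optional[int] = None,
    keep_alive: Optional[str] = None,
) -> dict:
    """Assemble a JSON-serializable body for POST /api/embed (hypothetical helper)."""
    if not model:
        raise ValueError("model is required")
    if texts is None:
        raise ValueError("input is required")

    # Required fields: the texts are sent under the "input" key.
    payload: dict = {"model": model, "input": texts}

    # Optional fields are included only when the caller sets them.
    if truncate is not None:
        payload["truncate"] = truncate
    if dimensions is not None:
        if dimensions <= 0:
            raise ValueError("dimensions must be a positive integer")
        payload["dimensions"] = dimensions
    if keep_alive is not None:
        payload["keep_alive"] = keep_alive
    return payload


body = build_embed_payload("mxbai-embed-large", ["first", "second"], dimensions=128)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be passed as the JSON body of the request with whatever HTTP client is in use.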

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| model | string | Model that produced embeddings |
| embeddings | float[][] | Array of embedding vectors (L2-normalized / unit-length) |
| total_duration | integer | Total time (nanoseconds) |
| load_duration | integer | Model load time (nanoseconds) |
| prompt_eval_count | integer | Input tokens processed |
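Because the duration fields are reported in nanoseconds, it can help to convert them before logging. The following is a minimal sketch, assuming a response already parsed into a Python dict with the fields listed here. `summarize_timing` and the `non_load_ms` key are hypothetical names, and the sample values are copied from the single-text example in this document, with the vector shortened to the three values shown there.

```python
NS_PER_MS = 1_000_000  # nanoseconds per millisecond


def summarize_timing(response: dict) -> dict:
    """Convert the nanosecond duration fields of an /api/embed response to milliseconds."""
    total_ms = response["total_duration"] / NS_PER_MS
    load_ms = response["load_duration"] / NS_PER_MS
    return {
        "vectors": len(response["embeddings"]),
        "tokens": response["prompt_eval_count"],
        "total_ms": total_ms,
        "load_ms": load_ms,
        # Time not spent loading the model (simple difference, an illustrative metric).
        "non_load_ms": total_ms - load_ms,
    }


sample = {
    "model": "mxbai-embed-large",
    "embeddings": [[0.010071, -0.001759, 0.050072]],  # shortened vector
    "total_duration": 14143917,
    "load_duration": 1019500,
    "prompt_eval_count": 8,
}
print(summarize_timing(sample))
```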

Examples

Single Text

curl http://localhost:11434/api/embed -d '{
  "model": "mxbai-embed-large",
  "input": "The quick brown fox jumps over the lazy dog."
}'

Response:

{
  "model": "mxbai-embed-large",
  "embeddings": [[0.010071, -0.001759, 0.050072, ...]],
  "total_duration": 14143917,
  "load_duration": 1019500,
  "prompt_eval_count": 8
}

Batch Embedding (Multiple Texts)

curl http://localhost:11434/api/embed -d '{
  "model": "mxbai-embed-large",
  "input": [
    "First document to embed",
    "Second document to embed",
    "Third document to embed"
  ]
}'

The response's embeddings array contains one vector per input text.
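A small sketch of how a batch response might be matched back to its inputs. It assumes the vectors come back in the same order as the input texts; `pair_texts_with_vectors` is a hypothetical helper, and the vectors below are made-up placeholders rather than real model output.

```python
def pair_texts_with_vectors(
    texts: list[str], embeddings: list[list[float]]
) -> list[tuple[str, list[float]]]:
    """Pair each input text with its vector, assuming the order is preserved."""
    if len(texts) != len(embeddings):
        raise ValueError(
            f"expected {len(texts)} vectors, got {len(embeddings)}"
        )
    return list(zip(texts, embeddings))


texts = [
    "First document to embed",
    "Second document to embed",
    "Third document to embed",
]
# Placeholder vectors standing in for the "embeddings" field of a response.
fake_embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

for text, vector in pair_texts_with_vectors(texts, fake_embeddings):
    print(text, len(vector))
```

The length check catches a mismatched response early, before any vectors are stored against the wrong text.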

With Dimension Control

curl http://localhost:11434/api/embed -d '{
  "model": "mxbai-embed-large",
  "input": "Generate embeddings for this text",
  "dimensions": 128
}'

Disable Truncation (Error on Overflow)

curl http://localhost:11434/api/embed -d '{
  "model": "mxbai-embed-large",
  "input": "Very long text that might exceed context...",
  "truncate": false
}'

Recommended Embedding Models

| Model | Dimensions | Speed | Best For |
| --- | --- | --- | --- |
| mxbai-embed-large | 1024 | Medium | High-quality semantic search |
| nomic-embed-text | 768 | Fast | General embeddings, smaller footprint |
| embeddinggemma | 768 | Medium | Google's embedding model |
| qwen3-embedding | 1024 | Medium | Multilingual embeddings |
| all-minilm | 384 | Very fast | Lightweight, fast retrieval |

Tips

Use cosine similarity for most semantic search use cases

Always use the same model for both indexing and querying

Embeddings are L2-normalized (unit length), so cosine similarity equals the dot product

Batch embedding is more efficient than individual calls
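The equivalence between cosine similarity and the dot product for unit-length vectors can be checked with a few lines of plain Python. This is an illustrative sketch using small hand-made vectors rather than real embeddings; the function names are chosen for the example.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity computed the general way, dividing by both norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])

# For unit-length vectors both norms are 1, so the two values agree
# up to floating-point rounding.
print(dot(a, b), cosine_similarity(a, b))
```

In practice this means a dot-product comparison is enough when both vectors come from this endpoint, which skips the norm computation.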

mindX Integration

mindX currently uses mxbai-embed-large and nomic-embed-text for RAGE (not RAG) semantic search with pgvector.

# Direct embedding via aiohttp (extend OllamaAPI)
import asyncio

import aiohttp


async def embed_texts(texts: list[str], model: str = "mxbai-embed-large") -> list[list[float]]:
    """Generate embeddings via Ollama for pgvector storage."""
    async with aiohttp.ClientSession() as session:
        payload = {"model": model, "input": texts}
        async with session.post(
            "http://localhost:11434/api/embed",
            json=payload,
            timeout=aiohttp.ClientTimeout(total=60),
        ) as resp:
            resp.raise_for_status()  # surface HTTP errors instead of a KeyError below
            data = await resp.json()
            return data["embeddings"]


async def main() -> None:
    # Usage with pgvector
    embeddings = await embed_texts(["mindX autonomous improvement", "BDI reasoning engine"])
    # Store in pgvector: INSERT INTO memories (content, embedding) VALUES ($1, $2)


asyncio.run(main())
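The storage step above could be prepared as follows. pgvector accepts vectors in a bracketed text form such as `[1,2,3]`, so one option is to serialize each embedding to that form and bind it as the `$2` parameter, for example with a `$2::vector` cast. This is a sketch under assumptions: `to_pgvector_literal` and `build_rows` are hypothetical helpers, the `memories` table and its columns are taken from the comment above, and how parameters are bound depends on the database driver, which is left out here.

```python
def to_pgvector_literal(vector: list[float]) -> str:
    """Serialize a vector to pgvector's bracketed text form, e.g. "[0.5,-1.0,2.0]"."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


def build_rows(
    texts: list[str], embeddings: list[list[float]]
) -> list[tuple[str, str]]:
    """Build (content, embedding) parameter tuples for the INSERT statement."""
    if len(texts) != len(embeddings):
        raise ValueError("texts and embeddings must have the same length")
    return [(t, to_pgvector_literal(v)) for t, v in zip(texts, embeddings)]


INSERT_SQL = "INSERT INTO memories (content, embedding) VALUES ($1, $2::vector)"

# Placeholder vectors; real ones would come from embed_texts().
rows = build_rows(
    ["mindX autonomous improvement", "BDI reasoning engine"],
    [[0.5, -1.0, 2.0], [0.25, 0.75, -0.5]],
)
for row in rows:
    print(INSERT_SQL, row)
```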
