RAGE (Retrieval Augmented Generative Engine) embed is the embedding layer of mindX. It bridges LLM inference with pgvector database storage, enabling semantic search over all documentation and agent memories.
mindX embeds all documentation (194 files) and agent memories into pgvector using mxbai-embed-large (1024-dimensional vectors). RAGE sits between the LLM and pgvectorscale, providing the semantic retrieval layer for RAG (Retrieval Augmented Generation) queries.
Question → Embed (mxbai-embed-large) → pgvector cosine similarity → Top-K chunks → qwen3:0.6b answer
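The retrieval step in this pipeline (cosine similarity over stored vectors, then top-K selection) can be sketched in pure Python. This is an illustrative stand-in, not mindX code: the tiny 3-dimensional vectors and the in-memory chunk list substitute for the 1024-dimensional embeddings stored in pgvector.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity; pgvector's <=> operator returns cosine *distance* (1 - this)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, chunks, k=3):
    """chunks: list of (text, embedding) pairs; return the k most similar texts."""
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy 3-dimensional stand-ins for the real 1024-dimensional embeddings
chunks = [
    ("BDI agent docs", [1.0, 0.0, 0.0]),
    ("memory agent docs", [0.0, 1.0, 0.0]),
    ("deploy guide", [0.7, 0.7, 0.0]),
]
print(top_k([0.9, 0.1, 0.0], chunks, k=2))  # → ['BDI agent docs', 'deploy guide']
```

The selected chunks are then handed to qwen3:0.6b as context for answer generation.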
- `/api/embeddings` on port 11434 — CPU-native, always running
- `/v1/embeddings` on port 8001 — auto-activates when GPU hardware is available
- Storage: `doc_embeddings` table and the `memories.embedding` column

```sql
-- Document chunks with embeddings
CREATE TABLE doc_embeddings (
    doc_name VARCHAR(256),
    chunk_idx INTEGER,
    text_content TEXT,
    embedding vector(1024),
    UNIQUE(doc_name, chunk_idx)
);

-- Memory embeddings (column on existing memories table)
-- memories.embedding vector(1024)
```
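The similarity query behind `semantic_search_docs` can be pictured as the following statement. This is a hedged sketch, not a confirmed mindX snippet: `<=>` is pgvector's cosine-distance operator (so similarity is 1 minus it), and `%s` are psycopg-style parameters for the query vector and limit.

```python
# Hypothetical shape of the pgvector top-K query; parameters: query vector
# (twice), then the row limit.
SEARCH_SQL = """
SELECT doc_name, text_content,
       1 - (embedding <=> %s::vector) AS similarity
FROM doc_embeddings
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""
```

Ordering by raw distance and converting to similarity only in the select list lets pgvector's index accelerate the scan.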
Documents are split into ~500-word chunks. Each chunk is embedded independently. A 10KB doc typically produces 3-5 chunks. This ensures that search results return specific, relevant passages rather than entire documents.
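A minimal sketch of that chunking scheme (the actual splitter in `scripts/embed_docs.py` may differ, e.g. by respecting markdown headings or sentence boundaries):

```python
def chunk_words(text, chunk_size=500):
    """Split text into ~chunk_size-word chunks, each embedded independently."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

# A ~10KB markdown doc is roughly 1,500-2,500 words, giving 3-5 chunks.
```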
- `/api/rage/embed?query=...`
- `/api/rage/embed/stats`
- `/actions/export`
- `/actions/export/csv`
- `/actions/efficiency`
- `/diagnostics/export`

Ask a question about mindX documentation. Returns an answer generated from relevant doc chunks.
```bash
curl -X POST "https://mindx.pythai.net/chat/docs?question=What+is+the+BDI+agent"
```
Response:
```json
{
  "question": "What is the BDI agent",
  "answer": "The BDI (Belief-Desire-Intention) agent is the core reasoning engine...",
  "sources": [
    {"doc": "AGINT", "similarity": 0.6226},
    {"doc": "AGENTS", "similarity": 0.6207}
  ]
}
```
```bash
curl https://mindx.pythai.net/chat/docs/stats
```
Response:
```json
{"docs": 95, "memories": 558}
```
```bash
# Embed all docs and memories
python scripts/embed_docs.py
```

This walks `docs/*.md`, chunks each file, embeds the chunks via mxbai-embed-large, and stores them in pgvector. It also embeds any existing memories that lack embeddings. Takes ~5 minutes on CPU.
New memories are auto-embedded on save via MemoryAgent.save_timestamped_memory(). New docs are picked up every 6 hours by the periodic re-embedding task.
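The save-time hook can be pictured like this. All names here are hypothetical stand-ins: `MemoryStore` is a toy in-memory table, and the lambda substitutes for the mxbai-embed-large call; the real logic lives in `MemoryAgent.save_timestamped_memory()`.

```python
class MemoryStore:
    """Toy stand-in for the memories table; embed_fn replaces mxbai-embed-large."""
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.rows = []

    def save_memory(self, text):
        # Embed at save time so the new memory is immediately searchable,
        # with no wait for the 6-hourly re-embedding pass.
        row = {"text": text, "embedding": self.embed_fn(text)}
        self.rows.append(row)
        return row

store = MemoryStore(embed_fn=lambda t: [float(len(t))])  # stub embedder
row = store.save_memory("agent booted")
```

Embedding on save keeps memories queryable immediately, while the periodic task only needs to cover newly added docs.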
- `agents/memory_pgvector.py` (`generate_embedding`, `embed_and_store_doc`, `semantic_search_docs`)
- `scripts/embed_docs.py`
- `llm/vllm_handler.py` (`generate_embeddings` method)
- `scripts/start_vllm_embed.sh`
- `mindx_backend_service/main_service.py` (`/chat/docs`)

`agents/vllm_agent.py` manages the vLLM lifecycle for mindX:
- `GET /vllm/status`
- `POST /vllm/build-cpu`
- `POST /vllm/serve`
- `POST /vllm/stop`
- `GET /vllm/health`

```bash
# Environment variables
VLLM_EMBED_URL=http://localhost:8001   # vLLM embedding server
VLLM_PORT=8001                         # vLLM serving port
EMBED_MODEL=mxbai-embed-large          # Default embedding model
```
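Reading these settings with their documented defaults might look like the following. This is a hedged sketch, not the actual mindX config loader; the function name and dict keys are illustrative.

```python
import os

def embed_config(env=os.environ):
    """Resolve embedding settings, falling back to the documented defaults."""
    return {
        "vllm_embed_url": env.get("VLLM_EMBED_URL", "http://localhost:8001"),
        "vllm_port": int(env.get("VLLM_PORT", "8001")),
        "embed_model": env.get("EMBED_MODEL", "mxbai-embed-large"),
    }
```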
```bash
# Ollama models (pull if not present)
ollama pull mxbai-embed-large
ollama pull nomic-embed-text
ollama pull qwen3:0.6b
```