Whether mindX is ingesting (receiving data from clients), providing inference (calling Ollama/LLMs), or running services (orchestration, memory, tools), monitoring and rate control are essential in both directions. This document defines the network and data metrics in scientific form (SI or standard units) and states where each applies.
Outbound LLM calls are rate limited in llm/rate_limiter.py. Both directions must be measured and, where configured, limited so that ingestion, inference, and services stay within capacity and quotas.
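As an illustration of what "limited" can mean in practice, here is a minimal sliding-window sketch. It is not the mindX implementation: the class name SlidingWindowLimiter, its parameters, and the injectable clock are all hypothetical, chosen only to show how a requests-per-window cap and a reject counter might fit together.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical sliding-window limiter (illustrative, not mindX code)."""

    def __init__(self, max_requests, window_s=60.0, clock=time.monotonic):
        self.max_requests = max_requests  # allowed requests per window
        self.window_s = window_s          # window length in seconds
        self._clock = clock               # injectable for testing
        self._stamps = deque()            # timestamps of accepted requests
        self.rejects = 0                  # count of refused requests

    def allow(self):
        """Return True if a request may proceed now, else False."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_requests:
            self.rejects += 1
            return False
        self._stamps.append(now)
        return True
```

An inbound handler could answer with HTTP 429 whenever allow() returns False, while an outbound caller could wait and retry instead; the same counter then serves as a rate-limit reject metric.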
All metrics use explicit units. Prefer SI or widely used standards.
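Some fields arrive in units that need converting before they are reported, for example a total_duration given in nanoseconds. The helpers below are a small sketch of keeping the unit explicit in the function name; ns_to_ms and ns_to_s are hypothetical names, not functions from the mindX codebase.

```python
NS_PER_MS = 1_000_000       # nanoseconds in one millisecond
NS_PER_S = 1_000_000_000    # nanoseconds in one second


def ns_to_ms(duration_ns):
    """Convert a duration in nanoseconds to milliseconds."""
    return duration_ns / NS_PER_MS


def ns_to_s(duration_ns):
    """Convert a duration in nanoseconds to seconds."""
    return duration_ns / NS_PER_S
```

For instance, a total_duration of 1 500 000 000 ns corresponds to 1500 ms, or 1.5 s.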
Latency fields in code: average_latency_ms, latency_ms, total_duration (reported in ns; convert to s or ms).

- mindx_backend_service/inbound_metrics.py: InboundMetricsMiddleware records per-request latency \(T_{\mathrm{lat}}\) (ms), request body size \(B_{\mathrm{in}}\) (bytes), and response body size \(B_{\mathrm{out}}\) (bytes). An optional inbound rate limit (req/min) returns HTTP 429 when exceeded.
- get_metrics(window_s) returns total_requests, total_latency_ms, average_latency_ms, total_request_bytes, total_response_bytes, requests_per_minute (in window), rate_limit_rejects, latency_p50_ms, latency_p90_ms, latency_p99_ms.
- GET /api/monitoring/inbound: returns inbound_metrics (scientific units) and inbound_rate_limit (requests_per_minute, window_s). Enable the limit via set_inbound_rate_limit(requests_per_minute, window_s).
- api/ollama/ollama_url.py: total_requests, successful_requests, failed_requests, rate_limit_hits, total_tokens, average_latency_ms, rate_limits.rpm, rate_limits.tpm.
- llm/rate_limiter.py: wait_time_ms, wait_time_p50/p90/p99, token_utilization, requests_per_minute, requests_per_hour; get_metrics() returns these.
- agents/monitoring/performance_monitor.py: total_calls, successful_calls, failed_calls, total_latency_ms, latencies_ms, total_prompt_tokens, total_completion_tokens, total_cost.
- llm/rate_limiter.py: RateLimiter.get_metrics(), DualLayerRateLimiter.get_metrics(), HourlyRateLimiter.get_metrics(). api/llm_routes.py: rate limit status and update endpoints.
- models/*.yaml: rate_limits (rpm, rph) and optional quota (total_calls, period_days) for even distribution.
- data/config/llm_factory_config.json: rate_limit_profiles (rpm, rph, strict, very_strict, etc.).
- Collect metrics from get_metrics() on limiters and API clients, from PerformanceMonitor, and from the optional inbound middleware; persist or export them as needed for dashboards and alerts.
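To make the windowed metrics concrete, the sketch below derives an average, a request rate, and p50/p90/p99 latencies from a list of per-request latencies. It is an assumption-laden illustration: summarize and percentile are hypothetical helpers, the nearest-rank percentile method is one common choice and may differ from what the mindX get_metrics() implementations use, and only the key names are borrowed from the fields listed above.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list; p in (0, 100]."""
    if not sorted_values:
        return 0.0
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(latencies_ms, window_s):
    """Summarize latencies (ms) observed during a window of window_s seconds."""
    ordered = sorted(latencies_ms)
    n = len(ordered)
    total = sum(ordered)
    return {
        "total_requests": n,
        "total_latency_ms": total,
        "average_latency_ms": total / n if n else 0.0,
        # Scale the count in the window to a per-minute rate.
        "requests_per_minute": n * 60.0 / window_s,
        "latency_p50_ms": percentile(ordered, 50),
        "latency_p90_ms": percentile(ordered, 90),
        "latency_p99_ms": percentile(ordered, 99),
    }
```

Reporting percentiles alongside the mean matters because a few slow requests can leave average_latency_ms almost unchanged while p99 rises sharply.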