
mindXagent Ollama Connection Monitoring

Overview

This document describes the monitoring and error-handling system for mindXagent's Ollama connection. The system checks connection health at a fixed interval, categorizes and logs errors, and attempts automatic recovery, so that network problems are detected and handled quickly.

Monitoring System

Connection Monitor Script

Location: scripts/test_mindxagent_ollama_connection_monitor.py

The connection monitor provides:

Features

  1. Periodic Health Checks
     - Configurable check interval (default: 10 seconds)
     - Connection status validation
     - Model availability verification
     - Response time measurement

  2. Error Tracking
     - Categorizes errors by type
     - Tracks error frequency and patterns
     - Logs errors to memory agent for persistence
     - Provides detailed error reports

  3. Network Sanity Validation
     - Validates connection status
     - Checks model availability
     - Monitors error rates
     - Tracks latency metrics
     - Validates inference optimizer status

  4. Automatic Recovery
     - Exponential backoff retry logic
     - Connection reinitialization
     - Model rediscovery on failures
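
The exponential backoff retry listed under Automatic Recovery can be sketched as follows. This is a minimal illustration, not the code in the monitor script: the function name retry_with_backoff, its parameters, and the choice of which exceptions are retried are all assumptions. The sleep function is injectable so the demo runs instantly.

```python
import asyncio
import random


async def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                             max_delay=30.0, sleep=asyncio.sleep):
    """Await `operation()` until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay grows as base_delay * 2^(attempt - 1), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # A little random jitter keeps many clients from retrying in lockstep.
            await sleep(delay + random.uniform(0, delay * 0.1))


# Demo: a hypothetical operation that fails twice, then succeeds.
calls = {"n": 0}
delays = []


async def flaky_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection refused")
    return "connected"


async def record_delay(seconds):
    delays.append(seconds)  # record the wait instead of actually sleeping


result = asyncio.run(retry_with_backoff(flaky_connect, sleep=record_delay))
print(result, calls["n"], [round(d, 2) for d in delays])
```

Capping the delay keeps a long outage from pushing the wait between attempts beyond a fixed bound, and re-raising after the final attempt leaves the decision about reinitialization to the caller.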

Enhanced Error Handling

Ollama Chat Manager Improvements

Location: agents/core/ollama_chat_manager.py

Connection Retry Logic

Enhanced Error Logging

Health Check Method

The new check_health() method reports the connection's current health. It is a coroutine and must be awaited:

health = await ollama_chat_manager.check_health()

Returns comprehensive health status including:
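
As an illustration of consuming that status, the sketch below uses a stand-in manager rather than the real one. The class StubOllamaChatManager, the helper summarize_health, and the keys connected, models, and latency_ms are hypothetical; the field names returned by the real check_health() may differ.

```python
import asyncio


class StubOllamaChatManager:
    """Stand-in for the real manager; the returned keys are hypothetical."""

    async def check_health(self):
        return {"connected": True, "models": ["example-model"], "latency_ms": 42.0}


def summarize_health(health):
    """Turn a health dict into a one-line status string."""
    if not health.get("connected"):
        return "DOWN: no connection to Ollama"
    if not health.get("models"):
        return "DEGRADED: connected, but no models available"
    return f"OK: {len(health['models'])} model(s), {health['latency_ms']:.0f} ms"


async def main():
    ollama_chat_manager = StubOllamaChatManager()
    # check_health() is a coroutine, so it has to be awaited inside an event loop.
    health = await ollama_chat_manager.check_health()
    return summarize_health(health)


status = asyncio.run(main())
print(status)
```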

Usage

Running the Connection Monitor

# Run with default settings (5 minutes, 10 second intervals)
python3 scripts/test_mindxagent_ollama_connection_monitor.py

To monitor for a custom duration, edit MONITOR_DURATION and CHECK_INTERVAL in the script.
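
How these two settings drive a monitoring run can be sketched as a simple loop. This is a simplified illustration, not the contents of the script: only the names MONITOR_DURATION and CHECK_INTERVAL come from this document, their unit is assumed to be seconds, and the monitor function and the check callables are hypothetical.

```python
import asyncio
import time

MONITOR_DURATION = 300  # assumed to be seconds (5 minutes, the documented default)
CHECK_INTERVAL = 10     # assumed to be seconds between checks (documented default)


async def monitor(check, duration=MONITOR_DURATION, interval=CHECK_INTERVAL):
    """Await `check()` every `interval` seconds until `duration` has elapsed."""
    results = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        started = time.monotonic()
        try:
            ok, error = bool(await check()), None
        except Exception as exc:  # record the failure and keep monitoring
            ok, error = False, repr(exc)
        results.append({"ok": ok, "error": error,
                        "latency": time.monotonic() - started})
        await asyncio.sleep(interval)
    return results


# Demo with very short timings so it finishes immediately.
async def healthy_check():
    return True


async def broken_check():
    raise ConnectionError("refused")


passes = asyncio.run(monitor(healthy_check, duration=0.05, interval=0.01))
failures = asyncio.run(monitor(broken_check, duration=0.05, interval=0.01))
print(len(passes), "checks passed;", len(failures), "checks failed")
```

Using time.monotonic() rather than wall-clock time keeps the run length stable if the system clock is adjusted while the monitor is running.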

Integration with mindXagent

The monitoring system is automatically integrated:

  1. Initialization: Connection health is checked during mindXagent initialization
  2. Runtime Monitoring: Errors are logged and tracked during operation
  3. Automatic Recovery: Connection failures trigger automatic reconnection attempts
  4. Health Reporting: Health status is available via API endpoints

Error Categories
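
The split into connection errors and application errors, together with the frequency tracking described for the monitor, can be sketched as a small tracker. The grouping of exception types is an assumption made for illustration and is not taken from ollama_chat_manager.py; ErrorTracker and categorize are hypothetical names.

```python
from collections import Counter

# Assumed grouping: network-level failures count as connection errors,
# everything else as an application error.
CONNECTION_ERRORS = (ConnectionError, TimeoutError)


def categorize(exc):
    """Return the category name for an exception instance."""
    return "connection" if isinstance(exc, CONNECTION_ERRORS) else "application"


class ErrorTracker:
    """Counts errors by category and by exception type."""

    def __init__(self):
        self.by_category = Counter()
        self.by_type = Counter()

    def record(self, exc):
        self.by_category[categorize(exc)] += 1
        self.by_type[type(exc).__name__] += 1

    def report(self):
        return {
            "total": sum(self.by_category.values()),
            "by_category": dict(self.by_category),
            "most_common": self.by_type.most_common(3),
        }


tracker = ErrorTracker()
for exc in (ConnectionRefusedError(), TimeoutError(),
            ValueError("bad payload"), ConnectionRefusedError()):
    tracker.record(exc)

report = tracker.report()
print(report)
```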

Connection Errors

Application Errors

Network Sanity Checks

The system performs five sanity checks:

  1. Connection Status: Verifies active connection to Ollama server
  2. Model Availability: Ensures at least one model is available
  3. Error Rate: Monitors error rate (alerts if > 10%)
  4. Latency: Tracks average latency (alerts if > 10 seconds)
  5. Optimizer Status: Validates inference optimizer functionality
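
The five checks above can be expressed as one function that turns a metrics snapshot into a list of alerts. This is a sketch under stated assumptions: the Snapshot fields and the sanity_alerts function are hypothetical, and only the two thresholds (10% error rate, 10 seconds average latency) come from this document.

```python
from dataclasses import dataclass

MAX_ERROR_RATE = 0.10     # alert if more than 10% of checks fail
MAX_AVG_LATENCY_S = 10.0  # alert if average latency exceeds 10 seconds


@dataclass
class Snapshot:
    """Hypothetical container for the metrics the checks need."""
    connected: bool
    model_count: int
    total_checks: int
    failed_checks: int
    avg_latency_s: float
    optimizer_ok: bool


def sanity_alerts(s):
    """Return a list of alert strings; an empty list means all checks passed."""
    alerts = []
    if not s.connected:
        alerts.append("connection: no active connection to the Ollama server")
    if s.model_count < 1:
        alerts.append("models: no model available")
    # Guard against division by zero before any check has run.
    error_rate = s.failed_checks / s.total_checks if s.total_checks else 0.0
    if error_rate > MAX_ERROR_RATE:
        alerts.append(f"error rate: {error_rate:.0%} exceeds 10%")
    if s.avg_latency_s > MAX_AVG_LATENCY_S:
        alerts.append(f"latency: average {s.avg_latency_s:.1f} s exceeds 10 s")
    if not s.optimizer_ok:
        alerts.append("optimizer: inference optimizer check failed")
    return alerts


healthy = Snapshot(True, 2, 20, 1, 0.8, True)
unhealthy = Snapshot(False, 0, 20, 5, 12.5, False)
print(sanity_alerts(healthy))
print(sanity_alerts(unhealthy))
```

Both thresholds use a strict comparison, so an error rate of exactly 10% or an average latency of exactly 10 seconds does not raise an alert, matching the "> 10%" and "> 10 seconds" wording above.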

Metrics and Reporting

Tracked Metrics

Reporting

The monitor provides:

Best Practices

Configuration

Check interval:
- Recommended: 10-30 seconds for production
- Lower intervals for critical systems
- Higher intervals for background monitoring

Monitoring duration:
- Short tests: 1-5 minutes
- Extended monitoring: 30+ minutes
- Continuous monitoring: Run as background service

Error Handling

Integration with AGLM

The monitoring system integrates with the AGLM (a General Learning Model) framework:

Related Documentation


Last Updated: 2026-01-17
Maintained By: mindX Documentation System


Referenced in this document
aglm
inference_optimization
