
# Rate Limiting & Autonomous Interaction Optimizations

## Overview

This document describes the optimizations made to rate limiting and autonomous interaction limits to improve mindX performance and throughput.

## Changes Made

### 1. Rate Limiting Configuration

**File:** `data/config/llm_factory_config.json` (newly created)

**Previous:** the default rate limit was 2 requests/min (hardcoded in `llm_factory.py`)

**New configuration:**

```jsonc
{
  "rate_limit_profiles": {
    "default_rpm": 60,        // 30x increase from 2 to 60 requests/min
    "high_throughput": 120,   // For high-performance scenarios
    "conservative": 30,       // For resource-constrained environments
    "ollama_local": 1000,     // High limit for local Ollama servers
    "mistral": 50,            // Provider-specific limits
    "gemini": 40
  }
}
```

(The `//` comments are for illustration only; the file on disk must be plain JSON without comments.)

**Impact:**

- 30x increase in the default rate limit (2 → 60 requests/min)
- Provider-specific rate limits for optimal performance
- Configurable profiles for different use cases
- Local Ollama servers can handle up to 1000 requests/min
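The sketch below shows one way a factory could enforce a profile's requests-per-minute budget with a sliding-window limiter. It is a minimal illustration, not the actual `llm_factory.py` code: the config path and profile names come from this document, while `SlidingWindowRateLimiter` and `limiter_for_profile` are hypothetical.

```python
import asyncio
import json
import time
from collections import deque
from pathlib import Path

CONFIG_PATH = Path("data/config/llm_factory_config.json")

class SlidingWindowRateLimiter:
    """Allow at most `rpm` acquisitions in any rolling 60-second window."""

    def __init__(self, rpm: int):
        self.rpm = rpm
        self._stamps: deque = deque()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self._lock:
            now = time.monotonic()
            # Drop timestamps that have aged out of the 60 s window.
            while self._stamps and now - self._stamps[0] >= 60.0:
                self._stamps.popleft()
            if len(self._stamps) >= self.rpm:
                # Budget exhausted: wait until the oldest call leaves the window.
                await asyncio.sleep(60.0 - (now - self._stamps[0]))
            self._stamps.append(time.monotonic())

def limiter_for_profile(profile: str = "default_rpm") -> SlidingWindowRateLimiter:
    """Build a limiter from rate_limit_profiles (the file must be plain JSON)."""
    profiles = json.loads(CONFIG_PATH.read_text())["rate_limit_profiles"]
    return SlidingWindowRateLimiter(profiles.get(profile, profiles["default_rpm"]))
```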
### 2. MindXAgent Concurrent Improvements

**File:** `agents/core/mindXagent.py`

**Previous:** `max_concurrent_improvements: 1`

**New:** `max_concurrent_improvements: 5`

**Impact:**

- 5x increase in parallel improvement operations
- Allows mindXagent to work on multiple improvements simultaneously
- Better utilization of system resources
- Faster overall improvement cycles
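For intuition, here is a minimal sketch of bounding parallel improvements with `asyncio.Semaphore`. `improve_component` is a hypothetical stand-in for mindXagent's real improvement logic, not its actual API.

```python
import asyncio

MAX_CONCURRENT_IMPROVEMENTS = 5  # mirrors the new setting

async def improve_component(name: str) -> str:
    """Hypothetical stand-in for one improvement operation."""
    await asyncio.sleep(1.0)  # simulate LLM calls, analysis, patching
    return f"{name}: improved"

async def run_improvements(components: list) -> list:
    sem = asyncio.Semaphore(MAX_CONCURRENT_IMPROVEMENTS)

    async def bounded(name: str) -> str:
        async with sem:  # at most 5 improvements in flight at a time
            return await improve_component(name)

    return await asyncio.gather(*(bounded(c) for c in components))

# Example: asyncio.run(run_improvements(["coordinator", "memory", "planner"]))
```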
### 3. Coordinator Heavy Tasks Concurrency

**File:** `data/config/mindx_config.json`

**Previous:** `max_concurrent_heavy_tasks: 2` (default)

**New:** `max_concurrent_heavy_tasks: 5`

**Impact:**

- 2.5x increase in concurrent heavy task execution
- Better handling of resource-intensive operations
- Improved system throughput for component improvements
- More efficient use of available system resources
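A coordinator can honor this setting with a fixed pool of workers draining a shared queue, sized from `mindx_config.json`. The config key matches the snippet shown later in this document; the worker logic itself is illustrative, not the coordinator's actual implementation.

```python
import asyncio
import json
from pathlib import Path

def heavy_task_limit(path: str = "data/config/mindx_config.json") -> int:
    """Read max_concurrent_heavy_tasks, falling back to the old default of 2."""
    cfg = json.loads(Path(path).read_text())
    return cfg.get("coordinator", {}).get("max_concurrent_heavy_tasks", 2)

async def run_heavy_tasks(tasks: list) -> None:
    queue: asyncio.Queue = asyncio.Queue()
    for t in tasks:
        queue.put_nowait(t)

    async def worker(worker_id: int) -> None:
        while True:
            try:
                task = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # queue drained, worker exits
            await asyncio.sleep(0.5)  # stand-in for a resource-intensive operation
            print(f"worker {worker_id} finished {task}")

    # With the new configuration this spawns 5 workers.
    await asyncio.gather(*(worker(i) for i in range(heavy_task_limit())))
```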
## Performance Improvements

### Before Optimization

- Rate limit: 2 requests/min
- Concurrent improvements: 1
- Heavy tasks: 2 concurrent
- Estimated throughput: ~2 operations/min
### After Optimization

- Rate limit: 60 requests/min (default)
- Concurrent improvements: 5
- Heavy tasks: 5 concurrent
- Estimated throughput: ~60 operations/min (30x improvement)
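These estimates treat the LLM rate limit as the binding constraint. A back-of-envelope model makes the arithmetic explicit; the per-slot rate of ~12 ops/min is an assumed illustration, not a measured benchmark.

```python
def estimated_throughput(rpm: int, concurrency: int, ops_per_slot_per_min: float) -> float:
    """Effective ops/min is capped by the lower of the rate-limit budget
    and what the concurrent slots can collectively produce."""
    return min(rpm, concurrency * ops_per_slot_per_min)

# Before: the 2 RPM cap dominates regardless of concurrency.
print(estimated_throughput(rpm=2, concurrency=1, ops_per_slot_per_min=12))   # 2.0
# After: with 5 slots each sustaining ~12 ops/min, the 60 RPM cap binds.
print(estimated_throughput(rpm=60, concurrency=5, ops_per_slot_per_min=12))  # 60.0
```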
## Configuration Profiles

### Default Profile (`default_rpm`)

- **Rate:** 60 requests/min
- **Use case:** standard mindX operations
- **Best for:** general autonomous improvement cycles

### High Throughput Profile (`high_throughput`)

- **Rate:** 120 requests/min
- **Use case:** intensive improvement campaigns
- **Best for:** large-scale system optimization

### Conservative Profile (`conservative`)

- **Rate:** 30 requests/min
- **Use case:** resource-constrained environments
- **Best for:** systems with limited API quotas

### Ollama Local Profile (`ollama_local`)

- **Rate:** 1000 requests/min
- **Use case:** local Ollama server inference
- **Best for:** high-frequency local model interactions
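The configuration above mixes named profiles with provider-specific keys (`mistral`, `gemini`). One plausible resolution order, sketched below as an assumption rather than the factory's documented behavior, is: explicit profile first, then a provider-specific limit, then `default_rpm`. The `anthropic` provider in the example is hypothetical.

```python
def resolve_rpm(profiles: dict, provider: str, profile: str = None) -> int:
    """Pick a requests-per-minute budget for a handler.

    Assumed precedence: explicit profile > provider-specific entry > default.
    """
    if profile is not None and profile in profiles:
        return profiles[profile]
    if provider in profiles:
        return profiles[provider]
    return profiles["default_rpm"]

profiles = {"default_rpm": 60, "high_throughput": 120, "conservative": 30,
            "ollama_local": 1000, "mistral": 50, "gemini": 40}
print(resolve_rpm(profiles, "mistral"))                     # 50 (provider-specific)
print(resolve_rpm(profiles, "mistral", "high_throughput"))  # 120 (explicit profile wins)
print(resolve_rpm(profiles, "anthropic"))                   # 60 (falls back to default)
```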
## Usage

### Using Rate Limit Profiles

When creating LLM handlers, specify the rate limit profile:

```python
# Default profile (60 requests/min)
handler = await create_llm_handler(
    provider_name="mistral",
    rate_limit_profile="default_rpm"
)

# High throughput profile (120 requests/min)
handler = await create_llm_handler(
    provider_name="mistral",
    rate_limit_profile="high_throughput"
)

# Ollama local profile (1000 requests/min)
handler = await create_llm_handler(
    provider_name="ollama",
    rate_limit_profile="ollama_local"
)
```

### Adjusting Concurrent Limits

**MindXAgent settings** (in code):

```python
mindx_agent.settings["max_concurrent_improvements"] = 5
```

**Coordinator settings** (in `mindx_config.json`):

```json
{
  "coordinator": {
    "max_concurrent_heavy_tasks": 5
  }
}
```

## Monitoring (both directions)

Whether mindX is ingesting data, serving inference, or exposing services, monitoring and rate control are essential in both directions (inbound and outbound). See docs/monitoring_rate_control.md for the network and data metrics involved (latency in ms, bytes transferred, requests/min). Inbound traffic is exposed via `GET /api/monitoring/inbound`; outbound traffic is tracked by the rate limiter and each provider's `get_metrics()`.
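As an illustration, a monitoring loop could poll the inbound endpoint and warn when the observed rate approaches the configured budget. The endpoint path comes from this document, but the base URL and the response field name (`requests_per_min`) are assumptions for the sketch.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed mindX API address

def poll_inbound_rate(budget_rpm: int = 60) -> None:
    """Poll inbound traffic metrics and warn when nearing the RPM budget."""
    while True:
        with urllib.request.urlopen(f"{BASE_URL}/api/monitoring/inbound") as resp:
            metrics = json.load(resp)
        # 'requests_per_min' is an assumed field name for this sketch.
        observed = metrics.get("requests_per_min", 0)
        if observed > 0.8 * budget_rpm:
            print(f"warning: inbound rate {observed}/min nearing budget {budget_rpm}/min")
        time.sleep(30)
```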

Monitor the following metrics to ensure optimal performance:

- **Rate limit hits:** check whether rate limits are being hit frequently
- **Concurrent task queue:** monitor queue depth for heavy tasks
- **Improvement cycle duration:** track the time taken per improvement cycle
- **System resource usage:** monitor CPU, memory, and API usage
## Recommendations

- **Start with defaults:** use the default settings (60 RPM, 5 concurrent) for most scenarios
- **Scale up gradually:** increase limits only if the system can handle more load
- **Monitor API quotas:** ensure provider API quotas support the higher rates
- **Adjust for environment:** use the conservative profile where resources are limited
- **Use Ollama locally:** for local inference, use the ollama_local profile for maximum throughput
## Rollback

If issues occur, you can roll back by:

- **Rate limiting:** edit `data/config/llm_factory_config.json` and set `default_rpm` to 2
- **Concurrent improvements:** edit `agents/core/mindXagent.py` and set `max_concurrent_improvements` to 1
- **Heavy tasks:** edit `data/config/mindx_config.json` and set `max_concurrent_heavy_tasks` to 2
## Future Enhancements

- **Dynamic rate limiting:** adjust rates based on system load
- **Adaptive concurrency:** automatically adjust concurrent limits
- **Provider-specific optimization:** fine-tune limits per provider
- **Cost-aware rate limiting:** consider API costs in rate decisions
- **Performance-based profiles:** auto-select profiles based on performance metrics

Referenced in this document: `docs/monitoring_rate_control.md`
