Rate Limiting & Autonomous Interaction Optimizations
Overview
This document describes the optimizations made to rate limiting and autonomous interaction limits to improve mindX performance and throughput.
Changes Made
1. Rate Limiting Configuration
File: data/config/llm_factory_config.json (newly created)
Previous: Default rate limit was 2 requests per minute (hardcoded in llm_factory.py)
New Configuration:
{
  "rate_limit_profiles": {
    "default_rpm": 60,
    "high_throughput": 120,
    "conservative": 30,
    "ollama_local": 1000,
    "mistral": 50,
    "gemini": 40
  }
}
Impact:
- 30x increase in default rate limit (2 → 60 requests/min)
- Provider-specific rate limits for optimal performance
- Configurable profiles for different use cases
- Local Ollama servers are allowed up to 1000 requests/min
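As an illustration of how a profile name could map to a requests/min value, here is a minimal sketch. The helper names `load_config` and `resolve_rpm` are hypothetical and are not taken from llm_factory.py, and falling back to `default_rpm` for an unknown profile is an assumption. It also assumes the file on disk is strict JSON, so any `//` annotations would have to be stripped first.

```python
import json

# Fallback used when the config or the requested profile is missing.
# 60 mirrors default_rpm in this document; the fallback behaviour is an assumption.
FALLBACK_RPM = 60


def load_config(path: str) -> dict:
    """Load the config file; assumes strict JSON without // comments."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def resolve_rpm(config: dict, profile: str = "default_rpm") -> int:
    """Return the requests/min limit for a profile, else fall back to default_rpm."""
    profiles = config.get("rate_limit_profiles", {})
    if profile in profiles:
        return int(profiles[profile])
    return int(profiles.get("default_rpm", FALLBACK_RPM))


# In-memory example, so the sketch runs without the real config file.
example = {"rate_limit_profiles": {"default_rpm": 60, "ollama_local": 1000}}
print(resolve_rpm(example, "ollama_local"))  # 1000
print(resolve_rpm(example, "unknown"))       # 60
```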
2. MindXAgent Concurrent Improvements
File: agents/core/mindXagent.py
Previous: max_concurrent_improvements: 1
New: max_concurrent_improvements: 5
Impact:
- 5x increase in parallel improvement operations
- Allows mindXagent to work on multiple improvements simultaneously
- Better utilization of system resources
- Faster overall improvement cycles
3. Coordinator Heavy Tasks Concurrency
File: data/config/mindx_config.json
Previous: max_concurrent_heavy_tasks: 2 (default)
New: max_concurrent_heavy_tasks: 5
Impact:
- 2.5x increase in concurrent heavy task execution
- Better handling of resource-intensive operations
- Improved system throughput for component improvements
- More efficient use of available system resources
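A cap such as max_concurrent_heavy_tasks is commonly enforced with a semaphore. The following sketch assumes that approach; the coordinator's actual mechanism is not shown in this document. `run_heavy_tasks` and the dummy task are hypothetical names used only for illustration.

```python
import asyncio

MAX_CONCURRENT_HEAVY_TASKS = 5  # value from mindx_config.json in this document


async def run_heavy_tasks(task_count: int, limit: int = MAX_CONCURRENT_HEAVY_TASKS) -> int:
    """Run task_count dummy tasks and return the peak number running at once."""
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def heavy_task() -> None:
        nonlocal running, peak
        async with semaphore:          # waits while `limit` tasks are already running
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for real work
            running -= 1

    await asyncio.gather(*(heavy_task() for _ in range(task_count)))
    return peak


peak_seen = asyncio.run(run_heavy_tasks(task_count=20))
print(peak_seen)  # never exceeds the configured limit
```

With 20 queued tasks and a limit of 5, at most five run at any moment; the rest wait for a slot.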
Performance Improvements
Before Optimization
- Rate Limit: 2 requests/min
- Concurrent Improvements: 1
- Heavy Tasks: 2 concurrent
- Estimated Throughput: ~2 operations/min
After Optimization
- Rate Limit: 60 requests/min (default)
- Concurrent Improvements: 5
- Heavy Tasks: 5 concurrent
- Estimated Throughput: up to ~60 operations/min (30x improvement, if the rate limit remains the bottleneck)
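The throughput estimates follow the rate limit directly, which holds only while the rate limit is the binding constraint. A rough way to check this is to take the smaller of the rate limit and the concurrency-bound rate. The sketch below assumes one LLM request per operation, and the per-operation durations are hypothetical values chosen for illustration, not measurements.

```python
def estimated_ops_per_min(rpm_limit: float, concurrency: int, seconds_per_op: float) -> float:
    """Upper bound on operations/min given a rate limit and a concurrency cap."""
    # With `concurrency` slots each finishing one operation every
    # `seconds_per_op` seconds, at most this many operations complete per minute.
    concurrency_cap = concurrency * 60.0 / seconds_per_op
    return float(min(rpm_limit, concurrency_cap))


print(estimated_ops_per_min(2, 1, 5.0))    # 2.0  (rate limit binds)
print(estimated_ops_per_min(60, 5, 5.0))   # 60.0 (both bounds coincide)
print(estimated_ops_per_min(60, 5, 10.0))  # 30.0 (concurrency binds)
```

Under these assumptions, the full 30x gain appears only if operations average about 5 seconds or less at a concurrency of 5.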
Configuration Profiles
Default Profile (default_rpm)
- Rate: 60 requests/min
- Use Case: Standard mindX operations
- Best For: General autonomous improvement cycles
High Throughput Profile (high_throughput)
- Rate: 120 requests/min
- Use Case: Intensive improvement campaigns
- Best For: Large-scale system optimization
Conservative Profile (conservative)
- Rate: 30 requests/min
- Use Case: Resource-constrained environments
- Best For: Systems with limited API quotas
Ollama Local Profile (ollama_local)
- Rate: 1000 requests/min
- Use Case: Local Ollama server inference
- Best For: High-frequency local model interactions
Usage
Using Rate Limit Profiles
When creating LLM handlers, specify the rate limit profile:
# Default profile (60 requests/min)
handler = await create_llm_handler(
    provider_name="mistral",
    rate_limit_profile="default_rpm"
)

# High throughput profile (120 requests/min)
handler = await create_llm_handler(
    provider_name="mistral",
    rate_limit_profile="high_throughput"
)

# Ollama local profile (1000 requests/min)
handler = await create_llm_handler(
    provider_name="ollama",
    rate_limit_profile="ollama_local"
)
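To make concrete what a requests/min profile means for call timing, here is a minimal fixed-interval limiter sketch. It is not the limiter used by llm_factory.py, whose implementation is not shown here. `RpmLimiter` is a hypothetical class that simply spaces calls at least 60/rpm seconds apart.

```python
import asyncio
import time


class RpmLimiter:
    """Spaces calls at least 60/rpm seconds apart (simple fixed-interval limiter)."""

    def __init__(self, rpm: int) -> None:
        self.min_interval = 60.0 / rpm
        self._next_allowed = 0.0

    async def wait(self) -> float:
        """Sleep until the next slot is free; return the seconds waited."""
        now = time.monotonic()
        delay = max(0.0, self._next_allowed - now)
        # Reserve the following slot before sleeping so concurrent callers queue up.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        if delay > 0:
            await asyncio.sleep(delay)
        return delay


async def demo() -> list:
    limiter = RpmLimiter(rpm=600)  # 0.1 s spacing keeps the demo short
    return [await limiter.wait() for _ in range(3)]


delays = asyncio.run(demo())
print(delays)  # the first call is not delayed; later calls wait for their slot
```

At 60 requests/min the spacing is 1 second per call, compared with 30 seconds at the previous limit of 2 requests/min.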
Adjusting Concurrent Limits
MindXAgent Settings (in code):
mindx_agent.settings["max_concurrent_improvements"] = 5
Coordinator Settings (in mindx_config.json):
{
  "coordinator": {
    "max_concurrent_heavy_tasks": 5
  }
}
Monitoring (both directions)
Whether mindX is ingesting data, providing inference, or providing other services, monitoring and rate control are essential in both directions (inbound and outbound). See docs/monitoring_rate_control.md for scientific network and data metrics (latency in ms, bytes, requests/min). Inbound metrics are available at GET /api/monitoring/inbound; outbound metrics come from the rate limiter and the provider get_metrics().
Monitor the following metrics to ensure optimal performance:
- Rate Limit Hits: Check if rate limits are being hit frequently
- Concurrent Task Queue: Monitor queue depth for heavy tasks
- Improvement Cycle Duration: Track time for improvement cycles
- System Resource Usage: Monitor CPU, memory, and API usage
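The first metric, rate limit hits, can be tracked alongside requests/min with a sliding window. The sketch below is one possible approach; it does not reproduce the output format of get_metrics() or the /api/monitoring/inbound endpoint, and `RequestWindow` is a hypothetical name.

```python
import time
from collections import deque


class RequestWindow:
    """Tracks requests and rate-limit hits over a sliding time window."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window_seconds = window_seconds
        self._events = deque()  # entries are (timestamp, was_rate_limited)

    def record(self, was_rate_limited: bool = False, now=None) -> None:
        ts = time.monotonic() if now is None else now
        self._events.append((ts, was_rate_limited))

    def snapshot(self, now=None) -> dict:
        ts = time.monotonic() if now is None else now
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] <= ts - self.window_seconds:
            self._events.popleft()
        hits = sum(1 for _, limited in self._events if limited)
        return {
            "requests_per_min": len(self._events) * 60.0 / self.window_seconds,
            "rate_limit_hits": hits,
        }


# Explicit timestamps make the example deterministic.
window = RequestWindow()
window.record(now=0.0)
window.record(was_rate_limited=True, now=10.0)
window.record(now=50.0)
stats = window.snapshot(now=65.0)
print(stats)  # {'requests_per_min': 2.0, 'rate_limit_hits': 1}
```

A steadily rising `rate_limit_hits` count suggests the selected profile is too low for the workload, or that the provider quota is the real constraint.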
Recommendations
- Start with Defaults: Use default settings (60 RPM, 5 concurrent) for most scenarios
- Scale Up Gradually: Increase limits only if the system can handle more load
- Monitor API Quotas: Ensure provider API quotas support higher rates
- Adjust for Environment: Use conservative profile for limited resources
- Use Ollama Local: For local inference, use the ollama_local profile for maximum throughput
Rollback
If issues occur, roll back as follows:
- Rate Limiting: Edit data/config/llm_factory_config.json and set default_rpm to 2
- Concurrent Improvements: Edit agents/core/mindXagent.py and set max_concurrent_improvements to 1
- Heavy Tasks: Edit data/config/mindx_config.json and set max_concurrent_heavy_tasks to 2
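The rate-limit rollback can also be scripted rather than edited by hand. The sketch below assumes the config file is strict JSON with the rate_limit_profiles layout shown earlier. `set_default_rpm` is a hypothetical helper, and the demo runs against a temporary copy rather than the real file. Whether a restart is needed for the change to take effect is not covered here.

```python
import json
import tempfile
from pathlib import Path


def set_default_rpm(config_path: Path, rpm: int) -> dict:
    """Rewrite default_rpm in the config file, leaving other profiles untouched."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config.setdefault("rate_limit_profiles", {})["default_rpm"] = rpm
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a temporary file; point config_path at
# data/config/llm_factory_config.json to apply it for real.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "llm_factory_config.json"
    path.write_text(
        json.dumps({"rate_limit_profiles": {"default_rpm": 60, "mistral": 50}}),
        encoding="utf-8",
    )
    rolled_back = set_default_rpm(path, 2)
    reloaded = json.loads(path.read_text(encoding="utf-8"))

print(reloaded["rate_limit_profiles"]["default_rpm"])  # 2
```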
Future Enhancements
- Dynamic Rate Limiting: Adjust rates based on system load
- Adaptive Concurrency: Automatically adjust concurrent limits
- Provider-Specific Optimization: Fine-tune limits per provider
- Cost-Aware Rate Limiting: Consider API costs in rate decisions
- Performance-Based Profiles: Auto-select profiles based on performance metrics