File: mindx_backend_service/insight_aggregator.py
Surface: every /insight/ endpoint (and through them, /feedback.html, /feedback.txt, and the landing dashboard's Self-Improvement Ledger row).
The aggregator is the single async loop that turns mindX's append-only logs and per-agent files into the cached numerical surface the operator sees. Nothing else owns these numbers. If something on the page reads wrong, it's wrong here.
- Started from main_service.startup_event.
- Runs on a fixed interval (interval_s = 60).
- /insight/ endpoints return cached snapshots, never recompute inline.
- Daily snapshots go to data/fitness/daily_snapshots.jsonl for trajectory queries.

| Source | Path | Purpose |
|---|---|---|
| Mastermind campaign history | data/memory/agent_workspaces/mastermind_prime/mastermind_campaigns_history.json | source for ledger / improvement summary buckets |
| Per-agent process traces | data/memory/agent_workspaces/{agent_id}/process_trace.jsonl | trace-reliability, latency-score |
| Boardroom sessions | data/governance/boardroom_sessions.jsonl | consensus-alignment fitness axis |
| Dojo events | data/governance/dojo_events.jsonl | reputation-momentum |
| Gödel choices | data/logs/godel_choices.jsonl | godel-selection-rate axis + ledger join |
| Beliefs | data/memory/beliefs.json | learning-velocity, identity map |
| Model performance metrics | data/model_performance_metrics.json | latency-score |
| Improvement backlog | data/improvement_backlog.json | directive-coverage |
| Agent registry | daio/agents/agent_map.json | agent list for fitness leaderboard |
The aggregator handles missing files gracefully — every reader returns a sensible empty default, never raises into the request path.
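A minimal sketch of that reader pattern, assuming the JSON and JSONL sources are read with the standard library. The helper names `read_json` and `read_jsonl` are hypothetical and not taken from the aggregator; the real readers may differ.

```python
import json
from pathlib import Path
from typing import Any


def read_json(path: str, default: Any) -> Any:
    """Return the parsed JSON file, or `default` if it is missing or unreadable."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # OSError: missing file, permissions. ValueError: bad JSON or bad encoding.
        return default


def read_jsonl(path: str) -> list[dict]:
    """Return one dict per valid JSONL line; skip blank or malformed lines."""
    try:
        text = Path(path).read_text(encoding="utf-8")
    except (OSError, ValueError):
        return []
    rows: list[dict] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except ValueError:
            continue  # a corrupt line should not poison the whole log
        if isinstance(obj, dict):
            rows.append(obj)
    return rows
```

Returning an empty container instead of raising keeps a missing or half-written log from surfacing as an error on a request that only reads the cache.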
improvement_summary() → 4 + 1 buckets

Returned by GET /insight/improvement/summary. Three time-window buckets plus directive coverage.
    {
      "campaigns_1h":  {total, succeeded, running, errored, failed, blocked},
      "campaigns_24h": {total, succeeded, running, errored, failed, blocked},
      "campaigns_7d":  {total, succeeded, running, errored, failed, blocked},
      "belief_churn_per_hour": float,
      "model_quality_trend": {model_id: {success_rate, avg_latency_ms, avg_quality, last_used}},
      "directive_coverage": {
        "backlog_total": int,                  # how many improvement suggestions on disk
        "distinct_directives_attempted": int,  # unique directive strings ever attempted
        "total_campaigns": int,                # campaigns ever recorded
        "matched_in_backlog": int,             # legacy substring match (kept for back-compat)
        "attempted": int,                      # alias for distinct_directives_attempted
        "coverage_ratio": float                # distinct / backlog_total
      }
    }
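The directive_coverage fields can be sketched as below. This is an illustration under assumptions: the function name `directive_coverage` is hypothetical, the input is taken to be one directive string per recorded campaign, an empty backlog is assumed to yield a ratio of 0.0, and the legacy matched_in_backlog field is left out because its matching rule is not described here.

```python
def directive_coverage(attempted_directives: list[str], backlog_total: int) -> dict:
    """Build the coverage fields from one directive string per campaign (assumed input)."""
    distinct = len(set(attempted_directives))
    # Assumption: an empty backlog reports 0.0 rather than dividing by zero.
    ratio = distinct / backlog_total if backlog_total > 0 else 0.0
    return {
        "backlog_total": backlog_total,
        "distinct_directives_attempted": distinct,
        "total_campaigns": len(attempted_directives),
        "attempted": distinct,  # alias for distinct_directives_attempted
        "coverage_ratio": ratio,
    }
```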
Bucket size approximation. Production campaign records do not carry timestamps. The aggregator approximates time windows by list-tail slices:
| Window | Slice |
|---|---|
| campaigns_1h | last 5 records |
| campaigns_24h | last 25 records |
| campaigns_7d | last 100 records |
This is honest given the data shape but not exact — if the system ran a thousand campaigns in 1 hour the 24h bucket only sees the last 25. A future version should add created_at timestamps to the campaign records and switch to time-range queries.
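The tail-slice approximation amounts to the following sketch. It assumes the campaign history is a list ordered oldest first; the names `WINDOW_SLICES` and `window_slices` are hypothetical.

```python
# Slice sizes from the table above: window name -> number of trailing records.
WINDOW_SLICES = {"campaigns_1h": 5, "campaigns_24h": 25, "campaigns_7d": 100}


def window_slices(history: list) -> dict:
    """Approximate each time window by the last N campaign records."""
    # A negative slice longer than the list simply returns the whole list.
    return {name: history[-n:] for name, n in WINDOW_SLICES.items()}
```

With only 30 records on disk, the 7d window contains all 30, so the three windows overlap heavily until the history grows past 100 entries.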
Every campaign passes through bucket() in _compute_improvement_summary. The mastermind sets overall_campaign_status="FAILURE_OR_INCOMPLETE" for any non-success outcome, collapsing 4 materially different states into one. The actual outcome lives in final_bdi_message — the aggregator reads that first.
final_bdi_message | bucket
─────────────────────────────────────────────────|──────────
contains "COMPLETED_GOAL_ACHIEVED" | succeeded
status == "SUCCESS" | succeeded
contains "CYCLE EXCEPTION" | errored
contains "FAILED" (FAILED_PLANNING, FAILED, …) | failed
contains "RUNNING" (BDI run RUNNING. Reason: …) | running
status == "RUNNING" or "IN_PROGRESS" | running
status contains "BLOCK" | blocked
otherwise | failed (truly unknown)
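Read top to bottom, the table can be sketched as a first-match classifier. This is not the aggregator's actual bucket() code: the function name `classify_campaign` is hypothetical, and it assumes the rows are evaluated in the order shown, with case-sensitive substring matches exactly as written.

```python
from typing import Any, Mapping


def classify_campaign(record: Mapping[str, Any]) -> str:
    """Map one campaign record to a bucket, checking final_bdi_message first."""
    msg = str(record.get("final_bdi_message") or "")
    status = str(record.get("overall_campaign_status") or "")

    if "COMPLETED_GOAL_ACHIEVED" in msg or status == "SUCCESS":
        return "succeeded"
    if "CYCLE EXCEPTION" in msg:
        return "errored"
    if "FAILED" in msg:  # FAILED_PLANNING, FAILED, ...
        return "failed"
    if "RUNNING" in msg or status in ("RUNNING", "IN_PROGRESS"):
        return "running"
    if "BLOCK" in status:
        return "blocked"
    return "failed"  # truly unknown


# A record the mastermind marks FAILURE_OR_INCOMPLETE can still be a success
# or a run that is merely unfinished, depending on the BDI message.
example = {
    "overall_campaign_status": "FAILURE_OR_INCOMPLETE",
    "final_bdi_message": "BDI run RUNNING. Reason: max cycles",
}
print(classify_campaign(example))
```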
Why these 5 states are not 1.
- running: hit max_cycles without crashing, typically because the skeleton fallback ran NO_OP repeatedly. Distinct from a real failure: the loop didn't error, it just ran out of time.
- errored: Cycle Exception, usually a bug in a tool or in the BDI itself.
- failed: FAILED_PLANNING, FAILED_EXECUTION, or FAILED_RECOVERY; the BDI gave up cleanly.

The math invariant: succeeded + running + errored + failed + blocked == total. The feedback page renders a !math:N warning if this ever fails.
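The invariant can be checked with a few lines. The sketch below is illustrative: `summarize` and `math_gap` are hypothetical names, and since what N denotes in !math:N is not specified here, the sketch just returns the difference between total and the bucket sum.

```python
from collections import Counter

BUCKETS = ("succeeded", "running", "errored", "failed", "blocked")


def summarize(labels: list[str]) -> dict:
    """Count bucket labels into the {total, succeeded, ...} shape."""
    counts = Counter(labels)
    summary = {"total": len(labels)}
    summary.update({b: counts.get(b, 0) for b in BUCKETS})
    return summary


def math_gap(summary: dict) -> int:
    """Return total minus the sum of the five buckets; 0 means the invariant holds."""
    return summary["total"] - sum(summary[b] for b in BUCKETS)
```

A label outside the five buckets is counted in total but in no bucket, so it shows up as a non-zero gap, which is the kind of classifier miss the warning is meant to expose.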
snapshot() → fitness leaderboard

Returned by GET /insight/fitness. Per-agent rollup of 7 fitness axes, weighted into a single 0–100 score. Code: _compute_fitness_for_agent.
| Axis | Weight | Source | Floor |
|---|---|---|---|
| campaign_success | 0.25 | mastermind_campaigns_history | 50 (neutral) |
| trace_reliability | 0.20 | process_trace success rate | 50 |
| latency_score | 0.10 | EMA of process latency | 50 |
| consensus_alignment | 0.15 | boardroom vote-with-majority | 50 |
| reputation_momentum | 0.10 | 7-day dojo events delta | 50 |
| learning_velocity | 0.10 | new/updated beliefs in 24h | 50 |
| godel_selection_rate | 0.10 | chosen / options_considered | 50 |
fitness = sum(axis_value × weight). Scores cluster around 50 when there's no data. This is intentional — neutral when uncertain, not zero.
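As a worked illustration of the weighted sum, here is a sketch using the weights from the table. It assumes each axis value is already on a 0 to 100 scale and that an axis with no data takes the neutral value 50; `AXIS_WEIGHTS` and `fitness_score` are hypothetical names, not the contents of _compute_fitness_for_agent.

```python
AXIS_WEIGHTS = {
    "campaign_success": 0.25,
    "trace_reliability": 0.20,
    "latency_score": 0.10,
    "consensus_alignment": 0.15,
    "reputation_momentum": 0.10,
    "learning_velocity": 0.10,
    "godel_selection_rate": 0.10,
}
NEUTRAL = 50.0  # value used when an axis has no data (assumed reading of "Floor")


def fitness_score(axes: dict) -> float:
    """Weighted sum of axis values; missing or None axes count as neutral."""
    total = 0.0
    for name, weight in AXIS_WEIGHTS.items():
        value = axes.get(name)
        total += weight * (NEUTRAL if value is None else float(value))
    return total
```

Because the weights sum to 1.0, an agent with no data scores 50, and an agent with campaign_success at 90 and nothing else scores 50 + 0.25 × 40 = 60.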
trajectory(agent_id, days) → fitness over time

Reads daily snapshots back N days. Used by /insight/fitness/{agent_id}/trajectory and the dashboard heatmap.
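A rough sketch of the look-back, assuming each snapshot row is a dict with `date` (ISO `YYYY-MM-DD`), `agent_id`, and `fitness` keys. Those field names are hypothetical, since the snapshot schema is not described here, and `trajectory_points` is an illustrative name rather than the aggregator's method.

```python
from datetime import date, timedelta
from typing import Optional


def trajectory_points(rows: list[dict], agent_id: str, days: int,
                      today: Optional[date] = None) -> list[tuple]:
    """Return (iso_date, fitness) pairs for one agent over the last `days` days."""
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    points = []
    for row in rows:
        if row.get("agent_id") != agent_id:
            continue
        try:
            day = date.fromisoformat(str(row.get("date")))
        except ValueError:
            continue  # skip rows with a missing or malformed date
        if cutoff < day <= today:
            points.append((day.isoformat(), row.get("fitness")))
    # ISO dates sort chronologically as plain strings.
    return sorted(points, key=lambda p: p[0])
```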
If the page shows wrong numbers, check in this order:
1. A final_bdi_message field renamed to bdi_message would silently classify everything as failed.
2. If you see campaigns_24h: 25 total but the timeline shows 200, that's the slice approximation: not a bug, but worth surfacing.
3. Check the /feedback.html improvement ledger. If !math:N appears, one of the classifier branches isn't firing.
4. computed_at is in every response. If it's >120s old, the aggregator loop crashed; check data/logs/mindx_runtime.log for [insight] loop iteration failed.
5. Check data/fitness/daily_snapshots.jsonl. If that's older than 24h the loop wasn't able to write it (disk full, perms, etc.).

To add a new aggregated metric:
1. Add the field to ImprovementSummary (or a sibling dataclass).
2. Compute it in _compute_improvement_summary (or a new _compute_ method).
3. Expose it through agg.improvement_summary() to /insight/improvement/summary.
4. Render it in feedback.html and (optionally) text_render.py for ?h=true.

The aggregator is the single chokepoint: nothing on the surface should compute its own statistics. That guarantee is the page's truth contract.
- /feedback.html: the operator surface
- /feedback.txt: terminal-friendly snapshot
- ?h=true on every /insight/: plain-text rendering
- /insight/cognition shows where mindX is genuinely producing vs aspirational
- improvement_summary() data