
# InsightAggregator — the diagnostic data pipeline

**File:** `mindx_backend_service/insight_aggregator.py`
**Surface:** every `/insight/` endpoint (and, through them, `/feedback.html`, `/feedback.txt`, and the landing dashboard's Self-Improvement Ledger row).

The aggregator is the single async loop that turns mindX's append-only logs and per-agent files into the cached numerical surface the operator sees. Nothing else owns these numbers. If something on the page reads wrong, it's wrong here.

## Cadence

The aggregator runs as a single background async loop. Every cached payload carries a `computed_at` timestamp; if that timestamp is more than ~120 s old, the loop has stopped (see How to read failures).

## Inputs (read each cycle)

| Source | Path | Purpose |
|---|---|---|
| Mastermind campaign history | `data/memory/agent_workspaces/mastermind_prime/mastermind_campaigns_history.json` | source for ledger / improvement-summary buckets |
| Per-agent process traces | `data/memory/agent_workspaces/{agent_id}/process_trace.jsonl` | trace-reliability, latency-score |
| Boardroom sessions | `data/governance/boardroom_sessions.jsonl` | consensus-alignment fitness axis |
| Dojo events | `data/governance/dojo_events.jsonl` | reputation-momentum |
| Gödel choices | `data/logs/godel_choices.jsonl` | godel-selection-rate axis + ledger join |
| Beliefs | `data/memory/beliefs.json` | learning-velocity, identity map |
| Model performance metrics | `data/model_performance_metrics.json` | latency-score |
| Improvement backlog | `data/improvement_backlog.json` | directive-coverage |
| Agent registry | `daio/agents/agent_map.json` | agent list for fitness leaderboard |

The aggregator handles missing files gracefully — every reader returns a sensible empty default, never raises into the request path.
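
A minimal sketch of that pattern (the helper name `_read_jsonl` is hypothetical; the real readers live in `insight_aggregator.py`):

```python
import json
from pathlib import Path

def _read_jsonl(path: Path) -> list[dict]:
    """Read a .jsonl file, returning [] if it is missing or unreadable."""
    try:
        lines = path.read_text(encoding="utf-8").splitlines()
    except OSError:
        return []  # a missing source file is a normal state, not an error
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # skip torn/partial lines from a concurrent writer
    return records
```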

## Outputs (cached, served to API)

### `improvement_summary()` → 4 + 1 buckets

Returned by GET `/insight/improvement/summary`: three time-window campaign buckets, plus belief churn, a per-model quality trend, and directive coverage.

```
{
  "campaigns_1h":  {total, succeeded, running, errored, failed, blocked},
  "campaigns_24h": {total, succeeded, running, errored, failed, blocked},
  "campaigns_7d":  {total, succeeded, running, errored, failed, blocked},
  "belief_churn_per_hour": float,
  "model_quality_trend": {model_id: {success_rate, avg_latency_ms, avg_quality, last_used}},
  "directive_coverage": {
    "backlog_total":               int,  # how many improvement suggestions on disk
    "distinct_directives_attempted": int, # unique directive strings ever attempted
    "total_campaigns":             int,  # campaigns ever recorded
    "matched_in_backlog":          int,  # legacy substring match (kept for back-compat)
    "attempted":                   int,  # alias for distinct_directives_attempted
    "coverage_ratio":              float # distinct / backlog_total
  }
}
```

**Bucket size approximation.** Production campaign records do not carry timestamps. The aggregator approximates time windows by list-tail slices:

| Window | Slice |
|---|---|
| campaigns_1h | last 5 records |
| campaigns_24h | last 25 records |
| campaigns_7d | last 100 records |

This is honest given the data shape but not exact — if the system ran a thousand campaigns in 1 hour the 24h bucket only sees the last 25. A future version should add created_at timestamps to the campaign records and switch to time-range queries.
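
A sketch of the slicing, with the constants from the table above (names are illustrative):

```python
WINDOW_SLICES = {"campaigns_1h": 5, "campaigns_24h": 25, "campaigns_7d": 100}

def window_records(campaigns: list[dict]) -> dict[str, list[dict]]:
    # Newest campaigns are assumed to sit at the tail of the history list,
    # so a tail slice stands in for "the most recent N records".
    return {name: campaigns[-n:] for name, n in WINDOW_SLICES.items()}
```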

### Classification (the 5-state machine)

Every campaign passes through `bucket()` in `_compute_improvement_summary`. The mastermind sets `overall_campaign_status="FAILURE_OR_INCOMPLETE"` for any non-success outcome, collapsing 4 materially different states into one. The actual outcome lives in `final_bdi_message` — the aggregator reads that first.

```
                final_bdi_message                | bucket
─────────────────────────────────────────────────|──────────
contains "COMPLETED_GOAL_ACHIEVED"               | succeeded
status == "SUCCESS"                              | succeeded
contains "CYCLE EXCEPTION"                       | errored
contains "FAILED" (FAILED_PLANNING, FAILED, …)   | failed
contains "RUNNING"  (BDI run RUNNING. Reason: …) | running
status == "RUNNING" or "IN_PROGRESS"             | running
status contains "BLOCK"                          | blocked
otherwise                                        | failed (truly unknown)
```

Why these 5 states are not 1: `FAILURE_OR_INCOMPLETE` alone can't tell the operator whether a campaign threw an exception, failed outright, is still running, or is blocked, and each of those demands a different response.

The math invariant: `succeeded + running + errored + failed + blocked == total`. The feedback page renders a `!math:N` warning if this ever fails.
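
A sketch of that precedence, assuming the field names shown above (the real rules live in `bucket()` inside `_compute_improvement_summary`):

```python
def bucket(campaign: dict) -> str:
    msg = str(campaign.get("final_bdi_message", "")).upper()
    status = str(campaign.get("overall_campaign_status", "")).upper()
    if "COMPLETED_GOAL_ACHIEVED" in msg or status == "SUCCESS":
        return "succeeded"
    if "CYCLE EXCEPTION" in msg:
        return "errored"
    if "FAILED" in msg:  # FAILED_PLANNING, FAILED, ...
        return "failed"
    if "RUNNING" in msg or status in ("RUNNING", "IN_PROGRESS"):
        return "running"
    if "BLOCK" in status:
        return "blocked"
    return "failed"  # truly unknown outcomes count as failed
```

Because every branch returns exactly one of the five labels, the per-bucket counts always sum to the total, which is the `!math:N` invariant the page checks.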

### `snapshot()` → fitness leaderboard

Returned by GET `/insight/fitness`. Per-agent rollup of 7 fitness axes, weighted into a single 0–100 score. Code: `_compute_fitness_for_agent`.

| Axis | Weight | Source | Floor |
|---|---|---|---|
| campaign_success | 0.25 | mastermind_campaigns_history | 50 (neutral) |
| trace_reliability | 0.20 | process_trace success rate | 50 |
| latency_score | 0.10 | EMA of process latency | 50 |
| consensus_alignment | 0.15 | boardroom vote-with-majority | 50 |
| reputation_momentum | 0.10 | 7-day dojo events delta | 50 |
| learning_velocity | 0.10 | new/updated beliefs in 24h | 50 |
| godel_selection_rate | 0.10 | chosen / options_considered | 50 |

`fitness = sum(axis_value × weight)`. Scores cluster around 50 when there's no data. This is intentional — neutral when uncertain, not zero.
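
A sketch of the weighting with the table's weights inlined (axis values are on a 0–100 scale; a missing axis falls back to the neutral floor):

```python
WEIGHTS = {
    "campaign_success":     0.25,
    "trace_reliability":    0.20,
    "latency_score":        0.10,
    "consensus_alignment":  0.15,
    "reputation_momentum":  0.10,
    "learning_velocity":    0.10,
    "godel_selection_rate": 0.10,
}  # weights sum to 1.0, so the result stays on the 0-100 scale

def fitness(axes: dict[str, float]) -> float:
    # Each axis is 0-100; absent data defaults to the neutral floor of 50.
    return sum(axes.get(name, 50.0) * weight for name, weight in WEIGHTS.items())
```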

### `trajectory(agent_id, days)` → fitness over time

Reads daily snapshots back N days. Used by `/insight/fitness/{agent_id}/trajectory` and the dashboard heatmap.
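
A sketch of the read path, assuming each snapshot line is a JSON object with `agent_id`, `fitness`, and an ISO `date` field (those field names are assumptions):

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

SNAPSHOTS = Path("data/fitness/daily_snapshots.jsonl")

def trajectory(agent_id: str, days: int) -> list[dict]:
    """Return one agent's daily snapshots from the last `days` days."""
    if not SNAPSHOTS.exists():
        return []
    cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).date()
    points = []
    for line in SNAPSHOTS.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        if rec.get("agent_id") != agent_id:
            continue
        if datetime.fromisoformat(rec["date"]).date() >= cutoff:
            points.append(rec)
    return points
```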

## How to read failures

If the page shows wrong numbers, check in this order:

  1. Source file format. Did the writer schema change? A `final_bdi_message` field renamed to `bdi_message` would silently classify everything as failed.
  2. Bucket-tail slicing. If the aggregator says `campaigns_24h: 25` total but the timeline shows 200, that's the slice approximation — not a bug, but worth surfacing.
  3. Math invariant. Open the browser console while on the `/feedback.html` improvement ledger. If `!math:N` appears, one of the classifier branches isn't firing.
  4. Cache freshness. `computed_at` is in every response. If it's >120 s old, the aggregator loop crashed — check `data/logs/mindx_runtime.log` for `[insight] loop iteration failed` (a probe sketch follows this list).
  5. Stale daily snapshot. Trajectory queries pull from `data/fitness/daily_snapshots.jsonl`. If that's older than 24 h, the loop wasn't able to write it (disk full, perms, etc.).
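
A quick staleness probe, assuming a local server on port 8000 and an ISO-format `computed_at` (both are assumptions about the deployment):

```python
import json
from datetime import datetime, timezone
from urllib.request import urlopen

with urlopen("http://localhost:8000/insight/improvement/summary") as resp:
    payload = json.load(resp)

computed = datetime.fromisoformat(payload["computed_at"])
if computed.tzinfo is None:
    computed = computed.replace(tzinfo=timezone.utc)  # assume UTC if naive
age_s = (datetime.now(timezone.utc) - computed).total_seconds()
print(f"computed_at is {age_s:.0f}s old", "(STALE: loop likely down)" if age_s > 120 else "")
```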

## How to extend

To add a new aggregated metric (a minimal sketch follows the steps):

  1. Add a field to `ImprovementSummary` (or a sibling dataclass).
  2. Compute it inside `_compute_improvement_summary` (or a new `_compute_` method).
  3. Surface it via `agg.improvement_summary()` to `/insight/improvement/summary`.
  4. Render it in `feedback.html` and (optionally) `text_render.py` for `?h=true`.
  5. Document it here.
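
A minimal sketch of steps 1–2 (the metric name and its computation here are hypothetical, purely to show the shape):

```python
from dataclasses import dataclass

@dataclass
class ImprovementSummary:
    # ...existing fields elided...
    backlog_open_ratio: float = 0.0  # hypothetical new metric

def _compute_backlog_open_ratio(backlog: list[dict]) -> float:
    # Hypothetical: fraction of backlog suggestions not yet resolved.
    if not backlog:
        return 0.0
    open_items = sum(1 for item in backlog if not item.get("resolved"))
    return open_items / len(backlog)
```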

The aggregator is the single chokepoint — nothing on the surface should compute its own statistics. That guarantee is the page's truth contract.
