18 decimal places. Actual values only. No estimation. Blockchain-grade precision.
Live benchmark: Latest Benchmark | Results: data/cloud_test_results.json | Script: test_cloud_all_models.py
- eval_count and prompt_eval_count taken directly from the Ollama API
- Decimal arithmetic (28-digit significand), no float drift
- Same precision as blockchain token tracking
This isn't cosmetic. Float accumulation loses precision:
```python
# Float fails
acc = 0.0
for _ in range(1_000_000):
    acc += 1e-18
print(acc)  # 9.999999999999843e-13 (WRONG)
```

```python
# Decimal succeeds
from decimal import Decimal

acc = Decimal("0")
for _ in range(1_000_000):
    acc += Decimal("1e-18")
print(acc)  # 1.000000E-12 (EXACT)
```
Over millions of requests on a long-running system, float drift becomes measurable.
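The 28-digit significand is Python's default decimal context, and the 18-place figures come from quantizing results to that scale. A minimal sketch of the arrangement (the exact context handling inside llm/precision_metrics.py is assumed here, not quoted):

```python
from decimal import Decimal, getcontext

getcontext().prec = 28          # 28-digit significand (Python's default)
EIGHTEEN_DP = Decimal("1e-18")  # quantization target: 18 decimal places

value = Decimal(2) / Decimal(3)     # carried to 28 significant digits
print(value.quantize(EIGHTEEN_DP))  # 0.666666666666666667
```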
Every response from /api/chat and /api/generate includes:
- eval_count
- prompt_eval_count
- total_duration
- load_duration
- prompt_eval_duration
- eval_duration

All integers. All exact. All directly from the model runtime.
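For context, those fields arrive in the response JSON as plain integers, with every duration expressed in nanoseconds. The shape below is illustrative (the counts match the verification snippet at the end of this section; the duration values other than eval_duration are made up):

```python
# Illustrative /api/generate response payload (abridged); all durations are
# nanoseconds, all counts are exact token counts from the model runtime
data = {
    "model": "qwen3:1.7b",
    "eval_count": 42,             # output tokens actually generated
    "prompt_eval_count": 11,      # prompt tokens actually evaluated
    "total_duration": 187_558_000,
    "load_duration": 31_002_000,
    "prompt_eval_duration": 12_114_000,
    "eval_duration": 52_479_709,
}
```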
```python
# api/ollama/ollama_url.py — OLD
estimated_tokens = len(prompt.split()) * 1.3 + len(content.split()) * 1.3
self.metrics.total_tokens += int(estimated_tokens)
```

```python
# agents/core/ollama_chat_manager.py — OLD
response_length = len(response.split()) * 1.3
tokens_per_second = response_length / latency  # Rough estimate
```
```python
# api/ollama/ollama_url.py — NEW
total_tokens = eval_count + prompt_eval_count  # exact integers from API
self.metrics.total_tokens += total_tokens

# Precision tracker records with Decimal at 18dp
precision_response = OllamaResponseMetrics.from_api_response(data, model=model)
self.precision_tracker.record(precision_response)
```

```python
# agents/core/ollama_chat_manager.py — NEW
rd = self.ollama_api._last_response_data
tokens_per_second = rd.get("tokens_per_second", 0)  # from eval_count / eval_duration_ns
```
llm/precision_metrics.py

OllamaResponseMetrics — Metrics from a single API response
```python
metrics = OllamaResponseMetrics.from_api_response(data, model="qwen3:1.7b")
metrics.total_tokens       # int: exact
metrics.tokens_per_second  # Decimal: from nanosecond timing
metrics.to_subtokens()     # {"total_tokens_subtokens": "53000000000000000000"}
```
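The subtoken string is the exact token count scaled by 10^18, the same fixed-point convention blockchains use for wei, so it can be stored and summed as an exact integer. A sketch of that mapping (the standalone helper below is hypothetical; the real method lives on OllamaResponseMetrics):

```python
from decimal import Decimal

SUBTOKEN_SCALE = Decimal(10) ** 18  # 18 decimal places of fixed-point resolution

def to_subtokens(total_tokens: int) -> dict:
    # Hypothetical helper mirroring OllamaResponseMetrics.to_subtokens()
    return {"total_tokens_subtokens": str(int(Decimal(total_tokens) * SUBTOKEN_SCALE))}

print(to_subtokens(53))  # {'total_tokens_subtokens': '53000000000000000000'}
```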
PrecisionAccumulator — Decimal-precision running statistics
```python
acc = PrecisionAccumulator()
acc.record(Decimal("800.309315739536589275"))
acc.mean  # Decimal, not float
acc.min   # Decimal
acc.max   # Decimal
```
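A minimal sketch of what such an accumulator has to track internally (count, running sum, min, max); the real class in llm/precision_metrics.py may carry more state, so treat this as illustrative:

```python
from decimal import Decimal

class PrecisionAccumulatorSketch:
    """Running Decimal statistics; illustrative stand-in, not the shipped class."""

    def __init__(self):
        self.count = 0
        self.total = Decimal("0")
        self.min = None
        self.max = None

    def record(self, value: Decimal) -> None:
        self.count += 1
        self.total += value
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)

    @property
    def mean(self) -> Decimal:
        # Dividing a Decimal by an int stays Decimal, never float
        return self.total / self.count if self.count else Decimal("0")
```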
ModelPrecisionMetrics — Per-model precision tracking
```python
model_metrics.total_eval_count             # int: exact total output tokens
model_metrics.total_prompt_eval_count      # int: exact total input tokens
model_metrics.aggregate_tokens_per_second  # Decimal: total_tokens / total_duration
model_metrics.actual_count_rate            # Decimal: fraction with real counts
```
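actual_count_rate quantifies how often the runtime returned real eval_count/prompt_eval_count values rather than the code having to fall back. A sketch of how that fraction could be derived (the counter names below are assumptions, not the module's attributes):

```python
from decimal import Decimal

# Hypothetical per-model counters
responses_with_actual_counts = 498
total_responses = 500

actual_count_rate = Decimal(responses_with_actual_counts) / Decimal(total_responses)
print(actual_count_rate)  # 0.996 (99.6% of responses carried real token counts)
```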
PrecisionMetricsTracker — Central tracker (singleton in OllamaAPI)
```python
tracker = PrecisionMetricsTracker()
tracker.record(response_metrics)
tracker.global_total_tokens       # int: exact
tracker.global_tokens_per_second  # Decimal
tracker.summary()                 # Full 18dp JSON report
```
Tokens per second is computed in two ways, both from actual API values:
```python
tokens_per_second = eval_count / (eval_duration_ns * 1e-9)
```
Calculated per response, accumulated via PrecisionAccumulator.mean.
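As a worked example, the response used in the verification snippet at the end of this section (42 output tokens generated in 52,479,709 ns) yields, when the same formula is carried out in Decimal:

```python
from decimal import Decimal

eval_count = 42
eval_duration_ns = 52_479_709

# Same formula as above, but in Decimal so no precision is lost on the way in
tps = Decimal(eval_count) / (Decimal(eval_duration_ns) * Decimal("1e-9"))
print(tps.quantize(Decimal("1e-18")))  # ~800.309..., the per-response Decimal recorded in the accumulator example above
```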
```python
aggregate_tps = total_eval_count / (total_eval_duration_ns * 1e-9)
```
Total tokens divided by total generation time. This is more statistically robust, since a single outlier request cannot skew it.
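To see why, compare both methods over a batch that contains one tiny anomalous request; the figures below are purely illustrative, not benchmark data:

```python
from decimal import Decimal

# (eval_count, eval_duration in ns); the last entry is a tiny outlier request
responses = [(400, 500_000_000), (400, 500_000_000), (1, 1_000_000)]

per_response = [Decimal(c) / (Decimal(d) * Decimal("1e-9")) for c, d in responses]
mean_tps = sum(per_response) / len(per_response)  # ~866.67: the 1000 tps outlier skews the mean

total_tokens = sum(c for c, _ in responses)
total_seconds = Decimal(sum(d for _, d in responses)) * Decimal("1e-9")
aggregate_tps = Decimal(total_tokens) / total_seconds  # ~800.20: weighted by real time, barely moves
```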
The exact counts and Decimal rates flow through:

- OllamaAPI.metrics.total_tokens
- OllamaAPI.precision_tracker
- OllamaAPI._last_response_data
- OllamaChatManager.tokens_per_second
- OllamaChatManager.chat() return value
- HierarchicalModelScorer
- InferenceOptimizer

Metrics persist to data/metrics/precision_metrics.json every 50 requests.
On load, accumulators (mean, min, max) are not restored — only totals. This is correct because per-request Decimal accumulators are session-level statistics, while totals are the durable scientific record.
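A sketch of that load-time filter, under the assumption that the persisted JSON stores per-model totals; the file layout and key names here are illustrative, not the actual schema:

```python
import json
from pathlib import Path

METRICS_PATH = Path("data/metrics/precision_metrics.json")

def load_totals(path: Path = METRICS_PATH) -> dict:
    """Restore only durable totals; per-session accumulators start fresh."""
    if not path.exists():
        return {}
    saved = json.loads(path.read_text())
    # Keep the exact integer totals ("total_*" keys, illustrative naming) and
    # drop anything that is session-level statistics (mean, min, max)
    return {
        model: {k: v for k, v in stats.items() if k.startswith("total_")}
        for model, stats in saved.get("models", {}).items()
    }
```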
To verify end to end:

```python
from llm.precision_metrics import PrecisionMetricsTracker, OllamaResponseMetrics

tracker = PrecisionMetricsTracker()

# Simulate a response
data = {"eval_count": 42, "prompt_eval_count": 11, "eval_duration": 52479709}
metrics = OllamaResponseMetrics.from_api_response(data, model="qwen3:1.7b")

assert metrics.total_tokens == 53
assert metrics.has_actual_counts
assert metrics.tokens_per_second > 0

tracker.record(metrics)
assert tracker.global_total_tokens == 53

print(tracker.summary())
```