Full-context replay re-reads the whole conversation every turn — input tokens that grow O(n²) and a bill that compounds with every session. Attestor retrieves only what's needed: flat ~200 tokens per call, 21× fewer input tokens by turn 100, 100% recall — measured across six models, open and closed.
# when new information arrives await attestor.add(namespace, content)
# before each model call — returns ~200 flat tokens
facts = await attestor.recall(namespace, query)
# working context is now flat. session caps come off.
21×
fewer input tokens @ 100 turns
~200
tokens / call, flat — turn 1 to 100
100%
recall held the whole session
O(n)
linear, not O(n²) — by architecture
The token math · linear vs quadratic
Same answer. 21× fewer input tokens to get it.
Re-sending the transcript makes input tokens a parabola; retrieving the one fact you need makes them a straight line. Both hold 100% recall — the only thing that changes is how much the model has to read.
t100
turn
1.33M
replay input tokens
61.8K
attestor input tokens
21.5×
fewer
replay — re-send all (→ 1.33M)attestor — retrieve (→ 61.8K)gpt-5.4 · 100% recall both sides
$24.15
Claude Opus 4 · one 100-turn session full-context replay
Input × is model-independent — it's workload geometry. Every model: 5.6× (t24) → 11× (t50) → ~21–22.5× (t100). Verify it yourself with context-clock.
Who it's for
One architecture. Three rooms.
The token curve is the same problem whether you're underwriting it, operating it, or building it. Pick your seat.
The diligence question: is cost per session linear or quadratic?
Full-context replay is O(n²). Your most engaged users become your most expensive liabilities — and the curve compounds with every session they run.
Token governance is now a CFO-level line item. Salesforce reported 20 trillion tokens consumed in a quarter; teams burn annual AI budgets in months. A product whose cost grows quadratically with engagement has a margin problem hiding in plain sight.
One Claude Opus 4 session, 100 turns: $24.15 → $1.24 with Attestor.
The fix is architectural, not a model swap — it survives the next model.
Both tools are open-source: the benchmark is the diligence, and it's free.
Deterministic recall and a bitemporal audit trail — without the exponential token tax.
Cap context to control cost and you trigger silent memory loss: recall decays 100% → 0% by turn 10, identical across every model. Truncation is mechanical — bigger models don't save you.
Attestor replaces transcript replay with targeted retrieval over a graph + vector store with bitemporal semantics: every fact is timestamped across valid-time and transaction-time, so agent state is immutable, auditable, and reproducible. Per-call context stays flat at ~200 tokens whether the agent is on turn 1 or turn 100.
Flat ~200 tok/call → cost fixed per turn, not compounding.
That's it. Per-call context drops from a climbing parabola to a flat ~200 tokens, your session-length caps come off, and recall holds at 100% across all 100 turns. Measure first — context-clock runs locally on Ollama, no API keys.
Tokenmaxing isn't a model swap. It's an architecture.
Measure your agent's token waste with context-clock, then fix it with Attestor — flat context, 21× fewer input tokens, 100% recall. Both free, both open source.