The memory layer for agent teams. Self‑hosted. Deterministic retrieval. No LLM in the critical path.
One tier your whole pipeline shares — planner writes, executor reads, reviewer sees both. Namespaces, RBAC, provenance, temporal correctness, ranked retrieval, token budgets — built in, not bolted on. Python library, REST API, or containerized service. No SaaS middleman. No per‑seat fees. No vendor lock‑in. It’s yours.
Eight hundred memories in the store. Six light up. Not the six it searched — the six this moment needs. Your planner’s decision from Monday. The researcher’s finding from Wednesday. The reviewer’s objection from yesterday. Ranked, deduped, fit to budget. Zero LLM calls in the critical path.
Four writes across Mon–Thu land as persisted memories. Friday morning a fresh Portfolio Planner wakes up to a new task, calls mem.load_context(), and Memwright ranks + dedupes + budget-fits all four back into the context window. The agent resumes with full continuity — earnings signal, risk cap, prior stance, compliance precedent.
Single agents rediscover the same facts every run. Multi-agent pipelines are worse — the planner’s decisions never reach the executor, the researcher’s findings never reach the reviewer. So teams do the only thing they can: stuff giant prompts between agents. Burn tokens. Hope nothing important fell off the edge. That’s not an architecture. That’s a workaround.
We had a planner, a coder, a reviewer, a deployer — four agents in a pipeline. None of them knew what the others learned. We were passing giant prompts between them and burning tokens on stale information. Overheard · Engineering Lead, Fortune 100 Bank
More agents, more sessions, more memories — retrieval gets better while context cost stays flat.
Every recall and write is scoped to an AgentContext — identity, role, namespace, parent trail, token budget, write quota, visibility. Contexts are immutable. Spawning a sub-agent returns a new context with inherited provenance. Planner writes. Executor reads. Reviewer sees both. Every call authorised before it touches storage.
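The shape of a context can be sketched in a few lines of plain Python. This is illustrative only: the field names and the spawn() helper are ours, not Memwright's actual API. The point is the immutability contract: spawning never mutates the parent, it returns a new context carrying the provenance trail forward.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class AgentContext:
    # Illustrative fields only, not Memwright's real signature.
    agent_id: str
    role: str
    namespace: str
    token_budget: int
    parent_trail: tuple = ()   # provenance chain, oldest ancestor first

    def spawn(self, child_id: str, role: str) -> "AgentContext":
        """Return a NEW context; the parent stays untouched (frozen dataclass)."""
        return replace(
            self,
            agent_id=child_id,
            role=role,
            parent_trail=self.parent_trail + (self.agent_id,),
        )

planner = AgentContext("planner-1", "Planner", "portfolio", 4000)
executor = planner.spawn("exec-7", "Executor")
print(executor.parent_trail)   # ('planner-1',)
print(planner.parent_trail)    # () -- parent unchanged
```

Because the record is frozen, no downstream code can widen its own namespace or budget; a sub-agent can only be granted a context, never edit one.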
Every agent, project, or tenant gets its own namespace. Planner writes, coder reads, reviewer sees both. Isolated by default, shared when you configure it.
Orchestrator, Planner, Executor, Researcher, Reviewer, Monitor. Read-only observers to full admins. Control who can read, write, or supersede in each namespace.
Know which agent wrote which memory, when, and under which parent session. The reviewer can trace a decision back to the planner three sessions ago — not some unknown source.
Agent A learns "user works at Google." Agent B learns "user works at Meta." Memwright auto-supersedes. Full history preserved. Zero inference calls in the critical path.
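The supersede rule is deterministic, which is why no inference call is needed: a new fact for the same entity and attribute with a different value marks the older fact superseded, and nothing is ever deleted. A toy version of that logic, with our own names rather than Memwright's internals:

```python
from datetime import datetime, timezone

class FactTimeline:
    """Toy supersede logic: never overwrite, mark older conflicting facts."""

    def __init__(self):
        self.facts = []   # each: dict with entity, attr, value, superseded

    def add(self, entity, attr, value):
        # Rule-based contradiction check, no LLM: same entity + attribute
        # with a still-current value is a conflict -> supersede it.
        for f in self.facts:
            if f["entity"] == entity and f["attr"] == attr and not f["superseded"]:
                f["superseded"] = True          # kept in the timeline, not deleted
        self.facts.append({
            "entity": entity, "attr": attr, "value": value,
            "superseded": False,
            "written_at": datetime.now(timezone.utc),
        })

    def current(self, entity, attr):
        for f in reversed(self.facts):
            if f["entity"] == entity and f["attr"] == attr and not f["superseded"]:
                return f["value"]

tl = FactTimeline()
tl.add("user", "employer", "Google")   # Agent A's write
tl.add("user", "employer", "Meta")     # Agent B's write: contradiction
print(tl.current("user", "employer"))  # 'Meta'
print(len(tl.facts))                   # 2 -- full history preserved
```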
recall(query, budget=…) — a lightweight summarizer asks for 500 tokens; a deep reasoner asks for 5,000. Each agent receives exactly what fits in its context window.
Prevent a runaway agent from flooding the store with noise. Rate-limit writes per namespace, flag writes for human review, add compliance tags for audit.
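A per-namespace write limit is a standard sliding-window counter. The sketch below is our own illustration of the idea, not Memwright's quota implementation: each namespace may commit at most `limit` writes per `window` seconds, and anything beyond that is rejected for flagging.

```python
import time
from collections import defaultdict, deque

class WriteQuota:
    """Toy per-namespace limiter: at most `limit` writes per `window` seconds."""

    def __init__(self, limit=5, window=60.0):
        self.limit, self.window = limit, window
        self._hits = defaultdict(deque)   # namespace -> timestamps of writes

    def allow(self, namespace, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[namespace]
        while hits and now - hits[0] >= self.window:
            hits.popleft()                # drop writes outside the window
        if len(hits) >= self.limit:
            return False                  # runaway agent: reject or flag for review
        hits.append(now)
        return True

quota = WriteQuota(limit=2, window=60.0)
print(quota.allow("research", now=0.0))   # True
print(quota.allow("research", now=1.0))   # True
print(quota.allow("research", now=2.0))   # False -- quota exhausted
print(quota.allow("research", now=61.0))  # True  -- window has rolled over
```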
When an agent calls recall(query, budget), five cooperating layers find, fuse, score, and fit the most relevant memories into the requested token ceiling. Ten million memories in the store. Your context window never sees more than the budget. And because there’s no LLM in the path, the same query always returns the same ranking. You can unit‑test it.
Stop-word filtered tag extraction against SQLite's tag index. Exact and partial hits. Fast. Deterministic.
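The tag layer's idea fits in a dozen lines. This sketch (our own stop-word list, not Memwright's) shows why it is deterministic: lowercase, strip punctuation, drop stop words, dedupe in order, then match the survivors against the tag index.

```python
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "for", "to", "and", "is", "what"}

def extract_query_tags(query: str) -> list[str]:
    """Lowercase, strip punctuation, drop stop words, dedupe preserving order."""
    seen, tags = set(), []
    for word in query.lower().split():
        word = word.strip(".,:;!?\"'()")
        if word and word not in STOP_WORDS and word not in seen:
            seen.add(word)
            tags.append(word)
    return tags

print(extract_query_tags("What is the risk cap for the energy desk?"))
# ['risk', 'cap', 'energy', 'desk']
```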
Multi-hop BFS on the entity graph. Query "Python" discovers "FastAPI," "Django," and "pip" through relationship edges.
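Multi-hop expansion is a bounded breadth-first search. A minimal sketch over a plain adjacency dict (the graph data here is made up for illustration):

```python
from collections import deque

def expand_entities(graph, seeds, max_depth=2):
    """BFS over an adjacency dict, collecting entities within max_depth hops."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    found = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue                      # depth cap: don't expand further out
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                frontier.append((neighbor, depth + 1))
    return found

graph = {
    "Python":  ["FastAPI", "Django", "pip"],
    "FastAPI": ["Starlette"],
}
print(expand_entities(graph, ["Python"]))
# ['FastAPI', 'Django', 'pip', 'Starlette']
```

"Starlette" surfaces at hop two via FastAPI even though it never touches the query term.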
Semantic similarity for everything tag and graph miss. Cloud-native embeddings when available; local sentence-transformers otherwise.
Reciprocal Rank Fusion blends results from every layer; PageRank boosts memories about well-connected entities; confidence decay favors recent writes.
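The fusion core is small enough to show whole. This is the textbook RRF formula with the k=60 constant used later in the recall diagram; the PageRank boost and confidence decay are multiplicative adjustments on top, omitted here for brevity, and the memory IDs are invented for illustration.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] = scores.get(memory_id, 0.0) + 1.0 / (k + rank)
    # Deterministic: ties broken by ID, so the same input always orders the same.
    return sorted(scores, key=lambda m: (-scores[m], m))

tag_hits    = ["m1", "m4", "m2"]   # from the tag matcher
graph_hits  = ["m4", "m3"]         # from the graph expander
vector_hits = ["m2", "m4", "m5"]   # from vector search
print(rrf_fuse([tag_hits, graph_hits, vector_hits])[0])   # 'm4'
```

"m4" wins not because any single layer ranked it first-and-only, but because all three layers agreed it was near the top, which is exactly the behavior fusion is for.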
Maximal Marginal Relevance eliminates near-duplicates. Greedy selection packs the top-scoring memories into the caller's token budget. The rest never enter context.
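Greedy MMR plus budget packing, sketched with the λ=0.7 from the recall diagram. The scores, token counts, and similarity function are fabricated to show both mechanisms: a near-duplicate gets demoted, and whatever exceeds the budget never enters context.

```python
def mmr_pack(candidates, tokens, relevance, similarity, budget, lam=0.7):
    """Greedy MMR: pick the item balancing relevance against redundancy with
    what's already selected; skip anything that would blow the token budget."""
    selected, used = [], 0
    remaining = list(candidates)
    while remaining:
        best = max(
            remaining,
            key=lambda c: lam * relevance[c]
            - (1 - lam) * max((similarity(c, s) for s in selected), default=0.0),
        )
        remaining.remove(best)
        if used + tokens[best] <= budget:
            selected.append(best)
            used += tokens[best]
    return selected

relevance = {"m1": 0.9, "m2": 0.75, "m3": 0.4}
tokens    = {"m1": 800, "m2": 800, "m3": 300}

def sim(a, b):
    return 0.95 if {a, b} == {"m1", "m2"} else 0.1   # m1/m2 are near-duplicates

print(mmr_pack(["m1", "m2", "m3"], tokens, relevance, sim, budget=1500))
# ['m1', 'm3'] -- m2 is demoted below m3 as a near-duplicate, then can't fit
```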
Every memory is persisted across three complementary stores. Every supported backend combo is just a different technology choice for one or more of these roles.
Source of truth. Content, tags, entity, category, timestamps, provenance, confidence. Where add() commits and recall() hydrates final text.
Dense embedding per memory, keyed by memory ID. Finds memories by meaning when no tag or word overlaps the query.
Entity nodes + typed edges (uses, authored-by, supersedes). Query “Python” can surface “Django” via the graph.
add()
mem.add(content, tags, entity, ...)
│
┌────────────────────┼─────────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Document │ │ Vector │ │ Graph │
│ store │ │ store │ │ store │
├───────────────┤ ├───────────────┤ ├───────────────┤
│ insert row │ │ embed(text) │ │ extract │
│ content, │ │ → 384-d vec │ │ entities + │
│ tags, meta, │ │ insert keyed │ │ relations │
│ provenance │ │ by memory ID │ │ upsert nodes, │
│ │ │ │ │ add edges │
└───────────────┘ └───────────────┘ └───────────────┘
│ │ │
└─────────────────────┼─────────────────────┘
▼
Contradiction check per entity
(older conflicting facts → superseded,
kept in timeline)
│
▼
done
The three writes commit as one logical transaction. On SQL backends it’s a real DB transaction; on distributed backends it’s sequenced with best-effort rollback.
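The best-effort variant amounts to: commit to each store in order, and if any store fails, delete from the ones that already committed before re-raising. A toy version with stub stores (ours, not Memwright's internals):

```python
class MemoryWriter:
    """Toy fan-out write: commit store by store; roll back on failure."""

    def __init__(self, stores):
        self.stores = stores   # each store exposes .insert(memory), .delete(id)

    def add(self, memory):
        committed = []
        try:
            for store in self.stores:
                store.insert(memory)
                committed.append(store)
        except Exception:
            for store in reversed(committed):
                store.delete(memory["id"])   # undo the partial write
            raise

class StubStore:
    def __init__(self, fail=False):
        self.rows, self.fail = {}, fail
    def insert(self, memory):
        if self.fail:
            raise RuntimeError("store down")
        self.rows[memory["id"]] = memory
    def delete(self, memory_id):
        self.rows.pop(memory_id, None)

doc, vec, graph = StubStore(), StubStore(), StubStore(fail=True)
writer = MemoryWriter([doc, vec, graph])
try:
    writer.add({"id": "m1", "content": "risk cap 2%"})
except RuntimeError:
    pass
print(doc.rows, vec.rows)   # {} {} -- both rolled back, no orphan rows
```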
recall()
mem.recall(query, budget=2000)
│
┌─────────────────────┼─────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Tag matcher │ │ Graph expander│ │ Vector search │
│ → doc store │ │ → graph store│ │ → vec store │
│ FTS, tags, │ │ BFS depth 2 │ │ cosine ANN │
│ stop-filter │ │ from entities│ │ top-K IDs │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
│ │ │
└─────────────────────┼─────────────────────┘
▼
Candidate memory IDs (~100s)
│
▼
Hydrate IDs ← document store
│
▼
Fusion + Rank (RRF k=60 · PageRank · confidence decay)
│
▼
Diversity + Fit (MMR λ=0.7 · greedy token-budget pack)
│
▼
Ranked memories ≤ budget tokens
(zero LLM calls)
Only memory IDs travel between layers until the hydrate step. A store with ten million rows still returns a tight result set inside the caller’s token ceiling.
Isolation. Temporal correctness. Provenance. Not features we bolted on. Primitives we started from.
Every memory lives inside a namespace. Every agent is bound to one of six roles. Every call is authorised before it touches storage. Enforced at the row level, not in application code.
Memwright doesn’t overwrite. It supersedes. Every fact has a validity window. The timeline is replayable to any point in the past.
Ask recall(as_of=…) to replay the past. Nothing is deleted. Auditors can reconstruct what the desk knew and when.
Every sentence the agent writes back is traceable to its source. A cryptographic chain from raw feed ingest to grounded answer.
The citation isn’t a string the LLM chose. It’s the memory_id carried through the pipeline. Click it, land on the raw wire, verify the hash. Grounded, not plausible.
One Python library. One Starlette ASGI container. Six deployment targets — laptop, dev VM, AWS, Azure, GCP, on‑prem. The three storage roles swap per column. Your agent code never learns which backend it’s talking to. Terraform templates live under agent_memory/infra/. Clone. Set your variables. terraform apply.
$ pip install memwright
$ memwright api --host 0.0.0.0 --port 8080
Starlette ASGI on http://localhost:8080. SQLite + ChromaDB + NetworkX provision automatically under ~/.memwright. Point every agent in your stack at the same URL — they share memory instantly. No Docker. No API keys. Air-gap it behind your firewall and walk away.
App Runner with Starlette ASGI. Auto-scaling, HTTPS, custom domains.
Container Apps with Cosmos DB DiskANN. Scale-to-zero. Same API, same results.
Cloud Run with AlloyDB. Scale-to-zero. Google's managed infrastructure.
pgvector + Apache AGE. Neon serverless or any Postgres 16.
Multi-model: graph + document + vector in one engine. Oasis or self-hosted.
SQLite + ChromaDB + NetworkX. Air-gapped deployments. Full data sovereignty.
One Memwright image. Six targets. The three storage roles swap per column. That’s the whole trick.
Every deployment is the same Python library wrapped in the same Starlette ASGI container. DocumentStore, VectorStore, and GraphStore are three interfaces; each column above is one implementation triple.
Prototype on a laptop. Promote to Docker Compose on a dev VM. Promote to a managed container runtime. Same API throughout. Only the storage URLs change.
Same engine. Same storage. Same retrieval. Different coupling. Pick by latency budget and how many agents share the tier.
AgentMemory('./store') in‑process. Sub‑millisecond. Right for a single agent prototyping on a laptop.
memwright api on localhost:8080. Any language. Go or Rust agents call HTTP without Python in the image.
One Memwright service in front of an agent mesh. App Runner · Cloud Run · Container Apps. Managed storage behind. This is the production shape.
Code path is identical across all three. Only configuration changes.
Not a config file. Not an MCP setup guide. Three words in your terminal. Memwright interviews you, installs itself, wires the hooks, and runs a health check before you’ve put the kettle on.
Paste into Claude Code · global or project scope · auto‑merges MCP config
Same call surface whether Memwright runs in‑process on your laptop or behind a Cloud Run service. Pick the interface. Ship.
What we won’t compromise on — no matter how loud the pressure gets.
Your data stays in your infrastructure. Self‑hosted, always.
Retrieval is deterministic. No LLM judges. No hidden inference in the path.
One recall(). Every backend. Swap without rewriting.
Agent teams are first‑class. Not a bolt‑on.
Boring where it counts. Proven, debuggable, no magic.