June 14, 2026 · 7 min read

96.6% recall with no API keys: MemPalace stores AI context locally

Tags: AI, memory, tools, local-first, MCP

Every LLM session you've ever had is probably gone. A tab closes, a context window hits its limit, and all those decisions get lost: why you picked that library, what that error actually meant, what the plan was for that module.

MemPalace is a local-first AI memory system that stores content verbatim, indexes it in a structured hierarchy, and retrieves it with 96.6% recall@5 (R@5, measured over the top five results) on LongMemEval. It runs entirely on your local machine with no API key required. It's open source and MIT-licensed.

MemPalace stores raw text, not summaries

Most AI memory systems summarize. MemPalace doesn't. It keeps your content as raw text (files, transcripts, messages, notes) and indexes those chunks for retrieval. Summaries throw away the details you'll want when debugging at 11pm, or when a decision stops making sense six months later. Verbatim storage means the source of truth is still there.

It's not a cloud SaaS, a monolithic agent platform, or a summarization engine. It's a memory substrate: the full history of your AI work, available on demand, on your own machine.

Wings, rooms, drawers: three index levels that let you scope queries

MemPalace organizes content in three levels:

  • Wings are coarse areas like "people" or "projects". Each specialist agent or workspace gets its own wing.
  • Rooms are topical scopes within a wing: a specific project, customer, or long-running thread.
  • Drawers are verbatim content chunks: files, transcripts, messages, notes.

When you mine a directory or transcript, MemPalace places the resulting drawers into this structure. Searches can be scoped to a wing or room when you know the context, which cuts noise. You're not running every query over one giant flat index.
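To make the scoping idea concrete, here's a minimal sketch of a three-level index. It is not MemPalace's code: the `Drawer` class, the `scoped` and `search` functions, and the sample data are all hypothetical, and a toy keyword match stands in for vector search. The point is only that filtering by wing and room happens before ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drawer:
    wing: str   # coarse area, e.g. "projects" (hypothetical field names)
    room: str   # topical scope inside the wing
    text: str   # verbatim content chunk


def scoped(drawers, wing=None, room=None):
    """Return only the drawers inside the requested wing and/or room."""
    return [
        d for d in drawers
        if (wing is None or d.wing == wing)
        and (room is None or d.room == room)
    ]


def search(drawers, query, wing=None, room=None):
    """Toy keyword match standing in for vector search, applied after scoping."""
    terms = query.lower().split()
    return [
        d for d in scoped(drawers, wing, room)
        if any(t in d.text.lower() for t in terms)
    ]


# Illustrative data only.
palace = [
    Drawer("projects", "myapp", "We switched to GraphQL to cut over-fetching."),
    Drawer("projects", "billing", "GraphQL was rejected for the billing API."),
    Drawer("people", "alice", "Alice prefers REST for public endpoints."),
]

hits = search(palace, "graphql", wing="projects", room="myapp")
```

Unscoped, the same query matches two drawers. Scoped to the `myapp` room, it matches one, which is the noise reduction described above.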

Nothing leaves your machine by default

ChromaDB is the default vector store, local and embedded. The embedding model (around 300 MB) downloads and caches locally on first use. Nothing leaves your machine unless you explicitly configure an external backend.

If you want external storage, MemPalace supports Qdrant over REST and pgvector on Postgres, both configured via connection strings and environment variables. MemPalace writes marker files to your palace directory so you don't accidentally point a palace at the wrong database after a config change.
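The marker-file idea is easy to picture with a small sketch. Everything here is an assumption for illustration: the marker filename, its JSON layout, and the `check_or_write_marker` function are hypothetical, not MemPalace's actual format. The pattern is to record the backend on first use and refuse to proceed if the configuration later points somewhere else.

```python
import json
import tempfile
from pathlib import Path

MARKER_NAME = ".backend-marker.json"  # hypothetical filename


def check_or_write_marker(palace_dir, backend, location):
    """Record the backend on first use; raise if a later config disagrees."""
    marker = Path(palace_dir) / MARKER_NAME
    current = {"backend": backend, "location": location}
    if not marker.exists():
        marker.write_text(json.dumps(current))
        return
    recorded = json.loads(marker.read_text())
    if recorded != current:
        raise RuntimeError(
            f"palace was created against {recorded}, but config now says {current}"
        )


with tempfile.TemporaryDirectory() as palace_dir:
    # First use writes the marker; the second call matches it and passes.
    check_or_write_marker(palace_dir, "qdrant", "http://localhost:6333")
    check_or_write_marker(palace_dir, "qdrant", "http://localhost:6333")
    try:
        # A changed config is caught before any data is read or written.
        check_or_write_marker(palace_dir, "pgvector", "postgresql://localhost/mem")
        mismatch_detected = False
    except RuntimeError:
        mismatch_detected = True
```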

The default setup needs no cloud infrastructure: your machine, an embedding model, and ChromaDB.

96.6% recall with no API keys

MemPalace ships benchmarks and the code to reproduce them. On LongMemEval (a 500-question long-term memory benchmark):

  • Raw semantic search, no heuristics, no LLM: 96.6% R@5
  • Hybrid search with keyword boosting, temporal signals, and preference patterns: 98.4% R@5 on a held-out 450-question set
  • Hybrid search plus LLM reranking over the top 20 candidates: at or above 99% R@5

That 96.6% uses no API keys and no LLMs at any stage. Just local embeddings, ChromaDB, and the retrieval logic.
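If R@5 is new to you, here's one common way to compute it: a question counts as a hit when at least one relevant item shows up in the top five results, and the score is the fraction of questions that hit. This is a sketch of that definition, not the benchmark harness in the repo, which may score differently. The function names and sample IDs are made up.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Return 1.0 if any relevant item appears in the top k results, else 0.0."""
    top_k = set(ranked_ids[:k])
    return 1.0 if top_k & set(relevant_ids) else 0.0


def mean_recall_at_k(runs, k=5):
    """Average the per-question score over (ranked_ids, relevant_ids) pairs."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return sum(scores) / len(scores)


# Illustrative retrieval results, one tuple per question.
runs = [
    (["d3", "d9", "d1", "d7", "d2", "d4"], {"d1"}),        # hit at rank 3
    (["d5", "d6", "d8", "d0", "d2", "d4"], {"d4"}),        # rank 6, so a miss
    (["d4", "d1", "d2", "d3", "d5"], {"d5", "d9"}),        # hit at rank 5
    (["d7", "d8"], {"d7"}),                                # hit at rank 1
]

score = mean_recall_at_k(runs, k=5)  # 3 hits out of 4 questions
```

Under that assumption, 96.6% on 500 questions would mean 483 questions with a relevant result in the top five.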

MemPalace also publishes results on LoCoMo, ConvoMem, and MemBench. Full per-question outputs are in the repo so you can audit or rerun the benchmarks yourself.

Four storage backends, one shared interface

The retrieval layer is pluggable. Current backends:

  • ChromaDB: default, local, embedded
  • sqlite_exact: correctness testing with exact vector math on SQLite
  • Qdrant over REST
  • pgvector on Postgres (JSONB plus vector columns)

Each backend implements the same contract, so the retrieval layer doesn't get shaped around one vendor. External backends support namespace isolation for multi-tenant use.
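A shared contract might look something like the sketch below. The `VectorBackend` protocol, its method names, and the `ExactMemoryBackend` class are hypothetical, and the real interface may well differ. The brute-force cosine search is in the spirit of an exact-math reference backend, here kept in memory rather than on SQLite.

```python
import math
from typing import Protocol


class VectorBackend(Protocol):
    """Hypothetical contract; the real interface may differ."""

    def add(self, drawer_id: str, vector: list[float], text: str) -> None: ...
    def query(self, vector: list[float], k: int) -> list[str]: ...


class ExactMemoryBackend:
    """Brute-force cosine similarity over an in-memory dict."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[list[float], str]] = {}

    def add(self, drawer_id, vector, text):
        self._rows[drawer_id] = (vector, text)

    def query(self, vector, k):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        # Rank every stored drawer by similarity, highest first.
        ranked = sorted(
            self._rows,
            key=lambda i: cosine(vector, self._rows[i][0]),
            reverse=True,
        )
        return ranked[:k]


def top_ids(backend: VectorBackend, vector: list[float], k: int = 5) -> list[str]:
    """Caller code depends only on the contract, not on a concrete store."""
    return backend.query(vector, k)


backend = ExactMemoryBackend()
backend.add("a", [1.0, 0.0], "GraphQL decision")
backend.add("b", [0.0, 1.0], "ECS migration")
backend.add("c", [0.9, 0.1], "GraphQL schema notes")
result = top_ids(backend, [1.0, 0.0], k=2)
```

Because `top_ids` only knows about the protocol, swapping in a different store means writing another class with the same two methods.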

A knowledge graph for facts that change over time

Beyond vector search, MemPalace includes a temporal knowledge graph built on SQLite. You can add entities and relationships with validity windows, query and traverse the graph, invalidate or update facts as they change, and build timelines around specific entities.

Use it to model "who knew what, when" or track facts that shift. "This service ran on ECS, then migrated to Kubernetes in March 2025" is the kind of thing that doesn't survive summarization. In a structured graph, it's queryable.
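Here's a rough sketch of how validity windows make that migration queryable in SQLite. The `facts` table, its columns, and the two functions are assumptions for illustration, not MemPalace's schema. The ECS start date and the exact day in March are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE facts (
        subject    TEXT NOT NULL,
        predicate  TEXT NOT NULL,
        object     TEXT NOT NULL,
        valid_from TEXT NOT NULL,   -- ISO date, inclusive
        valid_to   TEXT             -- ISO date, exclusive; NULL = still valid
    )
    """
)


def add_fact(subject, predicate, obj, valid_from):
    """Close any open fact for (subject, predicate), then record the new one."""
    conn.execute(
        "UPDATE facts SET valid_to = ? "
        "WHERE subject = ? AND predicate = ? AND valid_to IS NULL",
        (valid_from, subject, predicate),
    )
    conn.execute(
        "INSERT INTO facts VALUES (?, ?, ?, ?, NULL)",
        (subject, predicate, obj, valid_from),
    )


def fact_at(subject, predicate, date):
    """Return the object that was valid on the given ISO date, or None."""
    row = conn.execute(
        "SELECT object FROM facts "
        "WHERE subject = ? AND predicate = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)",
        (subject, predicate, date, date),
    ).fetchone()
    return row[0] if row else None


# Illustrative dates: only "March 2025" comes from the example above.
add_fact("service", "runs_on", "ECS", "2024-01-01")
add_fact("service", "runs_on", "Kubernetes", "2025-03-01")

timeline = conn.execute(
    "SELECT object, valid_from, valid_to FROM facts "
    "WHERE subject = 'service' ORDER BY valid_from"
).fetchall()
```

Asking `fact_at("service", "runs_on", "2025-02-15")` returns the ECS fact, while any date from the migration onward returns Kubernetes. The old fact is closed rather than overwritten, so the timeline survives.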

29 MCP tools for Claude Code, Gemini CLI, and other agents

MemPalace is an MCP server exposing 29 tools: palace reads and writes, knowledge graph operations, cross-wing navigation, drawer management, and agent diary utilities.

Claude Code, Gemini CLI, and other MCP-compatible tools can call MemPalace as a context provider during sessions: "find the most relevant sessions about this repo," "recall what we decided about GraphQL," "show all drawers mentioning this bug ID."

Specialist agents each get their own wing and their own diary, building persistent expertise over time. They're discoverable at runtime via mempalace_list_agents rather than bloating the system prompt upfront.
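For a sense of what runtime discovery looks like on the wire, MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds one for `mempalace_list_agents`. The `tool_call` helper is hypothetical, and passing empty arguments is an assumption, since the tool's actual parameters aren't covered here.

```python
import json


def tool_call(request_id, name, arguments=None):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Assumption: the tool takes no required arguments.
request = tool_call(1, "mempalace_list_agents")
wire = json.dumps(request)
```

In practice the host (Claude Code, Gemini CLI, and so on) builds and sends this for you. You'd only write it by hand when debugging a server.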

Auto-save hooks preserve sessions before the context window compacts

Hooks are available for Claude Code, Codex, and Cursor IDE. They save conversations periodically and capture a snapshot before the host tool truncates context.

If you have old JSONL transcripts, backfill them:

mempalace mine ~/.claude/projects/ --mode convos

For per-message recall, mempalace sweep <transcript-dir> creates one drawer per message (user and assistant). The operation is idempotent, so rerunning it over the same transcripts doesn't create duplicates. That gives you message-level retrieval on top of file-level chunks.
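One common way to get that kind of idempotency is to derive each drawer's ID from the message itself and skip IDs that already exist. The sketch below shows the pattern under those assumptions. The table, the `drawer_id` scheme, and this `sweep` function are hypothetical and say nothing about how the real command is implemented.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drawers (id TEXT PRIMARY KEY, role TEXT, text TEXT)")


def drawer_id(session, index, role, text):
    """Derive a stable ID from the message's position and content."""
    key = f"{session}\x00{index}\x00{role}\x00{text}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()


def sweep(session, messages):
    """Insert one drawer per message; return how many rows were new."""
    added = 0
    for index, (role, text) in enumerate(messages):
        cur = conn.execute(
            "INSERT OR IGNORE INTO drawers VALUES (?, ?, ?)",
            (drawer_id(session, index, role, text), role, text),
        )
        added += cur.rowcount  # 1 when inserted, 0 when the ID already existed
    return added


# Illustrative transcript.
messages = [
    ("user", "Why did we switch to GraphQL?"),
    ("assistant", "To cut over-fetching on the mobile client."),
]

first = sweep("session-1", messages)    # both messages are new
second = sweep("session-1", messages)   # rerun adds nothing
total = conn.execute("SELECT COUNT(*) FROM drawers").fetchone()[0]
```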

Getting started

Install into an isolated environment:

uv tool install mempalace
mempalace init ~/projects/myapp

Mine your content and search:

# Mine content
mempalace mine ~/projects/myapp
mempalace mine ~/.claude/projects/ --mode convos

# Search your palace
mempalace search "why did we switch to GraphQL"

# Load relevant context before a new session
mempalace wake-up

If you'd rather not install Python, there's a Docker image that runs both the CLI and the MCP server with everything persisted under /data.

Start here

Run mempalace mine against a project directory, then mempalace wake-up before your next AI session. If you've been using Claude Code or Cursor with months of history, that's the fastest way to see what MemPalace recovers that you'd written off as gone.

The repo and reproducible benchmark scripts are on GitHub.