How Memory Works

Memory updates happen during context compaction, not on every turn. That keeps the conversation hot path cheap and lets the curator see a meaningful slice of recent activity.

PreCompact hook fires
Curator LLM → extracts memory drafts from the about-to-be-compacted turns
Resolver LLM → merges drafts against existing memories (insert / update / promote / demote / forget)
Salience scorer → assigns a weight per memory based on novelty + reuse signals
SQLite + vec → memories land in ~/.ptah/ptah.db, chunks are embedded and indexed for hybrid search

The curator and resolver stages are both LLM calls. By default they use claude-haiku-4-20251022, which is fast and cheap; that matters because the curator runs on every compaction. Override via memory.curatorModel if you want a sharper or cheaper model.

The curator’s output is structured: each draft has a kind (fact | preference | event | entity), a body, an optional subject, and a tier hint. The resolver does the work of deciding what’s actually new versus what’s a refinement of something Ptah already knows.
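To make the draft shape concrete, here is a minimal TypeScript sketch of how a draft could be typed and validated before it reaches the resolver. The kind values, body, and optional subject come from the description above; the property name tierHint, the helper parseDraft, and the example tier value "core" are assumptions for illustration, not Ptah's actual schema.

```typescript
// Hypothetical typing of a curator draft. Only the four kinds, the body,
// the optional subject, and the existence of a tier hint come from the docs;
// the property names and this validator are illustrative assumptions.
type MemoryKind = "fact" | "preference" | "event" | "entity";

interface MemoryDraft {
  kind: MemoryKind;
  body: string;
  subject?: string;
  tierHint: string; // assumed name; e.g. "core" or "archival"
}

const KINDS: readonly string[] = ["fact", "preference", "event", "entity"];

// Narrow untrusted LLM output to a MemoryDraft, or return null if malformed.
function parseDraft(raw: unknown): MemoryDraft | null {
  if (typeof raw !== "object" || raw === null) return null;
  const r = raw as Record<string, unknown>;
  if (typeof r.kind !== "string" || !KINDS.includes(r.kind)) return null;
  if (typeof r.body !== "string" || r.body.trim() === "") return null;
  if (r.subject !== undefined && typeof r.subject !== "string") return null;
  if (typeof r.tierHint !== "string") return null;

  const draft: MemoryDraft = {
    kind: r.kind as MemoryKind,
    body: r.body,
    tierHint: r.tierHint,
  };
  // Only set subject when the curator actually supplied one.
  if (typeof r.subject === "string") draft.subject = r.subject;
  return draft;
}
```

Validating at this boundary means a malformed curator response is dropped before the resolver tries to merge it.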

Each memory carries a salience score. The score increases when a memory is retrieved and used in subsequent turns, and decays exponentially when it’s not. The half-life is memory.decayHalflifeDays (default 14 days).

  • High salience + frequent hits → promoted toward core
  • Low salience over time → demoted toward archival, eventually pruned

Pinned memories (see Pinning & forgetting) are exempt from decay.
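As a rough illustration of the decay rule, exponential decay with a half-life h means an idle memory's score after t days is s · 0.5^(t / h). The sketch below assumes decay is applied as a function of days since the last hit; the function name, parameter names, and the pinned flag handling are hypothetical, and only the 14-day default and the pinned exemption come from the text above.

```typescript
// Default taken from the documented memory.decayHalflifeDays setting.
const DEFAULT_HALF_LIFE_DAYS = 14;

// Sketch of exponential half-life decay. How Ptah actually schedules or
// stores decay is not documented here; this only shows the arithmetic.
function decayedSalience(
  salience: number,
  idleDays: number,
  halfLifeDays: number = DEFAULT_HALF_LIFE_DAYS,
  pinned: boolean = false,
): number {
  // Pinned memories are exempt; non-positive idle time means no decay yet.
  if (pinned || idleDays <= 0) return salience;
  return salience * Math.pow(0.5, idleDays / halfLifeDays);
}
```

With the default half-life, a memory that goes unused for 14 days keeps half its score, and after 28 days a quarter.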

Embeddings run in a worker thread using transformers.js, so once the model is cached there are no network calls and no API key is needed. The model defaults to Xenova/bge-small-en-v1.5 (384 dims). The first run downloads the model weights to your Electron user-data cache; subsequent runs are local-only.

All memory state is in ~/.ptah/ptah.db:

  • memories — one row per memory (kind, body, tier, salience, pinned, timestamps)
  • memory_chunks — text shards used for retrieval
  • memory_chunks_fts — FTS5 BM25 index
  • memory_chunks_vec — sqlite-vec embedding index
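Hybrid search over the two indexes needs some way to combine a BM25 ranking with a vector-similarity ranking. The text above does not say how Ptah fuses them; the sketch below uses reciprocal rank fusion (RRF), a common choice, purely as an assumed illustration. The function name, the chunk ids, and the constant k = 60 are not taken from Ptah.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item,
// where rank starts at 1. This is an assumed fusion strategy, not one
// confirmed as Ptah's.
function reciprocalRankFusion(
  rankings: string[][],
  k: number = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical chunk ids, best match first in each list.
const bm25Hits = ["c1", "c2", "c3"]; // e.g. from the FTS5 index
const vectorHits = ["c3", "c1", "c4"]; // e.g. from the embedding index
const fused = reciprocalRankFusion([bm25Hits, vectorHits]);
```

RRF only needs ranks, not raw scores, which sidesteps the problem that BM25 scores and vector distances live on different scales.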

Alongside curated memory, Ptah keeps a separate code-symbol index for the current workspace. This is distinct from the curator pipeline above:

  • Memory chunks are LLM-extracted, scored, and tiered — they capture decisions and knowledge from your sessions.
  • Code symbols come straight from indexing your source tree — they capture structure (functions, classes, methods) so the agent can navigate and recall where code lives.

Indexing runs on your machine; nothing is uploaded. When the workspace changes, you can re-index from the Memory tab. Each indexed symbol records its name, kind (e.g. function, class, method), the file it lives in, and a token count.
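The symbol record described above could be modeled roughly as follows. This is a sketch only: the property names (file, tokenCount), the helper findSymbols, and the sample entries are hypothetical, and kind is left as an open string because the text lists function, class, and method only as examples.

```typescript
// Hypothetical shape of one indexed symbol, based on the four fields the
// docs mention: name, kind, file, and token count.
interface CodeSymbol {
  name: string;
  kind: string; // e.g. "function", "class", "method"
  file: string;
  tokenCount: number;
}

// Look up symbols by exact name, optionally filtered by kind.
function findSymbols(index: CodeSymbol[], name: string, kind?: string): CodeSymbol[] {
  return index.filter(
    (symbol) => symbol.name === name && (kind === undefined || symbol.kind === kind),
  );
}

// Illustrative data, not real index output.
const sampleIndex: CodeSymbol[] = [
  { name: "parseConfig", kind: "function", file: "src/config.ts", tokenCount: 182 },
  { name: "Session", kind: "class", file: "src/session.ts", tokenCount: 640 },
  { name: "parseConfig", kind: "method", file: "src/legacy/loader.ts", tokenCount: 95 },
];
```

The token count lets a caller estimate how much context a symbol would consume before pulling its source into a prompt.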