Agent Memory runs silently alongside your AI coding agent, building a persistent, searchable record of everything your agent does. Every file it reads, every command it runs, every decision it makes — all of it flows through a lifecycle that starts with raw capture and ends with the right context injected into your next session. You never re-explain your stack, re-discover the same bug, or re-teach the same conventions. Agent Memory handles that for you.

The Memory Lifecycle

At a high level, Agent Memory works in three phases: capture, consolidate, and recall. During a session, hooks fire automatically on every tool use and collect observations. When the session ends, a consolidation pipeline compresses those raw observations into durable, searchable memories. When your next session starts, the most relevant memories are retrieved and prepended to your agent’s context window before your first prompt.
1. Agent calls a tool

Your agent reads a file, runs a shell command, performs a web fetch, or executes any other tool. This is the raw material Agent Memory works with.

2. Agent Memory intercepts via hooks

Before and after every tool call, Agent Memory’s hook handlers fire. These hooks capture the tool name, its inputs, its output, and the surrounding context, including any errors that occurred.

3. Observations are stored and optionally compressed

The raw observation is stored immediately. If you have an LLM provider configured and AGENTMEMORY_AUTO_COMPRESS=true, Agent Memory also calls your LLM to compress the observation into structured facts, a narrative summary, and extracted concepts. Without an LLM, a synthetic (BM25-compatible) compression path runs instead, so search still works, just without AI-generated summaries.

4. Session-end consolidation pipeline runs

When your session ends, Agent Memory runs the consolidation pipeline: raw working-memory observations are summarized into episodic memories, and episodic memories are then distilled into semantic facts and procedural patterns. A knowledge graph is also extracted if GRAPH_EXTRACTION_ENABLED=true.

5. Next session starts with injected context

When your next session begins and AGENTMEMORY_INJECT_CONTEXT=true, Agent Memory performs a hybrid search against all stored memories, retrieves the most relevant ones within your token budget (default: 2,000 tokens), and injects them as context before your first prompt lands. Your agent already knows what it learned last time.
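
The two configuration-dependent decisions in the lifecycle (which compression path runs at capture time, and which stages run at session end) can be sketched roughly as follows. This is a minimal illustration, not Agent Memory's implementation: `LifecycleConfig`, `chooseCompressionPath`, `planConsolidation`, and the stage names are all hypothetical. It also assumes that the synthetic path is used whenever LLM compression is not, and that graph extraction depends only on its flag; the documentation does not spell out either case.

```typescript
// Hypothetical sketch: these names are illustrative and do not come from
// Agent Memory's source code.
type CompressionPath = "llm" | "synthetic";

type ConsolidationStage =
  | "working_to_episodic"
  | "episodic_to_semantic_and_procedural"
  | "graph_extraction";

interface LifecycleConfig {
  llmProviderConfigured: boolean;  // an LLM provider is set up
  autoCompress: boolean;           // AGENTMEMORY_AUTO_COMPRESS=true
  graphExtractionEnabled: boolean; // GRAPH_EXTRACTION_ENABLED=true
}

// Capture time: pick the compression path for a freshly stored observation.
// Assumption: the synthetic path also covers "LLM configured, flag off".
function chooseCompressionPath(config: LifecycleConfig): CompressionPath {
  return config.llmProviderConfigured && config.autoCompress ? "llm" : "synthetic";
}

// Session end: list the stages in the order the text describes them.
// Distilling semantic and procedural memory is documented as needing an LLM.
function planConsolidation(config: LifecycleConfig): ConsolidationStage[] {
  const stages: ConsolidationStage[] = ["working_to_episodic"];
  if (config.llmProviderConfigured) {
    stages.push("episodic_to_semantic_and_procedural");
  }
  if (config.graphExtractionEnabled) {
    stages.push("graph_extraction");
  }
  return stages;
}
```

With no LLM configured and graph extraction disabled, this sketch yields the synthetic compression path and a single consolidation stage, which matches the "works with zero API keys" behavior described on this page.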

Hook Types

Agent Memory intercepts your agent’s activity through a set of lifecycle hooks. Each hook type maps to a specific event in your agent’s session:

session_start

Fires when a new session begins. Triggers context retrieval and injects relevant memories from past sessions into the conversation.

session_end

Fires when a session completes. Triggers the full consolidation pipeline — session summary, graph extraction, slot reflection.

pre_tool_use

Fires before every tool call. Captures file access patterns and enriches context so Agent Memory knows what your agent is about to touch.

post_tool_use

Fires after every successful tool call. This is the primary capture point — tool name, input, and output are all recorded here.

post_tool_failure

Fires when a tool call fails. Captures error context, stack traces, and failure patterns so your agent learns from mistakes across sessions.

prompt_submit

Fires when you submit a prompt. Captures the user prompt (privacy-filtered) to provide conversation context for memory retrieval.

task_completed

Fires when a task completes. Triggers an end-of-session summary and the knowledge graph extraction pass.

stop

Fires when the agent stops. Works in tandem with task_completed to close the session and flush final observations.

pre_compact

Fires before the agent compacts its context window. Agent Memory captures a snapshot of the current working context so no observations are lost during compaction.

notification

Fires when the agent receives a notification. Captures notification content as an observation so background events are part of the session record.

subagent_start / subagent_stop

Fire when sub-agents are spawned or complete. Together they track the sub-agent lifecycle so multi-agent workflows are captured as a whole.
The full set of hook types, from src/types.ts:
type HookType =
  | "session_start"
  | "prompt_submit"
  | "pre_tool_use"
  | "post_tool_use"
  | "post_tool_failure"
  | "pre_compact"
  | "subagent_start"
  | "subagent_stop"
  | "notification"
  | "task_completed"
  | "stop"
  | "session_end";
Different agent integrations support different subsets of these hooks. Claude Code supports all 12. Codex CLI supports 6 (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, PreCompact, Stop). Any agent connected via MCP gets memory tools but may not emit all hook events automatically.
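
To see which hooks a given integration does not emit, you can compare its supported set against the full union. The sketch below is illustrative only: `CODEX_EVENT_TO_HOOK` and `unsupportedHooks` are hypothetical names, and the pairing of Codex CLI event names with `HookType` values is inferred from the names themselves rather than taken from the source.

```typescript
type HookType =
  | "session_start"
  | "prompt_submit"
  | "pre_tool_use"
  | "post_tool_use"
  | "post_tool_failure"
  | "pre_compact"
  | "subagent_start"
  | "subagent_stop"
  | "notification"
  | "task_completed"
  | "stop"
  | "session_end";

const ALL_HOOKS: HookType[] = [
  "session_start", "prompt_submit", "pre_tool_use", "post_tool_use",
  "post_tool_failure", "pre_compact", "subagent_start", "subagent_stop",
  "notification", "task_completed", "stop", "session_end",
];

// Assumed mapping from the six Codex CLI event names to hook types.
const CODEX_EVENT_TO_HOOK: Record<string, HookType> = {
  SessionStart: "session_start",
  UserPromptSubmit: "prompt_submit",
  PreToolUse: "pre_tool_use",
  PostToolUse: "post_tool_use",
  PreCompact: "pre_compact",
  Stop: "stop",
};

// Return every hook type that is missing from the supported list.
function unsupportedHooks(supported: HookType[]): HookType[] {
  return ALL_HOOKS.filter((hook) => supported.indexOf(hook) === -1);
}

const codexHooks = Object.keys(CODEX_EVENT_TO_HOOK).map(
  (eventName) => CODEX_EVENT_TO_HOOK[eventName],
);
const missingOnCodex = unsupportedHooks(codexHooks);
```

Under this mapping, `missingOnCodex` holds the six hook types that a Codex CLI session would not emit, including `post_tool_failure` and `session_end`.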

What Gets Captured

Agent Memory categorizes every observation it records into a typed ObservationType. This lets you search, filter, and recall memories by the kind of activity they represent:
file_read / file_write / file_edit

Every time your agent reads, writes, or edits a file, Agent Memory records the file path, the content (where relevant), and the context around why the file was touched. Over time, this builds a per-file history you can retrieve with memory_file_history.

command_run

Shell commands your agent executes (build steps, test runs, installs, migrations) are captured with their output and exit codes. Patterns in which commands succeed or fail become part of your agent’s procedural memory.

search / web_fetch

When your agent searches or fetches external content, the query and results are captured. This helps Agent Memory understand what your agent was researching and surface relevant findings in future sessions.

decision / discovery

Explicit decisions (choosing one library over another, picking an architecture pattern) and discoveries (finding a root cause, understanding a system’s behavior) are captured as high-importance observations that feed directly into semantic memory.

error / conversation

Errors your agent encounters, along with the prompts and responses exchanged, are captured to build a record of what went wrong and how it was resolved.

task / subagent / notification / image

Task lifecycle events, sub-agent operations, notifications, and image-based observations (when your agent processes screenshots or diagrams) are all tracked.
The complete ObservationType union from src/types.ts:
type ObservationType =
  | "file_read"
  | "file_write"
  | "file_edit"
  | "command_run"
  | "search"
  | "web_fetch"
  | "conversation"
  | "error"
  | "decision"
  | "discovery"
  | "subagent"
  | "notification"
  | "task"
  | "image"
  | "other";
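
Because `ObservationType` is a string-literal union, filtering by kind of activity can be checked at compile time. The following is a small sketch of that idea; the `Observation` shape and the helper names `filterByType` and `isHighImportance` are hypothetical and are not taken from Agent Memory's code.

```typescript
type ObservationType =
  | "file_read" | "file_write" | "file_edit"
  | "command_run" | "search" | "web_fetch"
  | "conversation" | "error" | "decision" | "discovery"
  | "subagent" | "notification" | "task" | "image" | "other";

// Hypothetical record shape; the field names are assumptions.
interface Observation {
  id: string;
  type: ObservationType;
  summary: string;
}

// Keep only observations whose type is in the requested set.
function filterByType(
  observations: Observation[],
  types: ObservationType[],
): Observation[] {
  return observations.filter((obs) => types.indexOf(obs.type) !== -1);
}

// The text describes decisions and discoveries as high-importance.
function isHighImportance(type: ObservationType): boolean {
  return type === "decision" || type === "discovery";
}

const sample: Observation[] = [
  { id: "o1", type: "file_edit", summary: "Edited src/types.ts" },
  { id: "o2", type: "command_run", summary: "Ran the test suite" },
  { id: "o3", type: "decision", summary: "Chose SQLite over Postgres" },
];
const fileActivity = filterByType(sample, ["file_read", "file_write", "file_edit"]);
```

A typo such as `"file_edt"` in the `types` argument fails to compile, which is the main benefit of keeping the union closed.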

Memory Storage

All data is stored locally on your machine under ~/.agentmemory/ using SQLite. There is no cloud sync, no external database, no third-party service. Your memories stay on your hardware.
~/.agentmemory/
├── .env            # Your configuration (LLM keys, feature flags)
├── agentmemory.db  # SQLite database: sessions, observations, memories, graph
└── snapshots/      # Git-versioned memory snapshots (if SNAPSHOT_ENABLED=true)
Agent Memory uses SQLite with an in-memory vector index for embeddings — the same process serves BM25 keyword search, vector search, and the knowledge graph. No Postgres, no Qdrant, no Redis required.
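
To make "in-memory vector index" concrete, here is a minimal brute-force version: embeddings are held in an array and ranked by cosine similarity against a query vector. This only illustrates the general technique. How Agent Memory actually structures its index is not described here, and `IndexedVector`, `cosineSimilarity`, and `topK` are hypothetical names.

```typescript
// Hypothetical in-memory index entry.
interface IndexedVector {
  id: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions do not match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // treat a zero vector as similar to nothing
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every entry against the query and return the k best matches.
function topK(
  index: IndexedVector[],
  query: number[],
  k: number,
): { id: string; score: number }[] {
  return index
    .map((entry) => ({ id: entry.id, score: cosineSimilarity(entry.embedding, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A linear scan like this is adequate for a single developer's memory store of modest size; larger collections would typically call for an approximate nearest-neighbor structure.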

Token Injection

At session start, Agent Memory retrieves the most relevant memories from your history and prepends them to your agent’s context window. This is what makes your agent “remember” — it literally receives a curated summary of relevant past work before your first prompt. You control the injection behavior with two settings in ~/.agentmemory/.env:
# Enable context injection at session start (off by default)
AGENTMEMORY_INJECT_CONTEXT=true

# Maximum tokens of injected context per session (default: 2000)
TOKEN_BUDGET=2000
AGENTMEMORY_INJECT_CONTEXT is off by default. When off, Agent Memory still captures and stores all observations — it just doesn’t inject them back at session start via the hook. You can still retrieve memories manually at any time using the memory_smart_search MCP tool or the REST API. Turn injection on once you’re comfortable with how much context it adds to your sessions.
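
The two settings and their documented defaults can be read out of `.env`-style text along these lines. This is a sketch under assumptions: `parseEnv` and `readInjectionSettings` are hypothetical helpers, and treating only the literal string `true` as enabled is a guess about how the flag is interpreted.

```typescript
interface InjectionSettings {
  injectContext: boolean;
  tokenBudget: number;
}

// Parse KEY=VALUE lines, skipping blank lines and # comments.
function parseEnv(text: string): Record<string, string> {
  const values: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    values[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return values;
}

// Apply the documented defaults: injection off, budget of 2000 tokens.
function readInjectionSettings(envText: string): InjectionSettings {
  const env = parseEnv(envText);
  const budget = parseInt(env["TOKEN_BUDGET"] ?? "", 10);
  return {
    // Assumption: only the exact value "true" turns injection on.
    injectContext: env["AGENTMEMORY_INJECT_CONTEXT"] === "true",
    tokenBudget: !isNaN(budget) && budget > 0 ? budget : 2000,
  };
}
```

An empty file therefore resolves to injection disabled with a 2000-token budget, matching the defaults stated above.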
The injected context is retrieved using hybrid search (BM25 + vector + knowledge graph) and ranked by relevance to your current project. Results are capped at TOKEN_BUDGET tokens and diversified across sessions, so a single past session does not dominate the injected context.
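
One simple way to combine a token cap with session diversification is a greedy pass over the ranked results that skips anything over budget or over a per-session limit. The sketch below shows that approach only as an illustration. The actual selection logic is not documented here, and `RankedMemory`, `selectForInjection`, and the `maxPerSession` parameter are hypothetical.

```typescript
// Hypothetical search result: a memory with a relevance score and token cost.
interface RankedMemory {
  id: string;
  sessionId: string;
  score: number;
  tokens: number;
}

function selectForInjection(
  candidates: RankedMemory[],
  tokenBudget: number,
  maxPerSession: number,
): RankedMemory[] {
  // Highest relevance first; copy so the caller's array is not reordered.
  const ranked = candidates.slice().sort((a, b) => b.score - a.score);
  const perSession: Record<string, number> = {};
  const selected: RankedMemory[] = [];
  let usedTokens = 0;

  for (const memory of ranked) {
    const taken = perSession[memory.sessionId] ?? 0;
    if (taken >= maxPerSession) continue;                  // diversify across sessions
    if (usedTokens + memory.tokens > tokenBudget) continue; // stay within the budget
    selected.push(memory);
    perSession[memory.sessionId] = taken + 1;
    usedTokens += memory.tokens;
  }
  return selected;
}
```

The greedy pass favors relevance over packing efficiency: a large, highly ranked memory can crowd out several smaller ones, which is usually the desired trade-off for context injection.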
Agent Memory works even without an LLM API key. Without one, it uses BM25 keyword search and optional local embeddings (install @xenova/transformers for free offline vector search). Consolidation into semantic and procedural memory requires an LLM, but capture, storage, and search all work out of the box with zero API keys.