Three Search Streams
BM25 — Keyword Search
BM25 is always available: no LLM key, no embedding provider, no setup required. It tokenizes your query and your stored memories using stemming and synonym expansion, then scores matches using TF-IDF-style term frequency weighting.
BM25 excels at exact and near-exact matches: if you search for “jose JWT middleware”, BM25 finds observations that contain those terms. It also handles multilingual content: Greek, Cyrillic, Hebrew, Arabic, and accented Latin are tokenized out of the box. For Chinese, Japanese, and Korean content, install the optional segmenters.
Default weight: BM25_WEIGHT=0.4
BM25 runs in-process against your SQLite database with no network calls. It’s the fastest stream and the fallback when no embedding provider is configured.
Vector Search — Semantic Similarity
Vector search converts your query and stored observations into dense embedding vectors, then finds the observations whose vectors are closest to your query using cosine similarity. This enables semantic retrieval: searching for “database performance optimization” can return the observation where your agent fixed an N+1 query, even if that observation never used the words “performance” or “optimization.”
Vector search requires an embedding provider. Agent Memory auto-detects your provider from environment variables (see the Embedding Providers table below). For free offline embeddings, install @xenova/transformers. This gives you the all-MiniLM-L6-v2 model running entirely on your machine, with no API calls and no cost.
Default weight: VECTOR_WEIGHT=0.6
Vector search adds approximately 8 percentage points of recall over BM25 alone on the LongMemEval-S benchmark.
Knowledge Graph — Conceptual Traversal
The knowledge graph stream traverses a graph of entities and relationships extracted from your memories. When your query mentions a concept (a file name, a library, an error type, an architectural pattern), Agent Memory identifies matching graph nodes and walks their edges to find related observations that a keyword or vector search might not surface.
Knowledge graph search requires GRAPH_EXTRACTION_ENABLED=true and an LLM provider. Once enabled, Agent Memory automatically extracts entities and relationships at session end.
Default weight: AGENTMEMORY_GRAPH_WEIGHT=0.3
The graph is optional but powerful for conceptual queries: “what decisions did we make about authentication?” surfaces the graph path through the authentication concept node to related decisions, files, and patterns.
Reciprocal Rank Fusion (RRF)
The three streams each produce a ranked list of results. Reciprocal Rank Fusion merges those lists into one, giving credit to results that rank highly across multiple streams. The fusion constant is k=60, a standard value that smooths out differences between rank positions. A result ranked #1 by BM25 and #3 by vector search scores higher than a result ranked #1 by only one stream. Because RRF naturally surfaces results that multiple search methods agree on, you rarely need to fine-tune individual stream weights.
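To make the fusion step concrete, here is a minimal sketch that assumes the standard unweighted RRF formula, where each stream contributes 1 / (k + rank) for every result it ranks (ranks start at 1). How Agent Memory folds the per-stream weights into this score is not described here, so the sketch ignores them. The function name `rrf_fuse` and the observation IDs are illustrative, not part of Agent Memory.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Merge ranked lists of IDs with unweighted Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based: the top result of a stream contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical ranked output from two streams.
bm25 = ["obs_a", "obs_b", "obs_c"]
vector = ["obs_d", "obs_c", "obs_a"]

fused = rrf_fuse([bm25, vector])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

In this sample, `obs_a` (ranked #1 by BM25 and #3 by vector search) ends up ahead of `obs_d`, which only one stream ranked #1.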
Results are also session-diversified: Agent Memory caps results from any single session at 3, so you get a spread of relevant context across your history rather than the entirety of one verbose session.
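The per-session cap can be sketched as a single pass over the ranked results. This is an illustration under assumptions, not Agent Memory's actual code: the `session_id` field name and the dictionary shape of a result are hypothetical.

```python
def diversify_by_session(results, max_per_session=3):
    """Keep ranked order, dropping results beyond the per-session cap."""
    kept = []
    seen = {}  # session ID -> number of results already kept
    for result in results:
        session = result["session_id"]
        if seen.get(session, 0) < max_per_session:
            kept.append(result)
            seen[session] = seen.get(session, 0) + 1
    return kept

# Five hits from one verbose session, followed by one hit from another.
ranked = [{"id": i, "session_id": "s1"} for i in range(5)]
ranked.append({"id": 5, "session_id": "s2"})

diverse = diversify_by_session(ranked)
print([r["id"] for r in diverse])
```

Only the first three results from `s1` survive, so the lower-ranked result from `s2` still makes it into the final list.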
Embedding Providers
Agent Memory auto-detects your embedding provider from environment variables. Set EMBEDDING_PROVIDER to force a specific provider, or let Agent Memory pick based on which API keys are present.
| Provider | Env Variable | Model | Notes |
|---|---|---|---|
| Local (offline) | EMBEDDING_PROVIDER=local | all-MiniLM-L6-v2 | Free, no API calls. Requires npm install @xenova/transformers. Best starting point. |
| OpenAI | OPENAI_API_KEY | text-embedding-3-small | Highest quality embeddings. Also activates the OpenAI LLM provider. |
| Voyage AI | VOYAGE_API_KEY | voyage-code-3 | Optimized specifically for code — recommended if your sessions are code-heavy. |
| Cohere | COHERE_API_KEY | embed-english-v3.0 | Strong general-purpose embeddings with a free trial tier. |
| Gemini | GEMINI_API_KEY | gemini-embedding-001 | 100+ languages, supports 768/1536/3072 dimensions (MRL), 2048-token input. |
| OpenRouter | OPENROUTER_API_KEY | provider-dependent | Multi-model proxy; embedding support varies by the underlying model. |
If multiple API keys are set, Agent Memory uses this auto-detection priority: Gemini → OpenAI → Voyage → Cohere → OpenRouter. Set EMBEDDING_PROVIDER=local explicitly in ~/.agentmemory/.env to use local embeddings even when other keys are present.
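The selection rules above can be written as a short lookup. This is a sketch of the documented behavior, not the real implementation: the function name `detect_embedding_provider` is hypothetical, and the provider identifiers other than `local` are assumed for illustration.

```python
# Environment variable names and priority order come from the table and text above.
# The provider identifiers on the right (other than "local") are assumptions.
KEY_PRIORITY = [
    ("GEMINI_API_KEY", "gemini"),
    ("OPENAI_API_KEY", "openai"),
    ("VOYAGE_API_KEY", "voyage"),
    ("COHERE_API_KEY", "cohere"),
    ("OPENROUTER_API_KEY", "openrouter"),
]

def detect_embedding_provider(env):
    """Return the provider to use, or None when only BM25 is available."""
    forced = env.get("EMBEDDING_PROVIDER")
    if forced:
        return forced  # an explicit setting wins over any API key
    for key, provider in KEY_PRIORITY:
        if env.get(key):
            return provider
    return None  # no provider configured: search falls back to BM25

print(detect_embedding_provider({"OPENAI_API_KEY": "sk-...", "COHERE_API_KEY": "co-..."}))
```

To try it against your real environment, pass `dict(os.environ)` after importing `os`.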
Knowledge Graph
When GRAPH_EXTRACTION_ENABLED=true, Agent Memory uses your LLM to extract entities and relationships from observations at session end. These form a graph of typed nodes and edges:
Node types: file, function, concept, error, decision, pattern, library, person, project, preference, location, organization, event
Edge types: uses, imports, modifies, causes, fixes, depends_on, related_to, prefers, blocked_by, caused_by, optimizes_for, rejected, avoids, succeeded_by
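As a rough picture of how typed nodes and edges support multi-hop lookups, the sketch below stores edges as `(source, edge_type, target)` tuples and walks them breadth-first. The storage layout, the `type:name` node IDs, the sample data, and the choice to follow edges in both directions are all assumptions made for illustration; the actual graph schema may differ.

```python
from collections import deque

# Made-up sample edges using node and edge types from the lists above.
EDGES = [
    ("decision:use-jwt", "related_to", "concept:authentication"),
    ("decision:use-jwt", "depends_on", "library:jose"),
    ("file:auth/middleware.ts", "uses", "library:jose"),
    ("error:token-expired", "caused_by", "file:auth/middleware.ts"),
]

def neighbors_within(edges, start, max_hops=2):
    """Breadth-first walk; returns {node: hop distance} excluding the start node."""
    adjacency = {}
    for source, _edge_type, target in edges:
        # Assumption: edges are followed in both directions during traversal.
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]
    return hops

related = neighbors_within(EDGES, "concept:authentication", max_hops=2)
print(related)
```

Starting from the authentication concept, two hops reach the JWT decision and the `jose` library; raising `max_hops` to 3 also reaches the middleware file.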
Enable graph extraction by setting GRAPH_EXTRACTION_ENABLED=true in ~/.agentmemory/.env.
Query the graph with the memory_graph_query MCP tool.
http://localhost:3113.
Search Tuning
You can tune how search behaves through environment variables in ~/.agentmemory/.env.
When to increase BM25_WEIGHT
Increase BM25_WEIGHT (e.g., to 0.6) when you’re working with highly technical, terminology-dense codebases where exact term matching matters: specific function names, error codes, flag names. Keyword matching is more precise when the vocabulary is consistent.
When to increase VECTOR_WEIGHT
Increase VECTOR_WEIGHT (e.g., to 0.8) when you want more semantic retrieval, that is, finding observations about a concept even when the wording varies. Useful when your sessions use varied language to describe the same problems.
When to enable GRAPH_EXTRACTION_ENABLED
Enable the knowledge graph when you want conceptual traversal — “what else relates to this library?” or “what caused this error pattern?” The graph excels at multi-hop queries that keyword and vector search don’t cover well. Requires an LLM provider.
Token Budget
The TOKEN_BUDGET setting controls the maximum number of tokens Agent Memory injects into your context window at session start. The default is 2,000 tokens.
- Results are scored and ranked by the hybrid search
- Memories are added to the context block from highest to lowest score
- Once the running token count would exceed TOKEN_BUDGET, injection stops
- Results are session-diversified before budget trimming (max 3 per session)
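The budget rule described in the list above amounts to a greedy pass over the ranked memories. The sketch below assumes each memory carries a precomputed `tokens` count; that field, the `pack_within_budget` name, and the sample data are hypothetical, and how Agent Memory actually counts tokens is not covered here.

```python
def pack_within_budget(ranked_memories, token_budget=2000):
    """Add memories in rank order until the next one would exceed the budget."""
    selected = []
    used = 0
    for memory in ranked_memories:
        if used + memory["tokens"] > token_budget:
            break  # injection stops; lower-ranked memories are not considered
        selected.append(memory)
        used += memory["tokens"]
    return selected, used

# Already sorted from highest to lowest hybrid score.
memories = [
    {"id": "m1", "tokens": 900},
    {"id": "m2", "tokens": 800},
    {"id": "m3", "tokens": 500},
    {"id": "m4", "tokens": 100},
]

selected, used = pack_within_budget(memories)
print([m["id"] for m in selected], used)
```

Because injection stops at the first overflow, `m4` is left out even though it would fit on its own; a skip-and-continue variant would pack more but no longer follows strict score order.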
Using Search
- MCP Tool
- REST API
Use the memory_smart_search MCP tool from your agent. This runs the full triple-stream hybrid search and returns ranked CompressedObservation results with individual BM25, vector, and graph scores plus the combined score.
Agent Memory works without any API keys using BM25-only search. Install @xenova/transformers to add free local vector embeddings. Add an LLM key and set GRAPH_EXTRACTION_ENABLED=true for the full triple-stream experience. Each layer is additive, so you get value at every tier.