Memory
ScalyClaw remembers. Not just within a conversation, but across all conversations, all channels, forever. Every interaction is an opportunity to learn something about the user — a preference, a fact, a relationship, a recurring topic. That knowledge accumulates over time in a persistent memory store and is automatically surfaced when it becomes relevant, making every future conversation smarter and more personal than the last.
Memory in ScalyClaw is not a list of chat logs. It is a structured, searchable store of extracted knowledge — discrete facts and observations that the LLM identifies and records during the course of normal conversation, then retrieves on demand without any user intervention.
How Memory Works
Memory operates entirely in the background. From the user's perspective, ScalyClaw simply seems to know things. Under the hood, two processes are running in every conversation: extraction and retrieval.
Extraction
During and after each conversation turn, the LLM evaluates the exchange for information worth preserving. This is not keyword matching or rule-based parsing — the model exercises judgment about what is meaningful, stable, and useful to remember. A user mentioning in passing that they work at Acme Corp is extracted as a fact. A user expressing frustration with verbose answers is extracted as a preference. Casual small talk that carries no durable information is discarded.
When the LLM decides something is worth saving, it calls the built-in memory_store tool with a structured payload: the type, the content phrased as a clear, retrievable statement, a confidence score between 0 and 1 reflecting how certain the extraction is, and optional tags and TTL.
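The structured payload can be sketched as a TypeScript type. The field names mirror the tool parameters documented on this page; the `MemoryStorePayload` name and the `validatePayload` check are illustrative assumptions, not ScalyClaw's actual implementation:

```typescript
// Sketch of the memory_store payload shape described above.
type MemoryType = "fact" | "preference" | "relationship" | "event" | "analysis";

interface MemoryStorePayload {
  type: MemoryType;
  content: string;    // a clear, self-contained, retrievable statement
  confidence: number; // 0.0–1.0, how certain the extraction is
  tags?: string[];    // optional topic tags
  ttl?: number;       // optional time-to-live in seconds
}

// A minimal sanity check a store implementation might apply before inserting.
function validatePayload(p: MemoryStorePayload): boolean {
  return p.content.trim().length > 0 && p.confidence >= 0 && p.confidence <= 1;
}
```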
Storage
Memory entries are stored in SQLite using bun:sqlite, the high-performance native SQLite driver bundled with Bun. Each entry is stored with its text content alongside a float32 vector embedding generated by the configured embedding model. Two indexes are maintained in parallel:
- sqlite-vec — a vector index for cosine similarity search. This is the primary retrieval path, used whenever an embedding model is configured and functional. It finds semantically similar memories even when the exact words do not match.
- FTS5 — SQLite's built-in full-text search index. This serves as the fallback when vector search is unavailable or returns insufficient results. It matches on exact words and common morphological variants.
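The dual-index layout can be sketched as SQLite DDL. The table names (`memories`, `vec_memories`, `memories_fts`) and the 768-dimension embedding are assumptions for illustration, not ScalyClaw's actual schema; with bun:sqlite, these statements would be applied via `db.run(...)` after loading the sqlite-vec extension:

```typescript
// Illustrative schema for the base table plus the two parallel indexes.
const SCHEMA = [
  // Base table: the memory entries themselves.
  `CREATE TABLE IF NOT EXISTS memories (
     id TEXT PRIMARY KEY,
     type TEXT NOT NULL,
     content TEXT NOT NULL,
     confidence REAL NOT NULL,
     created_at TEXT NOT NULL,
     updated_at TEXT NOT NULL
   )`,
  // sqlite-vec virtual table: float32 vectors for cosine similarity search
  // (768 dimensions is an assumed embedding size).
  `CREATE VIRTUAL TABLE IF NOT EXISTS vec_memories USING vec0(
     embedding float[768]
   )`,
  // FTS5 virtual table: full-text fallback over the same content.
  `CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts USING fts5(content)`,
];
```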
Retrieval Flow
Every time a new message arrives, ScalyClaw runs a memory retrieval pass before constructing the system prompt for the LLM. The query used for retrieval is derived from the incoming message — or from a synthesis of the recent conversation context when more than one turn is available. The retrieval pipeline runs as follows:
- The query text is embedded using the configured embedding model to produce a float32 vector.
- sqlite-vec performs a cosine similarity search against all stored memory vectors, returning candidates ranked by distance.
- If sqlite-vec returns fewer than the required number of results — or if the embedding model is unavailable — FTS5 full-text search runs as a fallback against the same query.
- Results from both passes are merged, de-duplicated, and re-ranked by a combined score that weights cosine similarity and confidence together.
- The top-K results are formatted and injected into the system prompt as a `Memories` section, giving the LLM explicit access to what it knows about the user before it generates a response.
Memory retrieval runs concurrently with other pre-processing steps and adds negligible latency. The vector search over sqlite-vec completes in single-digit milliseconds for stores with tens of thousands of entries. The FTS5 fallback is similarly fast. Neither path involves a network call — all data is local.
Retrieval Pipeline at a Glance
```
Incoming message
        ↓
embed(query) → float32 vector
        ↓
sqlite-vec cosine search → ranked candidates
        ↓
(if insufficient results)
FTS5 full-text fallback → keyword candidates
        ↓
merge + deduplicate + re-rank
score = α × cosine_sim + β × confidence
        ↓
top-K results → injected into system prompt
```
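The merge, de-duplicate, and re-rank step can be sketched in TypeScript. The `Candidate` shape and the α/β weight values are illustrative assumptions; only the combined-score formula comes from the pipeline description above:

```typescript
// Sketch of the merge + de-duplicate + re-rank step.
interface Candidate {
  id: string;
  content: string;
  confidence: number; // stored with the memory
  cosineSim: number;  // 0–1 from the vector search (0 for FTS-only hits)
}

const ALPHA = 0.7; // assumed weight on semantic similarity
const BETA = 0.3;  // assumed weight on stored confidence

function rerank(vectorHits: Candidate[], ftsHits: Candidate[], topK: number): Candidate[] {
  // Merge both passes, keeping the first occurrence of each id (de-duplication).
  const seen = new Map<string, Candidate>();
  for (const c of [...vectorHits, ...ftsHits]) {
    if (!seen.has(c.id)) seen.set(c.id, c);
  }
  // Combined score = α × cosine_sim + β × confidence, then take the top K.
  const score = (c: Candidate) => ALPHA * c.cosineSim + BETA * c.confidence;
  return [...seen.values()].sort((a, b) => score(b) - score(a)).slice(0, topK);
}
```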
Memory Types
Every memory entry is tagged with a type that describes the nature of the information. The type informs how the LLM interprets and uses the memory, and it is also available as a filter in the dashboard memory browser.
| Type | Description | Example | Confidence pattern |
|---|---|---|---|
| fact | An objective, verifiable piece of information about the user or their world. Facts are treated as stable — they do not expire on their own but can be updated when the user provides new information. | "User works at Acme Corp." | Starts high (0.85–1.0) when stated directly; lower (0.5–0.7) when inferred. Confidence rises on subsequent confirmation. |
| preference | A stated or observed inclination — how the user likes to receive information, what tools or formats they favor, what topics they want avoided. Preferences directly shape how the LLM responds. | "Prefers concise answers over verbose explanations." | Inferred preferences start around 0.6 and strengthen each time the preference is reinforced or explicitly confirmed. |
| relationship | Information about people in the user's life — names, roles, and connections. Lets ScalyClaw refer to the user's colleagues, family, or friends by name without asking every time. | "User's partner is named Alex." | High confidence (0.9+) when stated directly. Moderate (0.6–0.8) when inferred from context such as pronoun use or indirect reference. |
| event | Time-bound occurrences — appointments, deadlines, milestones, or past experiences the user has mentioned. Events are stored with whatever temporal context was provided. | "User has a product review meeting on Friday." | Typically high confidence (0.85+) when the user states the event explicitly. Confidence is not automatically reduced after the event date passes — the event remains as a historical record. |
| analysis | Higher-level patterns and observations synthesized by the LLM from multiple exchanges — behavioral tendencies, recurring topics, or inferred characteristics. More speculative than facts but valuable for personalizing responses over time. | "User tends to ask technical questions about Python and data engineering." | Starts low (0.4–0.6) as an inference, rises with accumulating evidence. Analysis entries are updated in place rather than duplicated as the pattern strengthens. |
Confidence scores are not static. The LLM may update a memory's confidence upward when new information corroborates it, or downward when the user contradicts or corrects a stored entry. Contradicted entries are not automatically deleted — they are updated with lower confidence and a note about the correction, preserving the history.
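One simple way to model this adjustment is an asymptotic update toward 1.0 on corroboration and a decay on contradiction. The 0.5 step size is an illustrative assumption, not ScalyClaw's actual rule:

```typescript
// Illustrative confidence adjustment on corroboration or contradiction.
function adjustConfidence(current: number, corroborated: boolean): number {
  return corroborated
    ? current + (1 - current) * 0.5 // move halfway toward certainty
    : current * 0.5;                // halve on contradiction, never auto-delete
}
```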
When memories are ranked for injection into the system prompt, confidence is weighted into the combined relevance score alongside cosine similarity. A memory that is highly relevant but has low confidence (e.g. an uncertain inference) will rank below a moderately relevant memory with high confidence. This means well-established facts are surfaced reliably, while speculative analysis appears only when it is also strongly relevant to the current query.
Memory Tools
The LLM has five built-in memory tools available at all times during conversation. These tools are called automatically — the user does not need to ask ScalyClaw to remember something, and the user does not need to know the tools exist. The LLM decides when to call them based on what it determines is worth preserving or retrieving.
memory_store
Saves a new memory entry to the store. The LLM calls this when it identifies information in the conversation that is worth preserving for future use. If an entry with very similar content already exists, the tool updates the existing entry and adjusts its confidence rather than creating a duplicate.
```json
// Example tool call — storing a preference inferred from conversation
{
  "name": "memory_store",
  "arguments": {
    "content": "User prefers code examples in TypeScript over JavaScript.",
    "type": "preference",
    "tags": ["coding", "typescript"],
    "confidence": 0.85
  }
}
```

```json
// Example tool call — storing a fact stated explicitly by the user
{
  "name": "memory_store",
  "arguments": {
    "content": "User works at Acme Corp as a senior backend engineer.",
    "type": "fact",
    "tags": ["work", "employment"],
    "confidence": 1.0
  }
}
```
memory_search
Queries the memory store on demand. While retrieval is automatic before every response, the LLM may call memory_search mid-conversation when the user asks a question that might be answered from memory, or when the LLM wants to double-check what it knows before making a claim.
```json
// Example tool call — searching for information about the user's work situation
{
  "name": "memory_search",
  "arguments": {
    "query": "user employer job role company",
    "limit": 5
  }
}
```

```json
// Example tool response — results returned to the LLM
{
  "results": [
    {
      "id": "mem_01j9x4kzb2f3",
      "type": "fact",
      "content": "User works at Acme Corp as a senior backend engineer.",
      "confidence": 1.0,
      "score": 0.94,
      "createdAt": "2026-01-15T09:22:41Z",
      "updatedAt": "2026-02-03T14:08:17Z"
    }
  ]
}
```
Tool Parameters
| Tool | Parameter | Type | Description |
|---|---|---|---|
| memory_store | content | string | The memory statement. Should be a complete, self-contained sentence that would make sense without any surrounding context. |
| | type | string | Memory type: fact, preference, relationship, event, or analysis. |
| | tags | string[] | Array of topic strings for filtering and organization. Used by the dashboard memory browser. |
| | confidence | number | Certainty score from 0.0 to 1.0. 1.0 means the user stated it directly; lower values represent inferences of varying certainty. |
| | ttl | number (optional) | Time-to-live in seconds. When set, the memory entry is automatically expired after this duration. |
| memory_search | query | string | Natural-language search query. Can be a phrase, a list of keywords, or a full sentence describing what to look for. |
| | limit | number (optional) | Maximum number of results to return. Defaults to 10. |
| memory_recall | id / type / tags | all optional | Browse memories by a specific ID, by type, or by tag. Useful for structured lookups when the LLM already knows what category of memory it needs. |
| memory_update | subject, content, tags, confidence | all optional | Update one or more fields of an existing memory entry. Only the fields provided are changed; the rest remain unchanged. |
| memory_delete | id | string | Permanently removes the memory entry with the given ID from the store. |
Memories are not automatically purged when they become stale. An event memory for "User has a meeting on Friday" will still exist after that Friday has passed — it becomes a historical record rather than an active reminder. The LLM is instructed to treat dated events with appropriate temporal awareness, but if you want guaranteed cleanup you should delete the entry manually via the dashboard or by telling ScalyClaw to forget it.
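For entries that do carry a ttl, the expiry check reduces to simple timestamp arithmetic. This sketch assumes `createdAt` is an ISO 8601 timestamp and ttl is in seconds, as in the parameter table above:

```typescript
// Sketch of a TTL expiry check; entries without a ttl never auto-expire.
function isExpired(createdAt: string, ttlSeconds: number | undefined, now: Date = new Date()): boolean {
  if (ttlSeconds === undefined) return false;
  return now.getTime() - Date.parse(createdAt) >= ttlSeconds * 1000;
}
```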
Memory Management
The dashboard Memory page gives you full visibility into everything ScalyClaw has learned. It is the canonical interface for reviewing, editing, and pruning the memory store.
Browsing Memories
The memory browser displays all stored entries in a sortable, filterable list. Each row shows the memory content, its type badge, confidence score, and timestamps for when it was created and last updated. You can:
- Search by keyword or phrase across all memory content — uses the same FTS5 index as the retrieval pipeline.
- Filter by type to view only facts, preferences, relationships, events, or analysis entries.
- Filter by confidence to surface high-confidence entries or to find uncertain inferences that may need review.
- Sort by creation date, last-updated date, or confidence score.
Manual Operations
All memory operations available to the LLM are also available to you directly in the dashboard:
| Operation | Where | Description |
|---|---|---|
| Store new memory | Memory page → New entry | Create a memory entry manually. Useful for bootstrapping — you can seed facts about yourself before any conversations take place. The entry is embedded and indexed immediately. |
| Edit entry | Memory page → Edit | Modify the content, type, or confidence of an existing memory. The vector embedding is automatically regenerated from the updated content. |
| Delete entry | Memory page → Delete | Permanently removes the memory from both the SQLite table and the vector index. This action cannot be undone from the dashboard — the entry is gone. |
| Bulk delete | Memory page → Select → Delete selected | Select multiple entries with checkboxes and delete them in one action. Useful for cleaning up a batch of stale events or low-confidence inferences. |
| Re-embed all | Memory page → Settings → Re-embed all | Regenerates vector embeddings for all entries using the currently configured embedding model. Required after changing embedding models. Can take several minutes for large stores. |
Asking ScalyClaw to Forget
You do not need to visit the dashboard to remove a memory. If you tell ScalyClaw during any conversation "forget that I work at Acme Corp" or "please don't remember anything about my schedule", it will search for matching entries and call its internal memory_delete tool to remove them. The deletion takes effect immediately — the entries are gone from the store before the next conversation turn.
// Tool call issued by the LLM when user asks it to forget something { "name": "memory_delete", "arguments": { "id": "mem_01j9x4kzb2f3" } }
Export and Backup
The memory store is an ordinary SQLite database file on disk. You can back it up, copy it, or inspect it with any SQLite browser. The dashboard also provides an export function that serializes all entries to a JSON array, suitable for archiving or migrating to another ScalyClaw instance.
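The export record shape can be sketched as follows. The field names mirror the memory_search response example earlier on this page; the exact export format is an assumption:

```typescript
// Illustrative export record; serialized as a JSON array for archiving or migration.
interface ExportedMemory {
  id: string;
  type: string;
  content: string;
  confidence: number;
  tags: string[];
  createdAt: string;
  updatedAt: string;
}

function exportMemories(entries: ExportedMemory[]): string {
  return JSON.stringify(entries, null, 2);
}
```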
The memory store is global, not per-channel. A fact learned during a Telegram conversation is immediately available in a Discord conversation, and vice versa. There is one unified knowledge base for the entire ScalyClaw instance. If you run multiple separate ScalyClaw deployments for different users, each deployment has its own independent memory store.