The lifecycle of a message
A single turn moves through four stages. The first three (ingest, extract, store) run asynchronously after you record content; the fourth (retrieve) happens at the start of each turn to build the context your agent reasons over.
Ingest
You record the turn with sdk.conversation.record_message() and/or submit content for durable memory with sdk.memories.create() (or sdk.memories.batch_create() for bulk loads). The scope identifiers you pass (user_id and, optionally, customer_id) determine where the resulting memories live.
Extract
The content runs through a multi-stage pipeline: categorization, memory extraction (facts, preferences, episodes, emotions, temporal events), chunking, entity resolution, and organization. Each stage enriches the memory with metadata that improves retrieval later.
Store
Processed memories are persisted in two complementary engines — a vector store for semantic similarity and a graph store for entity relationships — scoped to the right level (user, customer, client, or world).
conversation_id must be a valid UUID. Generate one with str(uuid.uuid4()) and reuse the same value for every turn, and every compaction call, in the same conversation.
The four context layers
Context in Synap is organized into four layers. They are not separate systems: they are the same memories stored at different scopes, with different lifespans and read paths.
| Layer | Scope | Identifier(s) | What it holds | How you read it |
|---|---|---|---|---|
| Short-term / conversational | Single conversation | conversation_id | The running transcript of this session — turns, decisions, current state | sdk.conversation.context.fetch() |
| Long-term (user) | One end user | user_id (+ customer_id on B2B) | Durable facts, preferences, episodes about a person | sdk.user.context.fetch() |
| Customer | One tenant | customer_id | Policies, team structure, shared projects for a B2B organization | sdk.customer.context.fetch() |
| Organizational | Your whole app | (none) | Product docs, announcements, domain knowledge for every user | sdk.client.context.fetch() |
customer_id is required only on B2B (multi-tenant) instances. On B2C instances the customer is auto-resolved from user_id, so you can omit it. The examples below use user_id and note where customer_id applies.
Short-term context
Short-term context is the accumulated history of a single conversation: the questions asked, answers given, and decisions made so far. It is what lets your agent say “as I mentioned earlier…” without losing track of the thread. It lives only for the duration of the session.
Registering the conversation
Short-term context does not appear by magic. Each turn must be registered with sdk.conversation.record_message(), once for the user role and once for the assistant role, so Synap can build conversation-scoped context and feed compaction.
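As a minimal sketch of that registration pattern, the snippet below records both halves of a turn under one reused conversation_id. ConversationLog is a hypothetical in-memory stand-in, not the Synap SDK, and the parameter names (conversation_id, role, content) are assumptions made for illustration; check the SDK reference for the real signature of sdk.conversation.record_message().

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ConversationLog:
    """Hypothetical in-memory stand-in for sdk.conversation (not the real SDK)."""

    messages: list = field(default_factory=list)

    def record_message(self, conversation_id: str, role: str, content: str) -> None:
        # uuid.UUID raises ValueError if the ID is not a well-formed UUID.
        uuid.UUID(conversation_id)
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role}")
        self.messages.append(
            {"conversation_id": conversation_id, "role": role, "content": content}
        )


# Generate one ID per conversation and reuse it for every turn.
conversation_id = str(uuid.uuid4())
log = ConversationLog()

# A turn is a user message plus its assistant response; record both.
log.record_message(conversation_id, "user", "Can you summarize yesterday's incident?")
log.record_message(conversation_id, "assistant", "Here is a short summary of the incident.")
```

Recording only the user side would leave the transcript half empty, so the stand-in stores each role as its own entry under the shared ID.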
How the context grows
A “turn” is a user message plus its assistant response. Each turn is appended to the running history, and your agent sees the full history on every subsequent turn.
Why it can’t grow forever
Short-term context is bounded by three practical constraints, which is why compaction exists.
Token limits
Every LLM has a maximum context window. Filling it with raw conversation history leaves little room for retrieved long-term memories and system instructions.
Cost scaling
LLM cost scales with input tokens. Unbounded history makes every turn progressively more expensive.
Quality degradation
LLMs pay less attention to the middle of long contexts (the “lost in the middle” effect), so very long histories can actually degrade answer quality.
Anything you want to keep beyond the session goes through sdk.memories.create(). There is no explicit “end” call; you ingest what you want to remember whenever it is ready. That hands off to the long-term layer below.
Long-term context
Long-term context is the persistent knowledge layer, holding durable facts, preferences, and events that survive across sessions for days, weeks, or years. It is what gives your agent a memory that lasts: it knows Alice prefers concise summaries even if that was learned months ago.
Lifecycle: from raw content to durable memory
Ingestion — content enters
Content arrives via sdk.memories.create() (runtime, as conversations happen) or sdk.memories.batch_create() (bulk imports and backfills; see Bootstrap Ingestion). At this point it is raw text with scope identifiers and an optional document_id.
Processing — the multi-stage pipeline
Raw text becomes structured, queryable memory through categorization → extraction → chunking → entity resolution → organization. Extraction sorts content into the five memory types: facts, preferences, episodes, emotions, and temporal events. Entity resolution links mentions (“Alice,” “Alice Chen,” “A. Chen”) to a single canonical entity in the entity registry, creating the graph edges that power relationship queries.
Storage — dual-store persistence
Memories land in both engines, each scoped immutably by the identifiers present at ingestion.
Vector store
Memory chunks are embedded for semantic similarity search, finding relevant memories even without shared keywords.
Graph store
Entity relationships are stored for traversal: “what do we know about this customer’s team?” follows graph edges to connected memories.
Scope is set by which identity fields are present, and cannot change after storage:
| Identifiers at ingestion | Resulting scope |
|---|---|
| user_id (+ customer_id on B2B) | USER |
| customer_id only | CUSTOMER |
| neither | CLIENT |
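The mapping in the table can be written as a small pure function. This is only an illustration of the rule as stated here; resolve_scope is a hypothetical helper, not part of the SDK, and the real service performs this resolution on its side.

```python
from typing import Optional


def resolve_scope(user_id: Optional[str] = None, customer_id: Optional[str] = None) -> str:
    """Return the scope implied by the identifiers present at ingestion."""
    if user_id is not None:
        # user_id present (with or without customer_id) means user scope.
        return "USER"
    if customer_id is not None:
        # customer_id alone means customer scope.
        return "CUSTOMER"
    # No identifiers at all means client (organizational) scope.
    return "CLIENT"


scopes = [
    resolve_scope(user_id="alice", customer_id="acme"),
    resolve_scope(customer_id="acme"),
    resolve_scope(),
]
```

Because scope cannot change after storage, a check like this is most useful before the ingest call, for example to catch a missing user_id that would otherwise place personal data at the customer scope.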
Active retrieval — serving queries
When the agent needs context, the retrieval engine embeds the query, searches the vector store, traverses the graph, merges results across all applicable scopes (USER + CUSTOMER + CLIENT + WORLD), ranks them, and returns the top results within the token budget. Frequently surfaced memories stay prominent; rarely surfaced ones gradually deprioritize.
Aging and retention
Ranking weighs relevance, recency, and confidence, so older, less-relevant memories naturally give way to current information — no manual cleanup required. How aggressively memories age, and how long they are retained, is governed by your Memory Architecture Configuration; Synap derives sensible defaults from your use-case file.
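To make the idea of weighing relevance, recency, and confidence under a token budget concrete, here is a toy ranking sketch. The weights, the 30-day half-life, and every name in it (Candidate, score, select_within_budget) are assumptions chosen for illustration; they are not Synap's actual scoring formula, which is governed by your Memory Architecture Configuration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    relevance: float   # similarity to the query, 0.0 to 1.0
    age_days: float    # time since the memory was stored
    confidence: float  # extraction confidence, 0.0 to 1.0
    tokens: int        # size of the memory when rendered into context


def score(c: Candidate, half_life_days: float = 30.0) -> float:
    # Recency decays exponentially: it halves every half_life_days.
    recency = 0.5 ** (c.age_days / half_life_days)
    # Illustrative weights only.
    return 0.6 * c.relevance + 0.25 * recency + 0.15 * c.confidence


def select_within_budget(candidates: list, token_budget: int) -> list:
    """Greedily keep the highest-scoring candidates that still fit the budget."""
    chosen, used = [], 0
    for c in sorted(candidates, key=score, reverse=True):
        if used + c.tokens <= token_budget:
            chosen.append(c)
            used += c.tokens
    return chosen


fresh = Candidate("Alice prefers concise summaries", 0.9, 1, 0.9, 50)
stale = Candidate("Alice asked for long reports", 0.9, 365, 0.9, 50)
weak = Candidate("Alice mentioned the weather", 0.2, 1, 0.5, 50)
top = select_within_budget([weak, stale, fresh], token_budget=100)
```

With equal relevance and confidence, the fresh memory outranks the year-old one purely on recency, which is the "older memories give way" behavior described above.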
Retrieval, scope, and ranking
The retrieval engine searches the full scope chain and prefers the narrowest applicable scope. Two retrieval modes are available:
- fast: vector + graph search, tuned for low-latency interactive turns.
- accurate: vector + graph plus LLM subquery decomposition and reranking, for deeper, higher-recall processing.
Customer context
Customer context is knowledge stored at the CUSTOMER scope: shared across all users within one B2B tenant, but invisible to other tenants. It is each customer’s internal wiki, covering policies, team structure, shared projects, and domain terminology.
Lifecycle: ingest with customer_id, no user_id
You create customer context by ingesting with a customer_id but no user_id. That single distinction is what places the memory at the customer scope.
How you read it back
Retrieve customer context directly, or let it surface automatically inside user conversations:
- Direct (customer scope and below)
- Inside a user conversation
| Memory | Source scope | Alice | Bob | Carol |
|---|---|---|---|---|
| “Auth migration to OAuth 2.1 by Q2” | CUSTOMER | Yes | Yes | Yes |
| “Alice prefers Slack for notifications” | USER (Alice) | Yes | No | No |
| “Bob is on the Platform team” | USER (Bob) | No | Yes | No |
| “Product supports OAuth 2.0 and 2.1” | CLIENT | Yes | Yes | Yes |
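The visibility rules behind this table can be sketched as a predicate over a memory and a viewer. Memory, Viewer, and is_visible are hypothetical names used only for this illustration, the tenant name "acme" and the user "dave" are invented, and the sketch covers only the three scopes shown in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Memory:
    text: str
    scope: str  # "USER", "CUSTOMER", or "CLIENT"
    user_id: Optional[str] = None
    customer_id: Optional[str] = None


@dataclass(frozen=True)
class Viewer:
    user_id: str
    customer_id: str


def is_visible(memory: Memory, viewer: Viewer) -> bool:
    if memory.scope == "CLIENT":
        # Organizational knowledge is shared with every user.
        return True
    if memory.scope == "CUSTOMER":
        # Tenant knowledge is shared only inside the same tenant.
        return memory.customer_id == viewer.customer_id
    if memory.scope == "USER":
        # Personal memories belong to exactly one user in one tenant.
        return (memory.user_id, memory.customer_id) == (viewer.user_id, viewer.customer_id)
    raise ValueError(f"scope not covered by this sketch: {memory.scope}")


alice, bob, carol = (Viewer(name, "acme") for name in ("alice", "bob", "carol"))
dave = Viewer("dave", "other-tenant")

auth = Memory("Auth migration to OAuth 2.1 by Q2", "CUSTOMER", customer_id="acme")
slack = Memory("Alice prefers Slack for notifications", "USER", "alice", "acme")
oauth = Memory("Product supports OAuth 2.0 and 2.1", "CLIENT")
```

The extra viewer from another tenant shows the isolation rule: the customer-scoped memory is hidden from that viewer, while the client-scoped one is still visible.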
Organizational context
Organizational context is knowledge at the CLIENT scope, the broadest application-level scope. It is your product’s documentation, changelog, global policies, and domain knowledge, available to every user across every customer. Think of it as the product brain beneath all customer- and user-specific memories.
Lifecycle: ingest with no scope identifiers
Org context enters when you ingest without user_id or customer_id. For initial product-knowledge loads, batch_create is the recommended path: it offers higher throughput, and processing a documentation set together improves cross-document entity resolution.
Updates, caching, and idempotency
Re-ingest a changed document with the same document_id to update it idempotently: the old version is replaced, reprocessed, and entity connections are refreshed. Use a stable naming convention like doc_<category>_<name>_v<version>, and for frequently changing sources (pricing, feature lists) schedule periodic re-ingestion from your source of truth.
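The replace-on-same-ID behavior can be pictured with a dictionary keyed by document_id. DocumentStore below is a hypothetical in-memory stand-in that shows only the idempotency property; it does not model reprocessing or entity refresh, and it is not how the SDK is called.

```python
class DocumentStore:
    """Hypothetical stand-in: one stored version per document_id."""

    def __init__(self) -> None:
        self._docs = {}

    def ingest(self, document_id: str, content: str) -> bool:
        # Returns True when an existing document was replaced.
        replaced = document_id in self._docs
        self._docs[document_id] = content
        return replaced

    def get(self, document_id: str) -> str:
        return self._docs[document_id]

    def __len__(self) -> int:
        return len(self._docs)


store = DocumentStore()
first = store.ingest("doc_pricing_plans_v1", "Pro plan: $20 per seat")
second = store.ingest("doc_pricing_plans_v1", "Pro plan: $25 per seat")
```

Ingesting twice under the same ID leaves a single document holding the newer content, which is what makes scheduled re-ingestion safe to repeat.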
Because org knowledge is read-heavy and write-infrequent, client-scope retrieval results are cached with a 30-minute TTL. This lowers latency and reduces load on the stores; the trade-off is that after an update, changes may take up to the TTL window to propagate everywhere.
| Aspect | Detail |
|---|---|
| Cache TTL | 30 minutes |
| Why cache | Org context is read-heavy, write-infrequent |
| Invalidation | Re-ingesting (same document_id) refreshes affected entries within the next TTL window |
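The propagation trade-off is easier to see in a toy time-to-live cache. This is a generic sketch of TTL caching with an injectable clock, not Synap's cache implementation; TTLCache and its methods are hypothetical names.

```python
import time


class TTLCache:
    """Minimal TTL cache: entries expire ttl_seconds after they are written."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def put(self, key, value) -> None:
        self._entries[key] = (self._clock(), value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            # Expired: drop the entry so the caller refetches fresh data.
            del self._entries[key]
            return None
        return value


# A fake clock makes the 30-minute window observable without waiting.
now = [0.0]
cache = TTLCache(ttl_seconds=30 * 60, clock=lambda: now[0])
cache.put("pricing", "old answer")

now[0] = 29 * 60
before_expiry = cache.get("pricing")  # still the cached (possibly stale) value

now[0] = 30 * 60
after_expiry = cache.get("pricing")   # expired, so the caller must refetch
```

Until the entry expires, readers keep receiving the cached value even if the underlying document has changed, which is why updates can take up to one TTL window to appear.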
How it surfaces in retrieval
Org context is merged into every user query at the lowest priority, beneath user and customer memories.
Context compaction
Compaction solves the short-term growth problem from the other side: when a conversation gets long, sending the full transcript to your LLM becomes expensive and eventually hits the context window. Compaction intelligently compresses the history, preserving key facts, decisions, preferences, and current state, instead of blindly truncating it.
Analyze
The engine reads the full transcript and identifies facts, decisions, preferences, emotional shifts, and where the discussion currently stands.
Extract
It pulls out five categories of essential information: facts, decisions, preferences, a summary narrative of the conversation arc, and the current state (active topic and open questions).
Compress
The extracted information is compressed into your target token budget. Recent turns are preserved verbatim for conversational flow; older, resolved turns become summaries. A validation_score is computed so you can confirm critical information survived.
Strategies
| Strategy | Output size | Best for |
|---|---|---|
| conservative | Largest | Short conversations needing high detail; minimal information loss. |
| balanced | Medium | General-purpose; good compression-vs-detail balance. |
| aggressive | Smallest | Long or cost-sensitive conversations; keeps only the most critical facts. |
| adaptive | Varies | Synap analyzes the conversation (length, density, repetition, recency, budget) and picks the strategy. Recommended default. |
The SDK surface
Compaction is asynchronous: compact kicks off a job and returns a handle, get_compaction_status polls for completion, and get_compacted returns the result.
Check validation_score (0.0–1.0) and validation_passed to confirm quality. If scores fall consistently low, switch to a less aggressive strategy or raise the token budget.
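The kick-off, poll, fetch flow can be sketched as a bounded polling loop. FakeCompactionClient is a hypothetical stand-in so the example runs on its own; the three method names come from the text above, but their arguments, the status strings ("processing", "completed"), and the result fields other than validation_score and validation_passed are assumptions.

```python
import time
import uuid


class FakeCompactionClient:
    """Hypothetical stand-in that finishes after a fixed number of polls."""

    def __init__(self, polls_until_done: int = 2) -> None:
        self._remaining = polls_until_done

    def compact(self, conversation_id: str, strategy: str = "adaptive") -> str:
        return f"job-{conversation_id}"  # assumed: a job handle is returned

    def get_compaction_status(self, job_id: str) -> str:
        if self._remaining > 0:
            self._remaining -= 1
            return "processing"
        return "completed"

    def get_compacted(self, conversation_id: str) -> dict:
        return {"validation_score": 0.92, "validation_passed": True, "compacted_context": "..."}


def compact_and_wait(client, conversation_id, strategy="adaptive",
                     poll_interval=0.0, max_polls=10) -> dict:
    """Start compaction, poll until it completes, then fetch the result."""
    job_id = client.compact(conversation_id, strategy=strategy)
    for _ in range(max_polls):
        if client.get_compaction_status(job_id) == "completed":
            return client.get_compacted(conversation_id)
        time.sleep(poll_interval)
    raise TimeoutError(f"compaction {job_id} did not finish after {max_polls} polls")


result = compact_and_wait(FakeCompactionClient(), str(uuid.uuid4()))
needs_retry = not result["validation_passed"]
```

Bounding the loop with max_polls keeps a stuck job from blocking the turn; a caller could fall back to the uncompacted history when the timeout fires.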
Compaction vs. retrieval
The two are complementary, not interchangeable:
| Aspect | Compaction | Retrieval |
|---|---|---|
| Input | Current conversation history | Query against stored memories |
| Scope | One conversation | All memories across all conversations |
| Purpose | Reduce tokens for the current turn | Bring relevant past knowledge into the turn |
| Output | Compressed view of this conversation | Ranked memories from vector + graph stores |
Next steps
Memories & Context
The overview of how memories and context fit together in Synap.
Retrieval Modes
Choosing between fast (vector + graph) and accurate (vector + graph + LLM decomposition + reranking).
Entity Resolution
How mentions are resolved to canonical entities and linked in the graph.
Memory Scopes
The full scope chain and priority resolution rules.