Every message your agent handles flows through the same arc: it is ingested, its meaning is extracted into structured memories, those memories are stored in the vector and graph engines, and they are retrieved to enrich the next turn. Around that arc sit four context layers, each with a different scope and lifespan, from the working memory of a single conversation to knowledge shared across your entire application. This page answers two questions in one place: “What happens to a message after I send it?” and “What context can I read back?” You should not need three tabs open to follow a conversation from the first user turn to the durable knowledge it leaves behind.
Think of short-term context as working memory during a meeting — everything said so far — and long-term context as the takeaways that persist after the meeting ends. Customer and organizational context are the shared wikis and product brain that everyone in the room already knows.

The lifecycle of a message

A single turn moves through four stages. The first three (ingest, extract, store) run asynchronously after you record content; the fourth (retrieve) happens at the start of each turn to build the context your agent reasons over.
1. Ingest

You record the turn with sdk.conversation.record_message() and/or submit content for durable memory with sdk.memories.create() (or sdk.memories.batch_create() for bulk loads). The scope identifiers you pass (user_id, optional customer_id) determine where the resulting memories live.

2. Extract

The content runs through a multi-stage pipeline: categorization, memory extraction (facts, preferences, episodes, emotions, temporal events), chunking, entity resolution, and organization. Each stage enriches the memory with metadata that improves retrieval later.

3. Store

Processed memories are persisted in two complementary engines (a vector store for semantic similarity and a graph store for entity relationships), scoped to the right level (user, customer, client, or world).

4. Retrieve

On the next turn, context.fetch() searches the applicable scopes, ranks the results, and returns the most relevant memories within your token budget. This retrieved context, plus the conversation’s short-term history, is what your agent reasons over.
record_message / memories.create
        │  ingest
        ▼
  multi-stage pipeline ──► vector store + graph store
        │  extract                 │  store
        ▼                          ▼
   structured memories      context.fetch() ──► ranked context for the next turn
                                   │  retrieve
                                   ▼
                          your agent's prompt
conversation_id must be a valid UUID. Generate one with str(uuid.uuid4()) and reuse the same value for every turn — and every compaction call — in the same conversation.
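Because conversation_id must be a valid UUID, it can help to check IDs before they reach the SDK. The following is a minimal sketch using only the Python standard library; the helper name is_valid_conversation_id is hypothetical and is not part of the Synap SDK.

```python
import uuid


def is_valid_conversation_id(value: str) -> bool:
    """Return True if value parses as a UUID (hypothetical helper, not a Synap API)."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


# Generate once per conversation, then reuse for every turn and compaction call.
conversation_id = str(uuid.uuid4())
```

A check like this is most useful at the boundary where IDs arrive from elsewhere, for example a session store or a request payload.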

The four context layers

Context in Synap is organized into four layers. They are not separate systems — they are the same memories stored at different scopes, with different lifespans and read paths.
| Layer | Scope | Identifier(s) | What it holds | How you read it |
| --- | --- | --- | --- | --- |
| Short-term / conversational | Single conversation | conversation_id | The running transcript of this session: turns, decisions, current state | sdk.conversation.context.fetch() |
| Long-term (user) | One end user | user_id (+ customer_id on B2B) | Durable facts, preferences, episodes about a person | sdk.user.context.fetch() |
| Customer | One tenant | customer_id | Policies, team structure, shared projects for a B2B organization | sdk.customer.context.fetch() |
| Organizational | Your whole app | (none) | Product docs, announcements, domain knowledge for every user | sdk.client.context.fetch() |
customer_id is required only on B2B (multi-tenant) instances. On B2C instances the customer is auto-resolved from user_id, so you can omit it. The examples below use user_id and note where customer_id applies.
Narrower scopes win. When the same fact exists at multiple levels, the user-scoped version takes priority over customer, which takes priority over client. See Memory Scopes for the full priority resolution rules.

Short-term context

Short-term context is the accumulated history of a single conversation — the questions asked, answers given, and decisions made so far. It is what lets your agent say “as I mentioned earlier…” without losing track of the thread. It lives only for the duration of the session.

Registering the conversation

Short-term context does not appear by magic. Each turn must be registered with sdk.conversation.record_message() — both the user and assistant roles — so Synap can build conversation-scoped context and feed compaction.
import uuid
from maximem_synap import MaximemSynapSDK

sdk = MaximemSynapSDK(api_key="synap_your_key_here")
await sdk.initialize()

# One UUID per conversation, reused across every turn
conversation_id = str(uuid.uuid4())

await sdk.conversation.record_message(
    conversation_id=conversation_id,
    role="user",
    content="I prefer dark mode and concise answers.",
    user_id="user_alice",
    # customer_id="customer_acme",  # required on B2B; omit on B2C (auto-resolved)
)

await sdk.conversation.record_message(
    conversation_id=conversation_id,
    role="assistant",
    content="Got it — I'll keep answers short and assume dark mode.",
    user_id="user_alice",
)
If a conversation is never registered with record_message, a later conversation.context.fetch() for that conversation_id returns an empty result: there is no transcript to draw on, and memories_used stays 0. Registering each turn is what makes the conversation coherent on the next fetch.

How the context grows

A “turn” is a user message plus its assistant response. Each turn is appended to the running history, and your agent sees the full history on every subsequent turn:
Turn 1:  User: "What's our current API rate limit?"
         Assistant: "Your current rate limit is 1,000 requests per minute."

Turn 2:  User: "Can we increase that for our enterprise plan?"
         Assistant: "Yes, enterprise plans support up to 10,000 req/min..."

Turn 3:  User: "What about burst handling?"
         Assistant: "Burst allowances provide a 2x multiplier..."
  ...

Why it can’t grow forever

Short-term context is bounded by three practical constraints, which is why compaction exists.

Token limits

Every LLM has a maximum context window. Filling it with raw conversation history leaves little room for retrieved long-term memories and system instructions.

Cost scaling

LLM cost scales with input tokens. Unbounded history makes every turn progressively more expensive.

Quality degradation

LLMs pay less attention to the middle of long contexts (the “lost in the middle” effect), so very long histories can actually degrade answer quality.
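To make the cost-scaling constraint concrete, here is a back-of-the-envelope sketch. It assumes every turn adds a fixed number of tokens and that the full history is resent on each turn; the figure of 150 tokens per turn is purely illustrative and not a Synap measurement.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int = 150) -> int:
    """Total input tokens sent across `turns` turns when the full history is resent.

    Turn k resends k turns' worth of history (including the current message),
    so the total is tokens_per_turn * (1 + 2 + ... + turns).
    """
    return tokens_per_turn * turns * (turns + 1) // 2


for n in (10, 50, 100):
    print(n, cumulative_input_tokens(n))
```

Under these assumptions the total grows quadratically: a conversation ten times longer sends roughly ninety times as many cumulative input tokens, which is the pressure compaction is meant to relieve.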
At the start of each turn, long-term memories are retrieved to provide background, while the short-term transcript provides immediate continuity. The two paths converge in your prompt:
context = await sdk.conversation.context.fetch(
    conversation_id=conversation_id,
    search_query=["migration timeline"],
)
# context.facts / context.preferences / context.episodes hold the
# long-term memories retrieved for this turn; the short-term transcript
# supplies the in-session continuity.
When the transcript content has lasting value, you persist it to long-term memory with sdk.memories.create() — there is no explicit “end” call; you ingest what you want to remember whenever it is ready. That hands off to the long-term layer below.

Long-term context

Long-term context is the persistent knowledge layer — durable facts, preferences, and events that survive across sessions for days, weeks, or years. It is what gives your agent a memory that lasts: it knows Alice prefers concise summaries even if that was learned months ago.

Lifecycle: from raw content to durable memory

1. Ingestion: content enters

Content arrives via sdk.memories.create() (runtime, as conversations happen) or sdk.memories.batch_create() (bulk imports and backfills — see Bootstrap Ingestion). At this point it is raw text with scope identifiers and an optional document_id.
await sdk.memories.create(
    document="The customer prefers email communication over phone calls.",
    document_type="ai-chat-conversation",
    user_id="user_alice",
    # customer_id="customer_acme",  # B2B only
    metadata={"source": "support_conversation"},
)
2. Processing: the multi-stage pipeline

Raw text becomes structured, queryable memory through categorization → extraction → chunking → entity resolution → organization. Extraction sorts content into the five memory types: facts, preferences, episodes, emotions, and temporal events. Entity resolution links mentions (“Alice,” “Alice Chen,” “A. Chen”) to a single canonical entity in the entity registry, creating the graph edges that power relationship queries.
3. Storage: dual-store persistence

Memories land in both engines, each scoped immutably by the identifiers present at ingestion.

Vector store

Memory chunks are embedded for semantic similarity search — finding relevant memories even without shared keywords.

Graph store

Entity relationships are stored for traversal: “what do we know about this customer’s team?” follows graph edges to connected memories.
Scope is set by which identity fields are present, and cannot change after storage:
| Identifiers at ingestion | Resulting scope |
| --- | --- |
| user_id (+ customer_id on B2B) | USER |
| customer_id only | CUSTOMER |
| neither | CLIENT |
4. Active retrieval: serving queries

When the agent needs context, the retrieval engine embeds the query, searches the vector store, traverses the graph, merges results across all applicable scopes (USER + CUSTOMER + CLIENT + WORLD), ranks them, and returns the top results within the token budget. Frequently surfaced memories stay prominent; rarely surfaced ones are gradually deprioritized.
5. Aging and retention

Ranking weighs relevance, recency, and confidence, so older, less-relevant memories naturally give way to current information — no manual cleanup required. How aggressively memories age, and how long they are retained, is governed by your Memory Architecture Configuration; Synap derives sensible defaults from your use-case file.
6. Eviction: end of the lifecycle

When a memory is no longer retained it is archived (moved to cold storage, reachable only by explicit archive queries — good for compliance) or deleted (permanently removed from both stores, with entity connections cleaned up), depending on your configuration.
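The scope rule from the storage step (which identifiers are present at ingestion decides the scope) can be written as a small function. This is an illustrative sketch that mirrors the identifier-to-scope mapping described on this page; resolve_scope is a hypothetical name, not an SDK function.

```python
from typing import Optional


def resolve_scope(user_id: Optional[str] = None, customer_id: Optional[str] = None) -> str:
    """Mirror the identifier-to-scope mapping (illustrative, not SDK code)."""
    if user_id:
        return "USER"      # user_id present, with or without customer_id
    if customer_id:
        return "CUSTOMER"  # customer_id only
    return "CLIENT"        # neither identifier


print(resolve_scope(user_id="user_alice", customer_id="customer_acme"))
print(resolve_scope(customer_id="customer_acme"))
print(resolve_scope())
```

Running a check like this over an ingestion payload before calling sdk.memories.create() is one way to catch a stray user_id on a document meant to be shared.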

Retrieval, scope, and ranking

The retrieval engine searches the full scope chain and prefers the narrowest applicable scope:
USER scope     →  Alice's personal memories (highest priority)
CUSTOMER scope →  Acme Corp's shared knowledge
CLIENT scope   →  your application's product knowledge
WORLD scope    →  global domain knowledge (lowest priority)
context = await sdk.user.context.fetch(
    user_id="user_alice",
    # customer_id="customer_acme",  # B2B only
    search_query=["project timeline", "Q2 deliverables"],
    types=["facts", "temporal_events"],  # restrict to specific memory types
)

# context.facts → "Q2 roadmap includes API v3 launch and dashboard redesign"
# context.temporal_events → "API v3 launch deadline: June 15"
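Ranking is described on this page as weighing relevance, recency, and confidence, but the actual formula is not documented here. The sketch below shows one common way such signals can be combined, a weighted sum with exponential recency decay. The weights, the 30-day half-life, and the function name rank_score are all assumptions for illustration, not Synap's implementation.

```python
def rank_score(
    relevance: float,
    confidence: float,
    age_days: float,
    half_life_days: float = 30.0,            # assumed: recency halves every 30 days
    weights: tuple = (0.6, 0.25, 0.15),      # assumed: relevance, recency, confidence
) -> float:
    """Combine three signals in [0, 1] into one score (illustrative only)."""
    recency = 0.5 ** (age_days / half_life_days)
    w_relevance, w_recency, w_confidence = weights
    return w_relevance * relevance + w_recency * recency + w_confidence * confidence


fresh = rank_score(relevance=0.8, confidence=0.9, age_days=0)
stale = rank_score(relevance=0.8, confidence=0.9, age_days=90)
print(fresh > stale)
```

With equal relevance and confidence, the older memory scores lower, which is the behavior the aging step describes: stale memories give way to current ones without manual cleanup.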
Two retrieval modes trade speed for depth:
  • fast — vector + graph search, tuned for low-latency interactive turns.
  • accurate — vector + graph plus LLM subquery decomposition and reranking, for deeper, higher-recall processing.
See Retrieval Modes for how to choose.

Long-term memory is cumulative and self-managing: as it grows, entity resolution sharpens, retrieval gets richer, and ranking ensures only the most relevant memories surface regardless of total volume. Long-term context has two shared sub-layers based on scope, customer and organizational, which are covered next.

Customer context

Customer context is knowledge stored at the CUSTOMER scope: shared across all users within one B2B tenant, but invisible to other tenants. It is each customer’s internal wiki — policies, team structure, shared projects, and domain terminology.

Lifecycle: ingest with customer_id, no user_id

You create customer context by ingesting with a customer_id but no user_id. That single distinction is what places the memory at the customer scope.
await sdk.memories.create(
    document="""
    Acme Corp Engineering Handbook
    - All services must use Python 3.11 or later
    - Production deployments: Tuesdays and Thursdays, 10am-2pm PT
    - Hotfix deployments require VP Engineering approval
    """,
    document_type="document",
    customer_id="customer_acme",
    # No user_id — shared across all users at this customer
)
Do not accidentally include a user_id when ingesting customer-wide documents. With a user_id, the memory is stored at the narrower USER scope and becomes visible to only that one person, defeating the purpose of shared tenant knowledge.
It then flows through the same processing and storage pipeline as any long-term memory, scoped to the customer. Retention and aging follow your configuration.

How you read it back

Retrieve customer context directly, or let it surface automatically inside user conversations:
context = await sdk.customer.context.fetch(
    customer_id="customer_acme",
    search_query=["deployment process", "production releases"],
)
# Returns CUSTOMER + CLIENT + WORLD scopes (no USER scope):
# - "Production deployments: Tues/Thurs 10am-2pm PT" (CUSTOMER)
# - "Platform supports blue-green deployment strategy" (CLIENT)
The payoff is shared knowledge for every user in the tenant. If Alice ingests sprint planning notes at customer scope, Bob and Carol both see them on their next fetch — while each still has their own user-scoped memories.
| Memory | Source scope | Alice | Bob | Carol |
| --- | --- | --- | --- | --- |
| “Auth migration to OAuth 2.1 by Q2” | CUSTOMER | Yes | Yes | Yes |
| “Alice prefers Slack for notifications” | USER (Alice) | Yes | No | No |
| “Bob is on the Platform team” | USER (Bob) | No | Yes | No |
| “Product supports OAuth 2.0 and 2.1” | CLIENT | Yes | Yes | Yes |
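The visibility rules in the table above can be sketched as a predicate over a memory's scope and owner. This is a simplified model for illustration; the Memory class and is_visible function are hypothetical and do not reflect how Synap stores or filters memories internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Memory:
    text: str
    scope: str                         # "USER", "CUSTOMER", or "CLIENT"
    user_id: Optional[str] = None      # owner, for USER-scoped memories
    customer_id: Optional[str] = None  # tenant, for CUSTOMER-scoped memories


def is_visible(memory: Memory, reader_user_id: str, reader_customer_id: str) -> bool:
    """Return True if the reader can see this memory (illustrative rules only)."""
    if memory.scope == "CLIENT":
        return True
    if memory.scope == "CUSTOMER":
        return memory.customer_id == reader_customer_id
    if memory.scope == "USER":
        return memory.user_id == reader_user_id
    return False


memories = [
    Memory("Auth migration to OAuth 2.1 by Q2", "CUSTOMER", customer_id="customer_acme"),
    Memory("Alice prefers Slack for notifications", "USER", user_id="user_alice"),
    Memory("Bob is on the Platform team", "USER", user_id="user_bob"),
    Memory("Product supports OAuth 2.0 and 2.1", "CLIENT"),
]

visible_to_bob = [m.text for m in memories if is_visible(m, "user_bob", "customer_acme")]
print(visible_to_bob)
```

Bob sees the shared customer memory, his own user memory, and the application-wide memory, but not Alice's preference.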

Organizational context

Organizational context is knowledge at the CLIENT scope — the broadest application-level scope. It is your product’s documentation, changelog, global policies, and domain knowledge, available to every user across every customer. Think of it as the product brain beneath all customer- and user-specific memories.

Lifecycle: ingest with no scope identifiers

Org context enters when you ingest without user_id or customer_id. For initial product-knowledge loads, batch_create is the recommended path — higher throughput, and processing a documentation set together improves cross-document entity resolution.
from pathlib import Path

from maximem_synap import CreateMemoryRequest

# Single doc: no user_id or customer_id means CLIENT scope
await sdk.memories.create(
    document="Our standard SLA guarantees 99.9% uptime...",
    document_type="document",
    document_id="doc_sla_v2",
)

# Bulk load product documentation.
# Path.read_text() opens and closes each file, so no handles are leaked.
documents = [
    CreateMemoryRequest(document=Path(path).read_text(encoding="utf-8"), document_type="document")
    for path in ("docs/api-reference.md", "docs/changelog-v3.md")
]
await sdk.memories.batch_create(documents=documents)
It goes through the same pipeline as user memories, but all entity resolution and storage happen at CLIENT scope — so when a user later mentions “Product X,” the system resolves it against the entity registered from your docs, connecting their question to the right documentation.
CLIENT-scope memories are accessible to all users of your application. Do not store sensitive internal documents (HR, financial, executive communications) as org context unless your app is for internal use only.

Updates, caching, and idempotency

Re-ingest a changed document with the same document_id to update it idempotently: the old version is replaced, the document is reprocessed, and entity connections are refreshed. Use a stable naming convention like doc_<category>_<name>_v<version>, and for frequently changing sources (pricing, feature lists) schedule periodic re-ingestion from your source of truth.

Because org knowledge is read-heavy and write-infrequent, client-scope retrieval results are cached with a 30-minute TTL. This lowers latency and reduces load on the stores; the trade-off is that after an update, changes may take up to the TTL window to propagate everywhere.
| Aspect | Detail |
| --- | --- |
| Cache TTL | 30 minutes |
| Why cache | Org context is read-heavy, write-infrequent |
| Invalidation | Re-ingesting (same document_id) refreshes affected entries within the next TTL window |
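Idempotent updates depend on producing exactly the same document_id each time, so it can help to build IDs in one place. Below is a minimal sketch of the doc_<category>_<name>_v<version> convention; make_document_id is a hypothetical helper and the slug rules (lowercase, non-alphanumerics collapsed to underscores) are an assumption.

```python
import re


def make_document_id(category: str, name: str, version: int) -> str:
    """Build a stable ID following doc_<category>_<name>_v<version> (hypothetical helper)."""

    def slug(part: str) -> str:
        # Lowercase, then collapse anything that is not a letter or digit into "_"
        return re.sub(r"[^a-z0-9]+", "_", part.lower()).strip("_")

    return f"doc_{slug(category)}_{slug(name)}_v{version}"


print(make_document_id("pricing", "Enterprise Plan", 3))
```

An update only replaces the earlier copy when the ID matches exactly, so every part you feed into the ID must stay fixed between re-ingestions of the same document.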

How it surfaces in retrieval

Org context is merged into every user query at lower priority than user and customer memories:
User Query: "What is the refund policy?"
        │
        ▼
  1. USER scope     → "Alice has a VIP 60-day return window"   ← highest priority
  2. CUSTOMER scope → "Acme Corp negotiated 45-day returns"    ← medium priority
  3. CLIENT scope   → "Standard refund policy: 30 days"        ← lowest priority
        │
        ▼
  Agent receives all three, ranked. A well-designed prompt prefers
  the most specific (user) answer over the general (org) baseline.
Org context is the knowledge baseline that narrower scopes can override. When budget is tight, narrower-scope memories are preserved first and org context is trimmed if necessary.
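The trimming behavior described above, where narrower scopes are kept first when the budget is tight, can be sketched as a greedy pass in scope-priority order. This is an illustration only: fit_to_budget is a hypothetical function, the token counts are made up, and Synap's actual budgeting logic is not documented on this page.

```python
SCOPE_PRIORITY = {"USER": 0, "CUSTOMER": 1, "CLIENT": 2, "WORLD": 3}


def fit_to_budget(memories, token_budget):
    """Keep narrower-scope memories first; drop broader ones that no longer fit.

    memories: list of (scope, text, token_count) tuples.
    """
    kept, used = [], 0
    # sorted() is stable, so memories within one scope keep their incoming order
    for scope, text, tokens in sorted(memories, key=lambda m: SCOPE_PRIORITY[m[0]]):
        if used + tokens <= token_budget:
            kept.append((scope, text))
            used += tokens
    return kept


candidates = [
    ("CLIENT", "Standard refund policy: 30 days", 8),
    ("USER", "Alice has a VIP 60-day return window", 9),
    ("CUSTOMER", "Acme Corp negotiated 45-day returns", 8),
]
print(fit_to_budget(candidates, token_budget=18))
```

With a budget of 18 tokens, the user and customer memories fit and the org baseline is the one that gets trimmed.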

Context compaction

Compaction solves the short-term growth problem from the other side: when a conversation gets long, sending the full transcript to your LLM becomes expensive and eventually hits the context window. Compaction intelligently compresses the history — preserving key facts, decisions, preferences, and current state — instead of blindly truncating it.
1. Analyze

The engine reads the full transcript and identifies facts, decisions, preferences, emotional shifts, and where the discussion currently stands.

2. Extract

It pulls out five categories of essential information: facts, decisions, preferences, a summary narrative of the conversation arc, and the current state (active topic and open questions).

3. Compress

The extracted information is compressed into your target token budget. Recent turns are preserved verbatim for conversational flow; older, resolved turns become summaries. A validation_score is computed so you can confirm critical information survived.

4. Persist what lasts

Information with durable value is also routed through the ingestion pipeline into long-term memory, so knowledge from the conversation is not lost when the short-term context is compressed.
Compaction is lossy by design. For conversations where every nuance matters (legal, medical, financial), keep the full history and use compaction only for supplementary context.
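The compress step keeps recent turns verbatim and summarizes older ones. The sketch below shows only that split, not the summarization itself. How many turns Synap keeps verbatim is not specified here, so keep_recent and the function name split_for_compaction are assumptions.

```python
def split_for_compaction(turns, keep_recent=2):
    """Split a transcript into older turns (to summarize) and recent turns (kept verbatim)."""
    if keep_recent <= 0:
        return list(turns), []
    return list(turns[:-keep_recent]), list(turns[-keep_recent:])


transcript = [
    "User: What's our current API rate limit?",
    "Assistant: Your current rate limit is 1,000 requests per minute.",
    "User: Can we increase that for our enterprise plan?",
    "Assistant: Yes, enterprise plans support up to 10,000 req/min...",
]
older, recent = split_for_compaction(transcript, keep_recent=2)
print(len(older), len(recent))
```

Here the first exchange would be handed to a summarizer while the latest exchange stays word for word, which preserves conversational flow at the cost of detail in the older part.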

Strategies

| Strategy | Output size | Best for |
| --- | --- | --- |
| conservative | Largest | Short conversations needing high detail; minimal information loss. |
| balanced | Medium | General-purpose; good compression-vs-detail balance. |
| aggressive | Smallest | Long or cost-sensitive conversations; keeps only the most critical facts. |
| adaptive | Varies | Synap analyzes the conversation (length, density, repetition, recency, budget) and picks the strategy. Recommended default. |

The SDK surface

Compaction is asynchronous: compact kicks off a job and returns a handle, get_compaction_status polls for completion, and get_compacted returns the result.
import asyncio

# conversation_id is the UUID you have already been passing to record_message
# for this conversation. Do not generate a new one here: a fresh ID has no
# transcript to compact.

# Kick off compaction (fire-and-forget — returns a trigger handle)
trigger = await sdk.conversation.context.compact(
    conversation_id=conversation_id,
    strategy="adaptive",
    target_tokens=2000,
)
print(f"Compaction {trigger.compaction_id} status: {trigger.status}")

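# Poll until the run completes
while True:
    status = await sdk.conversation.context.get_compaction_status(
        conversation_id=conversation_id,
    )
    if status.status in ("completed", "failed"):
        break
    await asyncio.sleep(2)

# Stop here if the job failed; there is no compacted result to read
if status.status == "failed":
    raise RuntimeError(f"Compaction failed for conversation {conversation_id}")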

# Read the compacted result
result = await sdk.conversation.context.get_compacted(conversation_id=conversation_id)
print(f"Compressed {result.original_token_count} -> {result.compacted_token_count} tokens")
print(f"Quality: {result.validation_score:.2f}, passed: {result.validation_passed}")
print(f"Facts: {result.facts}")
print(f"Current state: {result.current_state}")
Inspect validation_score (0.0–1.0) and validation_passed to confirm quality. If scores fall consistently low, switch to a less aggressive strategy or raise the token budget.
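The advice to step down to a less aggressive strategy when scores stay low can be automated. The following is a hedged sketch: next_strategy is a hypothetical helper, the 0.7 threshold is an arbitrary assumption rather than a documented Synap value, and "adaptive" is left untouched because Synap picks the strategy in that mode.

```python
# Ordered from most to least compression, following the strategy descriptions
STRATEGY_ORDER = ["aggressive", "balanced", "conservative"]


def next_strategy(current: str, validation_scores: list, threshold: float = 0.7) -> str:
    """Suggest a less aggressive strategy when recent scores average below threshold."""
    if not validation_scores or current not in STRATEGY_ORDER:
        return current  # nothing to judge, or "adaptive"
    average = sum(validation_scores) / len(validation_scores)
    if average >= threshold:
        return current
    index = STRATEGY_ORDER.index(current)
    return STRATEGY_ORDER[min(index + 1, len(STRATEGY_ORDER) - 1)]


print(next_strategy("aggressive", [0.5, 0.6]))
```

If the strategy is already conservative, the remaining lever is the one the text mentions: raising target_tokens.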

Compaction vs. retrieval

The two are complementary, not interchangeable:
| Aspect | Compaction | Retrieval |
| --- | --- | --- |
| Input | Current conversation history | Query against stored memories |
| Scope | One conversation | All memories across all conversations |
| Purpose | Reduce tokens for the current turn | Bring relevant past knowledge into the turn |
| Output | Compressed view of this conversation | Ranked memories from vector + graph stores |
A typical production turn does both: retrieve relevant long-term memories, compact the current conversation if it is long, then combine retrieved memories + compacted summary + recent verbatim turns into the prompt.
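The final combination step (retrieved memories, compacted summary, and recent verbatim turns) is ordinary string assembly on your side. Here is one possible layout as a sketch; build_prompt, the section labels, and the ordering are illustrative choices, not a format Synap requires.

```python
def build_prompt(system, retrieved_memories, compacted_summary, recent_turns, user_message):
    """Assemble a prompt from the three context sources plus the new user message."""
    sections = [system]
    if retrieved_memories:
        bullet_list = "\n".join(f"- {memory}" for memory in retrieved_memories)
        sections.append("Relevant memories:\n" + bullet_list)
    if compacted_summary:
        sections.append("Conversation so far (summary):\n" + compacted_summary)
    if recent_turns:
        turn_lines = "\n".join(f"{role}: {text}" for role, text in recent_turns)
        sections.append("Recent turns:\n" + turn_lines)
    sections.append(f"user: {user_message}")
    return "\n\n".join(sections)


prompt = build_prompt(
    system="You are a helpful support agent.",
    retrieved_memories=["Alice prefers concise answers."],
    compacted_summary="Discussed API rate limits and the enterprise plan.",
    recent_turns=[("user", "Can we increase that for our enterprise plan?"),
                  ("assistant", "Yes, enterprise plans support up to 10,000 req/min...")],
    user_message="What about burst handling?",
)
print(prompt)
```

Placing the newest user message last keeps the most immediate material at the end of the prompt, away from the middle of the context where long histories tend to get less attention.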

Next steps

Memories & Context

The overview of how memories and context fit together in Synap.

Retrieval Modes

Choosing between fast (vector + graph) and accurate (vector + graph + LLM decomposition + reranking).

Entity Resolution

How mentions are resolved to canonical entities and linked in the graph.

Memory Scopes

The full scope chain and priority resolution rules.