Context, End to End - Maximem Synap

Every message your agent handles flows through the same arc: it is ingested, its meaning is extracted into structured memories, those memories are stored in the vector and graph engines, and they are retrieved to enrich the next turn. Around that arc sit four context layers, each with a different scope and lifespan, from the working memory of a single conversation to knowledge shared across your entire application. This page answers two questions in one place: “What happens to a message after I send it?” and “What context can I read back?” You should not need three tabs open to follow a conversation from the first user turn to the durable knowledge it leaves behind.

Think of short-term context as working memory during a meeting (everything said so far) and long-term context as the takeaways that persist after the meeting ends. Customer and organizational context are the shared wikis and product brain that everyone in the room already knows.

The lifecycle of a message

A single turn moves through four stages. The first three (ingest, extract, store) run asynchronously after you record content; the fourth (retrieve) happens at the start of each turn to build the context your agent reasons over.

Ingest

You record the turn with sdk.conversation.record_message() and/or submit content for durable memory with sdk.memories.create() (or sdk.memories.batch_create() for bulk loads). The scope identifiers you pass (user_id, optional customer_id) determine where the resulting memories live.

Extract

The content runs through a multi-stage pipeline: categorization, memory extraction (facts, preferences, episodes, emotions, temporal events), chunking, entity resolution, and organization. Each stage enriches the memory with metadata that improves retrieval later. For the full pipeline (stages, ingestion modes, document types, and the runtime vs bootstrap paths) see How Ingestion Works.

Store

Processed memories are persisted in two complementary engines (a vector store for semantic similarity and a graph store for entity relationships) scoped to the right level (user, customer, client, or world).

Retrieve

On the next turn, context.fetch() searches the applicable scopes, ranks the results, and returns the most relevant memories within your token budget. This retrieved context, plus the conversation’s short-term history, is what your agent reasons over.

record_message / memories.create
        │  ingest
        ▼
  multi-stage pipeline ──► vector store + graph store
        │  extract                 │  store
        ▼                          ▼
   structured memories      context.fetch() ──► ranked context for the next turn
                                   │  retrieve
                                   ▼
                          your agent's prompt

conversation_id must be a valid UUID. Generate one with str(uuid.uuid4()) and reuse the same value for every turn (and every compaction call) in the same conversation.
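A minimal local sketch of that rule using only the standard library. `new_conversation_id` and `is_valid_conversation_id` are hypothetical helper names, and the validity check assumes Synap expects the canonical hyphenated string that `str(uuid.uuid4())` produces.

```python
import uuid


def new_conversation_id() -> str:
    """Create a conversation_id once, at the start of a conversation."""
    return str(uuid.uuid4())


def is_valid_conversation_id(value: str) -> bool:
    """Return True if value is a UUID in canonical hyphenated form.

    Assumption: Synap wants the same format that str(uuid.uuid4()) produces.
    """
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, TypeError, AttributeError):
        return False


conversation_id = new_conversation_id()  # reuse this value for every turn
assert is_valid_conversation_id(conversation_id)
assert not is_valid_conversation_id("chat-42")
```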

The four context layers

Context in Synap is organized into four layers. They are not separate systems: they are the same memories stored at different scopes, with different lifespans and read paths.

Layer	Scope	Identifier(s)	What it holds	How you read it
Short-term / conversational	Single conversation	`conversation_id`	The running transcript of this session: turns, decisions, current state	`sdk.conversation.context.fetch()`
Long-term (user)	One end user	`user_id` (+ `customer_id` on B2B)	Durable facts, preferences, episodes about a person	`sdk.user.context.fetch()`
Customer	One tenant	`customer_id`	Policies, team structure, shared projects for a B2B organization	`sdk.customer.context.fetch()`
Organizational	Your whole app	(none)	Product docs, announcements, domain knowledge for every user	`sdk.client.context.fetch()`

customer_id is required only on B2B (multi-tenant) instances. On B2C instances the customer is auto-resolved from user_id, so you can omit it. The examples below use user_id and note where customer_id applies.

Narrower scopes win. When the same fact exists at multiple levels, the user-scoped version takes priority over customer, which takes priority over client. See Memory Scopes for the full priority resolution rules.
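A minimal sketch of the “narrower scopes win” rule, assuming each memory carries a hypothetical `topic` key that identifies “the same fact”. Synap's actual priority resolution (see Memory Scopes) is not reproduced here; this only shows the ordering.

```python
from dataclasses import dataclass
from typing import Dict, List

# Narrowest scope first: a lower index means higher priority.
SCOPE_PRIORITY = ["USER", "CUSTOMER", "CLIENT", "WORLD"]


@dataclass(frozen=True)
class Memory:
    topic: str  # hypothetical key identifying "the same fact"
    scope: str  # one of SCOPE_PRIORITY
    text: str


def resolve(memories: List[Memory]) -> Dict[str, Memory]:
    """For each topic, keep the memory from the narrowest scope."""
    winners: Dict[str, Memory] = {}
    for memory in memories:
        current = winners.get(memory.topic)
        if current is None or SCOPE_PRIORITY.index(memory.scope) < SCOPE_PRIORITY.index(current.scope):
            winners[memory.topic] = memory
    return winners


resolved = resolve([
    Memory("refund_window", "CLIENT", "Standard refund policy: 30 days"),
    Memory("refund_window", "USER", "Alice has a VIP 60-day return window"),
    Memory("refund_window", "CUSTOMER", "Acme Corp negotiated 45-day returns"),
])
print(resolved["refund_window"].text)  # Alice has a VIP 60-day return window
```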

Short-term context

Short-term context is the accumulated history of a single conversation: the questions asked, answers given, and decisions made so far. It is what lets your agent say “as I mentioned earlier…” without losing track of the thread. It lives only for the duration of the session.

Registering the conversation

Short-term context does not appear by magic. Each turn must be registered with sdk.conversation.record_message() (both the user and assistant roles) so Synap can build conversation-scoped context and feed compaction.

import uuid
from maximem_synap import MaximemSynapSDK

sdk = MaximemSynapSDK(api_key="synap_your_key_here")
await sdk.initialize()

# One UUID per conversation, reused across every turn
conversation_id = str(uuid.uuid4())

await sdk.conversation.record_message(
    conversation_id=conversation_id,
    role="user",
    content="I prefer dark mode and concise answers.",
    user_id="user_alice",
    # customer_id="customer_acme",  # required on B2B; omit on B2C (auto-resolved)
)

await sdk.conversation.record_message(
    conversation_id=conversation_id,
    role="assistant",
    content="Got it. I'll keep answers short and assume dark mode.",
    user_id="user_alice",
)

If a conversation is never registered with record_message, a later conversation.context.fetch() for that conversation_id returns empty: there is no transcript to draw on, and memories_used stays 0. Registering each turn is what makes the conversation coherent on the next fetch.

How the context grows

A “turn” is a user message plus its assistant response. Each turn is appended to the running history, and your agent sees the full history on every subsequent turn:

Turn 1:  User: "What's our current API rate limit?"
         Assistant: "Your current rate limit is 1,000 requests per minute."

Turn 2:  User: "Can we increase that for our enterprise plan?"
         Assistant: "Yes, enterprise plans support up to 10,000 req/min..."

Turn 3:  User: "What about burst handling?"
         Assistant: "Burst allowances provide a 2x multiplier..."
  ...
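To make the growth concrete, here is a local model of a running history. `ConversationHistory` is a hypothetical class, not part of the SDK; Synap builds the real transcript from your `record_message` calls.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ConversationHistory:
    """Running short-term history: one (role, content) entry per message."""
    messages: List[Tuple[str, str]] = field(default_factory=list)

    def add_turn(self, user: str, assistant: str) -> None:
        # A turn is a user message plus its assistant response.
        self.messages.append(("user", user))
        self.messages.append(("assistant", assistant))

    def render(self) -> str:
        # The full history is rendered again on every subsequent turn.
        return "\n".join(f"{role}: {content}" for role, content in self.messages)


history = ConversationHistory()
history.add_turn("What's our current API rate limit?",
                 "Your current rate limit is 1,000 requests per minute.")
history.add_turn("Can we increase that for our enterprise plan?",
                 "Yes, enterprise plans support up to 10,000 req/min...")
print(len(history.messages))  # 4
```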

Why it can’t grow forever

Short-term context is bounded by three practical constraints, which is why compaction exists.

Token limits

Every LLM has a maximum context window. Filling it with raw conversation history leaves little room for retrieved long-term memories and system instructions.

Cost scaling

LLM cost scales with input tokens. Unbounded history makes every turn progressively more expensive.

Quality degradation

LLMs pay less attention to the middle of long contexts (the “lost in the middle” effect), so very long histories can actually degrade answer quality.
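The cost constraint is easy to quantify. If every turn resends the full history, turn k sends k turns' worth of tokens, so the total grows quadratically with conversation length. This sketch assumes every turn adds the same number of tokens and ignores system instructions and retrieved memories; `cumulative_input_tokens` is a hypothetical helper.

```python
def cumulative_input_tokens(tokens_per_turn: int, turns: int) -> int:
    """Total input tokens sent when each turn resends the whole history.

    Turn k sends k * tokens_per_turn tokens, so the sum over turns 1..n is
    tokens_per_turn * n * (n + 1) / 2.
    """
    return tokens_per_turn * turns * (turns + 1) // 2


print(cumulative_input_tokens(200, 10))   # 11000
print(cumulative_input_tokens(200, 100))  # 1010000
```

Ten times as many turns costs roughly 92 times as many input tokens, which is why bounding the history matters.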

At the start of each turn, long-term memories are retrieved to provide background, while the short-term transcript provides immediate continuity. The two paths converge in your prompt:

context = await sdk.conversation.context.fetch(
    conversation_id=conversation_id,
    search_query=["migration timeline"],
)
# context.facts / context.preferences / context.episodes hold the
# long-term memories retrieved for this turn; the short-term transcript
# supplies the in-session continuity.

When transcript content has lasting value, persist it to long-term memory with sdk.memories.create(). There is no explicit “end” call; you ingest what you want to remember whenever it is ready. That hands off to the long-term layer below.

Long-term context

Long-term context is the persistent knowledge layer: durable facts, preferences, and events that survive across sessions for days, weeks, or years. It is what gives your agent a memory that lasts: it knows Alice prefers concise summaries even if that was learned months ago.

Lifecycle: from raw content to durable memory

Ingestion: content enters

Content arrives via sdk.memories.create() (runtime, as conversations happen) or sdk.memories.batch_create() (bulk imports and backfills, see Bootstrap Ingestion). At this point it is raw text with scope identifiers and an optional document_id.

await sdk.memories.create(
    document="The customer prefers email communication over phone calls.",
    document_type="ai-chat-conversation",
    user_id="user_alice",
    # customer_id="customer_acme",  # B2B only
    metadata={"source": "support_conversation"},
)

Processing: the multi-stage pipeline

Raw text becomes structured, queryable memory through the same categorization → extraction → chunking → entity resolution → organization pipeline described in How Ingestion Works. Extraction sorts content into the five memory types: facts, preferences, episodes, emotions, and temporal events. Entity resolution links mentions (“Alice,” “Alice Chen,” “A. Chen”) to a single canonical entity in the entity registry, creating the graph edges that power relationship queries.
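The core idea of entity resolution, many surface forms mapping to one canonical entity, can be sketched with a lookup table. This is a deliberately simplified stand-in: the registry contents and `resolve_entity` are hypothetical, and Synap's real resolution (see Entity Resolution) is more involved than exact alias matching.

```python
from typing import Dict, Optional, Set

# Hypothetical registry: canonical entity -> known surface forms (lowercased).
ENTITY_REGISTRY: Dict[str, Set[str]] = {
    "Alice Chen": {"alice", "alice chen", "a. chen"},
    "Acme Corp": {"acme", "acme corp", "acme corporation"},
}


def resolve_entity(mention: str) -> Optional[str]:
    """Map a raw mention to its canonical entity, or None if it is unknown."""
    normalized = " ".join(mention.lower().split())  # case- and whitespace-insensitive
    for canonical, aliases in ENTITY_REGISTRY.items():
        if normalized in aliases:
            return canonical
    return None


for mention in ["Alice", "Alice  Chen", "A. Chen"]:
    print(mention, "->", resolve_entity(mention))  # each resolves to Alice Chen
```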

Storage: dual-store persistence

Memories land in both engines, each scoped immutably by the identifiers present at ingestion.

Vector store

Memory chunks are embedded for semantic similarity search: finding relevant memories even without shared keywords.

Graph store

Entity relationships are stored for traversal: “what do we know about this customer’s team?” follows graph edges to connected memories.

Scope is set by which identity fields are present, and cannot change after storage:

Identifiers at ingestion	Resulting scope
`user_id` (+ `customer_id` on B2B)	USER
`customer_id` only	CUSTOMER
neither	CLIENT
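The table reduces to a small rule you can mirror in your own code, for example to sanity-check ingestion calls before sending them. `resulting_scope` is a hypothetical local helper; Synap assigns the scope itself.

```python
from typing import Optional


def resulting_scope(user_id: Optional[str] = None, customer_id: Optional[str] = None) -> str:
    """Mirror the table above: which identifiers at ingestion give which scope."""
    if user_id:
        return "USER"      # user_id present (plus customer_id on B2B)
    if customer_id:
        return "CUSTOMER"  # customer_id only
    return "CLIENT"        # neither identifier


assert resulting_scope(user_id="user_alice", customer_id="customer_acme") == "USER"
assert resulting_scope(customer_id="customer_acme") == "CUSTOMER"
assert resulting_scope() == "CLIENT"
```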

Active retrieval: serving queries

When the agent needs context, the retrieval engine embeds the query, searches the vector store, traverses the graph, merges results across all applicable scopes (USER + CUSTOMER + CLIENT + WORLD), ranks them, and returns the top results within the token budget. Frequently surfaced memories stay prominent; rarely surfaced ones are gradually deprioritized.
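The final “rank, then return the top results within the token budget” step can be sketched as a greedy selection. Everything here is an assumption for illustration: the scores are made up, word count stands in for a real tokenizer, and greedy packing is not necessarily how Synap fills the budget.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    scope: str    # USER, CUSTOMER, CLIENT, or WORLD
    score: float  # hypothetical relevance score from the search step


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def select_within_budget(candidates: List[Candidate], token_budget: int) -> List[Candidate]:
    """Rank merged candidates by score and keep the best ones that still fit."""
    selected: List[Candidate] = []
    used = 0
    for candidate in sorted(candidates, key=lambda c: c.score, reverse=True):
        cost = approx_tokens(candidate.text)
        if used + cost <= token_budget:
            selected.append(candidate)
            used += cost
    return selected


merged = [
    Candidate("Alice deployed the billing service last Tuesday", "USER", 0.91),
    Candidate("Production deployments: Tues/Thurs 10am-2pm PT", "CUSTOMER", 0.88),
    Candidate("Blue-green deployment support available", "CLIENT", 0.40),
]
print([c.scope for c in select_within_budget(merged, token_budget=12)])  # ['USER', 'CUSTOMER']
```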

Aging and retention

Ranking weighs relevance, recency, and confidence, so older, less-relevant memories naturally give way to current information without manual cleanup. How aggressively memories age, and how long they are retained, is governed by your Memory Architecture Configuration; Synap derives sensible defaults from your use-case file.
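One common way to blend these three signals is a weighted sum with exponential recency decay. This is a hypothetical illustration only: Synap's actual ranking formula, weights, and decay are not documented on this page and are controlled through your configuration. `rank_score`, the weights, and the 30-day half-life are all assumptions.

```python
import math


def rank_score(relevance: float, age_days: float, confidence: float,
               half_life_days: float = 30.0) -> float:
    """Hypothetical blend of relevance, recency, and confidence (each in 0..1).

    Recency halves every half_life_days. The weights are illustrative, not Synap's.
    """
    recency = math.exp(-math.log(2) * age_days / half_life_days)
    return 0.6 * relevance + 0.25 * recency + 0.15 * confidence


fresh = rank_score(0.8, age_days=1, confidence=0.9)
stale = rank_score(0.8, age_days=365, confidence=0.9)
assert fresh > stale  # same relevance and confidence, but the older memory ranks lower
```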

Eviction: end of the lifecycle

When a memory is no longer retained it is archived (moved to cold storage, reachable only by explicit archive queries, good for compliance) or deleted (permanently removed from both stores, with entity connections cleaned up), depending on your configuration.

Retrieval, scope, and ranking

The retrieval engine searches the full scope chain and prefers the narrowest applicable scope:

USER scope     →  Alice's personal memories (highest priority)
CUSTOMER scope →  Acme Corp's shared knowledge
CLIENT scope   →  your application's product knowledge
WORLD scope    →  global domain knowledge (lowest priority)

context = await sdk.user.context.fetch(
    user_id="user_alice",
    # customer_id="customer_acme",  # B2B only
    search_query=["project timeline", "Q2 deliverables"],
    types=["facts", "temporal_events"],  # restrict to specific memory types
)

# context.facts → "Q2 roadmap includes API v3 launch and dashboard redesign"
# context.temporal_events → "API v3 launch deadline: June 15"

Two retrieval modes trade speed for depth:

fast: vector + graph search, tuned for low-latency interactive turns.
accurate: vector + graph plus LLM subquery decomposition and reranking, for deeper, higher-recall processing.

See Retrieval Modes for how to choose. Long-term memory is cumulative and self-managing: as it grows, entity resolution sharpens, retrieval gets richer, and ranking ensures only the most relevant memories surface regardless of total volume. Long-term context also has two shared, scope-based sub-layers, customer and organizational, covered next.

Customer context

Customer context is knowledge stored at the CUSTOMER scope: shared across all users within one B2B tenant, but invisible to other tenants. It is each customer’s internal wiki: policies, team structure, shared projects, and domain terminology.

Lifecycle: ingest with `customer_id`, no `user_id`

You create customer context by ingesting with a customer_id but no user_id. That single distinction is what places the memory at the customer scope.

await sdk.memories.create(
    document="""
    Acme Corp Engineering Handbook
    - All services must use Python 3.11 or later
    - Production deployments: Tuesdays and Thursdays, 10am-2pm PT
    - Hotfix deployments require VP Engineering approval
    """,
    document_type="document",
    customer_id="customer_acme",
    # No user_id, shared across all users at this customer
)

Do not accidentally include a user_id when ingesting customer-wide documents. With a user_id, the memory drops to USER scope and becomes visible only to that one person, defeating the purpose of shared tenant knowledge.

It then flows through the same processing and storage pipeline as any long-term memory, scoped to the customer. Retention and aging follow your configuration.
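A cheap guard against the user_id mistake is to validate the payload before calling `sdk.memories.create(**payload)`. `assert_customer_scoped` is a hypothetical helper; the keys it checks (`document`, `document_type`, `customer_id`, `user_id`) are the parameter names used in the examples on this page.

```python
from typing import Any, Mapping


def assert_customer_scoped(request: Mapping[str, Any]) -> None:
    """Raise if a memories.create payload would not land at CUSTOMER scope."""
    if not request.get("customer_id"):
        raise ValueError("customer_id is required for customer-wide documents")
    if request.get("user_id"):
        # A user_id would drop the memory to USER scope, visible to one person.
        raise ValueError("remove user_id: customer-wide documents must not carry one")


handbook = {
    "document": "Hotfix deployments require VP Engineering approval",
    "document_type": "document",
    "customer_id": "customer_acme",
}
assert_customer_scoped(handbook)  # passes: customer_id set, no user_id
# await sdk.memories.create(**handbook)
```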

How you read it back

Retrieve customer context directly, or let it surface automatically inside user conversations:

Direct (customer scope and below):

context = await sdk.customer.context.fetch(
    customer_id="customer_acme",
    search_query=["deployment process", "production releases"],
)
# Returns CUSTOMER + CLIENT + WORLD scopes (no USER scope):
# - "Production deployments: Tues/Thurs 10am-2pm PT" (CUSTOMER)
# - "Platform supports blue-green deployment strategy" (CLIENT)

Inside a user conversation:

context = await sdk.user.context.fetch(
    user_id="user_alice",
    customer_id="customer_acme",
    search_query=["when can I deploy to production"],
)
# Scope chain results, narrowest first:
# USER:     "Alice deployed the billing service last Tuesday"
# CUSTOMER: "Production deployments: Tues/Thurs 10am-2pm PT"
# CLIENT:   "Blue-green deployment support available"

The payoff is shared knowledge for every user in the tenant. If Alice ingests sprint planning notes at customer scope, Bob and Carol both see them on their next fetch, while each still has their own user-scoped memories.

Memory	Source scope	Alice	Bob	Carol
“Auth migration to OAuth 2.1 by Q2”	CUSTOMER	Yes	Yes	Yes
“Alice prefers Slack for notifications”	USER (Alice)	Yes	No	No
“Bob is on the Platform team”	USER (Bob)	No	Yes	No
“Product supports OAuth 2.0 and 2.1”	CLIENT	Yes	Yes	Yes
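The visibility pattern in this table follows from one rule per scope. This sketch reproduces it locally; `is_visible` is a hypothetical helper, the ids `user_bob`, `user_carol`, `user_dave`, and `customer_globex` are illustrative, and Synap applies the real rule itself.

```python
from typing import Optional


def is_visible(scope: str, owner: Optional[str], user_id: str, customer_id: str) -> bool:
    """Can this user (in this customer) see a memory stored at `scope`?

    owner is the user_id for USER memories and the customer_id for CUSTOMER
    memories; it is ignored for CLIENT memories.
    """
    if scope == "CLIENT":
        return True
    if scope == "CUSTOMER":
        return owner == customer_id
    if scope == "USER":
        return owner == user_id
    raise ValueError(f"unknown scope: {scope}")


rows = [
    ("Auth migration to OAuth 2.1 by Q2", "CUSTOMER", "customer_acme"),
    ("Alice prefers Slack for notifications", "USER", "user_alice"),
    ("Bob is on the Platform team", "USER", "user_bob"),
    ("Product supports OAuth 2.0 and 2.1", "CLIENT", None),
]
# Reproduces the Alice/Bob/Carol columns of the table above.
for text, scope, owner in rows:
    seen_by = [u for u in ("user_alice", "user_bob", "user_carol")
               if is_visible(scope, owner, u, "customer_acme")]
    print(f"{text}: {seen_by}")
```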

Organizational context

Organizational context is knowledge at the CLIENT scope: the broadest application-level scope. It is your product’s documentation, changelog, global policies, and domain knowledge, available to every user across every customer. Think of it as the product brain beneath all customer- and user-specific memories.

Lifecycle: ingest with no scope identifiers

Org context enters when you ingest without user_id or customer_id. For initial product-knowledge loads, batch_create is the recommended path: higher throughput, and processing a documentation set together improves cross-document entity resolution.

from pathlib import Path

from maximem_synap import CreateMemoryRequest

# Single doc, no user_id or customer_id = CLIENT scope
await sdk.memories.create(
    document="Our standard SLA guarantees 99.9% uptime...",
    document_type="document",
    document_id="doc_sla_v2",
)

# Bulk load product documentation (Path.read_text opens and closes each file)
documents = [
    CreateMemoryRequest(document=Path("docs/api-reference.md").read_text(), document_type="document"),
    CreateMemoryRequest(document=Path("docs/changelog-v3.md").read_text(), document_type="document"),
]
await sdk.memories.batch_create(documents=documents)

It goes through the same pipeline as user memories, but all entity resolution and storage happen at CLIENT scope, so when a user later mentions “Product X,” the system resolves it against the entity registered from your docs, connecting their question to the right documentation.

CLIENT-scope memories are accessible to all users of your application. Do not store sensitive internal documents (HR, financial, executive communications) as org context unless your app is for internal use only.

Updates, caching, and idempotency

Re-ingest a changed document with the same document_id to update it idempotently: the old version is replaced and reprocessed, and its entity connections are refreshed. Use a stable naming convention like doc_<category>_<name>_v<version>, and for frequently changing sources (pricing, feature lists) schedule periodic re-ingestion from your source of truth.

Because org knowledge is read-heavy and write-infrequent, client-scope retrieval results are cached with a 30-minute TTL. This lowers latency and reduces load on the stores; the trade-off is that after an update, changes may take up to the TTL window to propagate everywhere.

Aspect	Detail
Cache TTL	30 minutes
Why cache	Org context is read-heavy, write-infrequent
Invalidation	Re-ingesting (same `document_id`) refreshes affected entries within the next TTL window
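Why an update can take up to the TTL to show up is easiest to see with a generic TTL cache. This models the general mechanism only, not Synap's internal cache; `TTLCache` is a hypothetical class, and the clock is injectable so the expiry can be demonstrated without waiting 30 minutes.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 30 * 60  # the 30-minute TTL from the table above


class TTLCache:
    """Minimal cache whose entries expire a fixed time after they are stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl_seconds:
            del self._entries[key]  # expired: the next read must go to the source
            return None
        return value

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (self.clock(), value)


now = [0.0]
cache = TTLCache(CACHE_TTL_SECONDS, clock=lambda: now[0])
cache.put("refund policy", "Standard refund policy: 30 days")
now[0] = 10 * 60  # source re-ingested at t=10 min, but reads still hit the cache
print(cache.get("refund policy"))  # Standard refund policy: 30 days
now[0] = 30 * 60  # the entry expires at the 30-minute mark
print(cache.get("refund policy"))  # None
```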

How it surfaces in retrieval

Org context is merged into every user query at the lowest priority, beneath user and customer memories:

User Query: "What is the refund policy?"
         │
         ▼
  1. USER scope     → "Alice has a VIP 60-day return window"   ← highest priority
  2. CUSTOMER scope → "Acme Corp negotiated 45-day returns"    ← medium priority
  3. CLIENT scope   → "Standard refund policy: 30 days"        ← lowest priority
         │
         ▼
  Agent receives all three, ranked. A well-designed prompt prefers
  the most specific (user) answer over the general (org) baseline.

Org context is the knowledge baseline that narrower scopes can override. When budget is tight, narrower-scope memories are preserved first and org context is trimmed if necessary.
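A sketch of budget trimming that lets narrower scopes claim the budget first, so broader memories only fill whatever space is left. The token counts are made up, `trim_to_budget` is a hypothetical helper, and Synap's exact trimming policy may differ.

```python
from typing import List, Tuple

SCOPE_ORDER = {"USER": 0, "CUSTOMER": 1, "CLIENT": 2, "WORLD": 3}  # narrowest first

Entry = Tuple[str, str, int]  # (scope, text, tokens)


def trim_to_budget(memories: List[Entry], token_budget: int) -> List[Entry]:
    """Visit memories narrowest scope first and keep each one that still fits.

    Broader scopes are considered last, so they are the first to be trimmed.
    """
    kept: List[Entry] = []
    used = 0
    for scope, text, tokens in sorted(memories, key=lambda m: SCOPE_ORDER[m[0]]):
        if used + tokens <= token_budget:
            kept.append((scope, text, tokens))
            used += tokens
    return kept


memories = [
    ("CLIENT", "Standard refund policy: 30 days", 7),
    ("USER", "Alice has a VIP 60-day return window", 12),
    ("CUSTOMER", "Acme Corp negotiated 45-day returns", 8),
]
print([scope for scope, _, _ in trim_to_budget(memories, token_budget=20)])  # ['USER', 'CUSTOMER']
```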

Context compaction

Compaction solves the short-term growth problem from the other side: when a conversation gets long, sending the full transcript to your LLM becomes expensive and eventually hits the context window. Compaction intelligently compresses the history (preserving key facts, decisions, preferences, and current state) instead of blindly truncating it.

Analyze

The engine reads the full transcript and identifies facts, decisions, preferences, emotional shifts, and where the discussion currently stands.

Extract

It pulls out five categories of essential information: facts, decisions, preferences, a summary narrative of the conversation arc, and the current state (active topic and open questions).

Compress

The extracted information is compressed into your target token budget. Recent turns are preserved verbatim for conversational flow; older, resolved turns become summaries. A validation_score is computed so you can confirm critical information survived.

Persist what lasts

Information with durable value is also routed through the ingestion pipeline into long-term memory, so knowledge from the conversation is not lost when the short-term context is compressed.
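The “recent turns verbatim, older turns summarized” split from the Compress step can be sketched as slicing the transcript. This shows only the split: how Synap decides what counts as recent and how it writes summaries is not specified here. `split_for_compaction` and `keep_recent_turns` are hypothetical, and the sketch assumes strictly alternating user/assistant messages.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)


def split_for_compaction(messages: List[Message],
                         keep_recent_turns: int) -> Tuple[List[Message], List[Message]]:
    """Split a transcript into (older, recent).

    A turn is a user message plus its assistant reply (two messages), so the
    last keep_recent_turns * 2 messages stay verbatim; the rest would be summarized.
    """
    cutoff = max(len(messages) - keep_recent_turns * 2, 0)
    return messages[:cutoff], messages[cutoff:]


transcript = [(role, f"{role} {i}") for i in range(5) for role in ("user", "assistant")]
older, recent = split_for_compaction(transcript, keep_recent_turns=2)
print(len(older), len(recent))  # 6 4
```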

Compaction is lossy by design. For conversations where every nuance matters (legal, medical, financial), keep the full history and use compaction only for supplementary context.

Strategies

Strategy	Output size	Best for
`conservative`	Largest	Short conversations needing high detail; minimal information loss.
`balanced`	Medium	General-purpose; good compression-vs-detail balance.
`aggressive`	Smallest	Long or cost-sensitive conversations; keeps only the most critical facts.
`adaptive`	Varies	Synap analyzes the conversation (length, density, repetition, recency, budget) and picks the strategy. Recommended default.

The SDK surface

Compaction is asynchronous: compact kicks off a job and returns a handle, get_compaction_status polls for completion, and get_compacted returns the result.

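import asyncio

# Reuse the conversation_id you generated for this conversation and passed to
# record_message on every turn; a fresh uuid4() here would point compaction
# at a conversation with no recorded turns.

# Kick off compaction (returns immediately with a trigger handle)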
trigger = await sdk.conversation.context.compact(
    conversation_id=conversation_id,
    strategy="adaptive",
    target_tokens=2000,
)
print(f"Compaction {trigger.compaction_id} status: {trigger.status}")

# Poll until the run completes or fails
while True:
    status = await sdk.conversation.context.get_compaction_status(
        conversation_id=conversation_id,
    )
    if status.status in ("completed", "failed"):
        break
    await asyncio.sleep(2)

if status.status == "failed":
    raise RuntimeError(f"Compaction {trigger.compaction_id} failed")

# Read the compacted result
result = await sdk.conversation.context.get_compacted(conversation_id=conversation_id)
print(f"Compressed {result.original_token_count} -> {result.compacted_token_count} tokens")
print(f"Quality: {result.validation_score:.2f}, passed: {result.validation_passed}")
print(f"Facts: {result.facts}")
print(f"Current state: {result.current_state}")
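The polling loop in this example has no upper bound on how long it waits. A generic helper built on `asyncio.wait_for` adds one. `poll_until_done` is a hypothetical helper, not an SDK method; only the `"completed"` and `"failed"` status values come from the example, and the `"queued"`/`"running"` values in the demo are fake.

```python
import asyncio
from typing import Awaitable, Callable

TERMINAL_STATUSES = ("completed", "failed")


async def poll_until_done(fetch_status: Callable[[], Awaitable[str]],
                          interval: float = 2.0, timeout: float = 120.0) -> str:
    """Await fetch_status() every `interval` seconds until it returns a terminal
    status; raise asyncio.TimeoutError if that takes longer than `timeout`."""

    async def _poll() -> str:
        while True:
            status = await fetch_status()
            if status in TERMINAL_STATUSES:
                return status
            await asyncio.sleep(interval)

    return await asyncio.wait_for(_poll(), timeout=timeout)


# With the SDK, fetch_status could wrap get_compaction_status(...) and return .status.
async def demo() -> str:
    statuses = iter(["queued", "running", "completed"])  # fake status sequence

    async def fake_status() -> str:
        return next(statuses)

    return await poll_until_done(fake_status, interval=0.01, timeout=1.0)


print(asyncio.run(demo()))  # completed
```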

Inspect validation_score (0.0-1.0) and validation_passed to confirm quality. If scores fall consistently low, switch to a less aggressive strategy or raise the token budget.
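That advice fits in a small decision helper for your own retry logic. `next_attempt` and the 1.5x `budget_step` are hypothetical choices, and the helper reacts to a single run; applying it only after repeated low scores is up to you.

```python
from typing import Dict, Tuple

# Each strategy mapped to the next one with a larger output, per the Strategies table.
LESS_AGGRESSIVE: Dict[str, str] = {"aggressive": "balanced", "balanced": "conservative"}


def next_attempt(strategy: str, target_tokens: int, validation_passed: bool,
                 budget_step: float = 1.5) -> Tuple[str, int]:
    """Pick settings for the next compaction run after a quality check."""
    if validation_passed:
        return strategy, target_tokens
    if strategy in LESS_AGGRESSIVE:
        return LESS_AGGRESSIVE[strategy], target_tokens
    # "conservative" has no less aggressive option, and "adaptive" picks its own
    # strategy, so the remaining lever is a larger token budget.
    return strategy, int(target_tokens * budget_step)


print(next_attempt("aggressive", 2000, validation_passed=False))  # ('balanced', 2000)
print(next_attempt("adaptive", 2000, validation_passed=False))    # ('adaptive', 3000)
```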

Compaction vs. retrieval

The two are complementary, not interchangeable:

Aspect	Compaction	Retrieval
Input	Current conversation history	Query against stored memories
Scope	One conversation	All memories across all conversations
Purpose	Reduce tokens for the current turn	Bring relevant past knowledge into the turn
Output	Compressed view of this conversation	Ranked memories from vector + graph stores

A typical production turn does both: retrieve relevant long-term memories, compact the current conversation if it is long, then combine retrieved memories + compacted summary + recent verbatim turns into the prompt.
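The final assembly step can be sketched as plain string building. The section headings, ordering, and layout are illustrative assumptions; in practice the retrieved memories would come from fields such as context.facts and the summary from the compacted result, but `build_prompt` itself is hypothetical.

```python
from typing import List, Tuple


def build_prompt(retrieved: List[str], compacted_summary: str,
                 recent_turns: List[Tuple[str, str]], user_message: str) -> str:
    """Combine retrieved memories, a compacted summary, and recent verbatim turns."""
    sections = []
    if retrieved:
        sections.append("Relevant memories:\n" + "\n".join(f"- {m}" for m in retrieved))
    if compacted_summary:
        sections.append("Conversation so far (compacted):\n" + compacted_summary)
    if recent_turns:
        sections.append("Recent turns:\n" + "\n".join(f"{role}: {text}" for role, text in recent_turns))
    sections.append(f"user: {user_message}")  # the new message always comes last
    return "\n\n".join(sections)


prompt = build_prompt(
    retrieved=["Alice prefers concise answers"],
    compacted_summary="Discussed API rate limits and enterprise plan limits.",
    recent_turns=[("user", "What about burst handling?"),
                  ("assistant", "Burst allowances provide a 2x multiplier...")],
    user_message="Summarize the limits for me.",
)
print(prompt)
```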

Next steps

Memories & Context

The overview of how memories and context fit together in Synap.

Retrieval Modes

Choosing between fast (vector + graph) and accurate (vector + graph + LLM decomposition + reranking).

Entity Resolution

How mentions are resolved to canonical entities and linked in the graph.

Memory Scopes

The full scope chain and priority resolution rules.
