How compaction works

Compaction analyzes the full conversation, identifies what is essential, and produces a compressed representation that preserves the information your agent needs to maintain coherent, personalized responses.

1. Analyze the conversation

The compaction engine reads the full conversation history, identifying facts, decisions, preferences, emotional shifts, and the current state of the discussion.

2. Extract essential information

Key information is extracted and categorized: facts that have been established, decisions that have been made, preferences that have been expressed, and the current topic and emotional tone.

3. Compress into target format

The extracted information is compressed into the target token budget using the selected strategy. A quality validation score is computed to ensure critical information is preserved.

4. Return structured or injection-ready output

The compacted context is returned either as structured typed fields (for programmatic use) or as a single string ready for injection into an LLM system prompt.

Compaction strategies

Synap provides four compaction strategies, each optimizing for a different balance between compression and detail retention:
| Strategy | Compression Ratio | Output Size | Best For |
|---|---|---|---|
| conservative | ~70% of original | Largest | Short conversations where high detail is needed. Preserves most of the original context with minimal information loss. |
| balanced | ~40% of original | Medium | General-purpose use. Good balance between compression and detail. Recommended for most applications. |
| aggressive | ~15% of original | Smallest | Long conversations or cost-sensitive applications. Preserves only the most critical facts and decisions. |
| adaptive | Cloud decides | Varies | Synap analyzes the conversation and selects the optimal strategy automatically. Recommended default. |
Compaction is lossy by design. Use the conservative strategy when you need maximum detail retention. For conversations where every nuance matters (legal, medical, financial), consider keeping full history and using compaction only for supplementary context.
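The fixed-ratio strategies in the table above can be used to estimate output size before compacting. The sketch below is illustrative only (the `STRATEGY_RATIOS` mapping and `estimated_output_tokens` helper are not part of the SDK); it simply applies the approximate ratios listed above:

```python
# Illustrative sketch, not SDK API: approximate compression ratios
# per fixed-ratio strategy, as listed in the table above.
STRATEGY_RATIOS = {
    "conservative": 0.70,
    "balanced": 0.40,
    "aggressive": 0.15,
}

def estimated_output_tokens(original_tokens: int, strategy: str) -> int:
    """Rough estimate of compacted size for a fixed-ratio strategy."""
    return round(original_tokens * STRATEGY_RATIOS[strategy])

print(estimated_output_tokens(10_000, "balanced"))  # -> 4000
```

An estimate like this can help you decide whether a strategy's output will fit your prompt's token budget before making the call.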

How adaptive strategy works

The adaptive strategy analyzes several signals to choose the optimal compression level:
  • Conversation length: Longer conversations get more aggressive compression
  • Information density: Conversations with many facts and decisions get less aggressive compression to preserve detail
  • Repetition: Conversations with redundant exchanges get more aggressive compression
  • Recency: Recent turns are weighted more heavily than older ones
  • Token budget: The target token count influences how aggressively the engine compresses
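To make these signals concrete, here is a hypothetical client-side heuristic in the same spirit. The real selection happens in Synap's cloud and weighs more signals than this; `pick_strategy` and its thresholds are invented for illustration:

```python
# Hypothetical sketch of adaptive-style selection; the actual adaptive
# strategy runs server-side and uses more signals than shown here.
def pick_strategy(token_count: int, target_tokens: int, fact_count: int) -> str:
    # How much of the original must survive to hit the budget.
    required_ratio = target_tokens / token_count
    # High information density biases toward gentler compression.
    if fact_count > 50 or required_ratio > 0.6:
        return "conservative"
    if required_ratio > 0.25:
        return "balanced"
    return "aggressive"

print(pick_strategy(12_000, 2_000, fact_count=10))  # -> "aggressive"
```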

What gets extracted

During compaction, the engine identifies and preserves five categories of essential information:

Facts

Factual statements established during the conversation. These are the foundation of compacted context — what has been confirmed, stated, or agreed upon.

“User is based in Portland. They work at Acme Corp as a senior engineer. They started in January.”

Decisions

Decisions made during the conversation. These capture what was agreed, chosen, or determined.

“User decided to proceed with Option B. Meeting scheduled for next Thursday.”

Preferences

Preferences expressed during the conversation, including communication style, topic interests, and behavioral patterns.

“User prefers concise responses. They like bullet-point summaries. They dislike overly formal language.”

Summary Narrative

A natural language summary of the conversation arc — what happened, in what order, and where things stand now.

“The user asked about migration options, discussed pricing, and chose the enterprise plan.”

Current State

The current topic, active questions, and unresolved threads. This ensures the agent knows where the conversation stands.

“Currently discussing: implementation timeline. Open question: when can the team start?”
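The five categories map naturally onto a typed container. The local class below is only an illustration of that shape (the SDK's actual result object, shown later in this page, may differ in field names and types):

```python
from dataclasses import dataclass

# Illustrative local model of the five extracted categories; the SDK's
# real result object may differ in field names and types.
@dataclass
class CompactedContext:
    facts: list[str]
    decisions: list[str]
    preferences: list[str]
    summary: str
    current_state: str

    def to_prompt_string(self) -> str:
        """Render the five categories as one injection-ready block."""
        return "\n".join([
            "Facts: " + "; ".join(self.facts),
            "Decisions: " + "; ".join(self.decisions),
            "Preferences: " + "; ".join(self.preferences),
            "Summary: " + self.summary,
            "Current state: " + self.current_state,
        ])

ctx = CompactedContext(
    facts=["User is based in Portland"],
    decisions=["Proceed with Option B"],
    preferences=["Prefers concise responses"],
    summary="User chose the enterprise plan.",
    current_state="Discussing implementation timeline",
)
print(ctx.to_prompt_string())
```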

Quality validation

Every compaction result includes quality metrics so you can verify that the compression preserved critical information:
| Field | Type | Description |
|---|---|---|
| validation_score | float (0.0-1.0) | Overall quality score. Higher means more information was preserved. Scores above 0.7 are generally considered good. |
| validation_passed | bool | Whether the compaction meets the minimum quality threshold. false if critical information was lost. |
| original_token_count | int | Token count of the original conversation. |
| compacted_token_count | int | Token count of the compacted output. |
| compression_ratio | float | Ratio of compacted to original (e.g., 0.35 means 35% of original size). |
| preserved_facts_count | int | Number of facts preserved in the compacted output. |
| preserved_decisions_count | int | Number of decisions preserved. |
Monitor validation_score in your application. If scores consistently fall below 0.6, consider switching to a less aggressive strategy or increasing your token budget. Low validation scores indicate that important context may be lost.
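One way to follow this advice is a rolling-average check over recent scores. This monitor class is a sketch of that idea (the class and its thresholds are assumptions, not part of the SDK):

```python
from collections import deque

# Sketch of the monitoring advice above: track a rolling average of
# validation_score and flag when it drifts below the 0.6 threshold.
class CompactionQualityMonitor:
    def __init__(self, window: int = 20, threshold: float = 0.6):
        self.scores = deque(maxlen=window)  # keep only the last `window` scores
        self.threshold = threshold

    def record(self, validation_score: float) -> bool:
        """Record a score; return True while the rolling average is healthy."""
        self.scores.append(validation_score)
        avg = sum(self.scores) / len(self.scores)
        return avg >= self.threshold

monitor = CompactionQualityMonitor()
print(monitor.record(0.8))  # True: average is 0.8
print(monitor.record(0.3))  # False: average drops to 0.55
```

When `record` returns False, your application might switch to a less aggressive strategy or raise the token budget, as suggested above.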

Output formats

Compacted context can be returned in two formats: structured, which returns typed fields you can access programmatically (best for applications that build custom LLM prompts), and injection-ready, which returns a single string suitable for direct insertion into an LLM system prompt. The structured format looks like this:
result = await sdk.conversation.context.compact(
    conversation_id="conv_123",
    strategy="adaptive",
    target_tokens=2000
)

# Access structured fields
print(f"Facts: {result.facts}")
print(f"Decisions: {result.decisions}")
print(f"Preferences: {result.preferences}")
print(f"Summary: {result.summary}")
print(f"Current state: {result.current_state}")
print(f"Quality: {result.validation_score}")

Code examples

Basic compaction

from synap import Synap

sdk = Synap(api_key="your_api_key")

# Compact a conversation
result = await sdk.conversation.context.compact(
    conversation_id="conv_123",
    strategy="adaptive",
    target_tokens=2000
)

print(f"Compressed {result.original_token_count} -> {result.compacted_token_count} tokens")
print(f"Compression ratio: {result.compression_ratio:.1%}")
print(f"Quality score: {result.validation_score:.2f}")
print(f"Validation passed: {result.validation_passed}")

Compaction with quality check

# Compact with a quality threshold
result = await sdk.conversation.context.compact(
    conversation_id="conv_123",
    strategy="balanced",
    target_tokens=3000
)

if not result.validation_passed:
    # Quality too low — retry with less aggressive compression
    result = await sdk.conversation.context.compact(
        conversation_id="conv_123",
        strategy="conservative",
        target_tokens=4000
    )

# Use the compacted context
context_for_llm = result.to_prompt_string()

Using compacted context in a conversation flow

# Check if compaction is needed
conversation = await sdk.conversation.get("conv_123")

if conversation.token_count > 6000:
    # Conversation is getting long — compact it
    compacted = await sdk.conversation.context.get_compacted(
        conversation_id="conv_123",
        format="injection-ready"
    )

    # Build the prompt with compacted history + recent turns
    system_prompt = f"""You are a helpful assistant.

Previous conversation context (compacted):
{compacted.text}

Recent messages follow. Continue naturally.
"""
else:
    # Conversation is short enough — use full history
    system_prompt = "You are a helpful assistant."

When to compact

Use these guidelines to determine when to trigger compaction:
  • Token threshold: Compact when the conversation exceeds 60-70% of your LLM’s context window. This leaves room for the system prompt, retrieved memories, and the model’s response.
  • Turn count: For most applications, compact after 15-20 turns. Long conversations rarely need every early turn.
  • Cost threshold: If you are cost-sensitive, compact whenever the estimated token cost exceeds your per-conversation budget.
  • Periodic: For very long-running sessions (e.g., all-day copilots), compact on a regular interval (every 10-15 minutes of active conversation).
  • Adaptive: Use the adaptive strategy and let Synap decide. This is the recommended approach for most applications.
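The token-threshold and turn-count guidelines above can be folded into a small gate function. This helper is a sketch under assumed numbers (a 128k context window and a 65% threshold), not an SDK method:

```python
# Sketch of the token-threshold and turn-count guidelines above.
# The 128k default window and 65% cutoff are assumptions; tune them
# to your model and application.
def should_compact(token_count: int, turn_count: int,
                   context_window: int = 128_000) -> bool:
    """Compact once usage passes ~65% of the context window or 20 turns."""
    if token_count > 0.65 * context_window:
        return True
    if turn_count > 20:
        return True
    return False

print(should_compact(token_count=90_000, turn_count=12))  # True (token threshold)
print(should_compact(token_count=5_000, turn_count=8))    # False
```

A gate like this fits naturally in the conversation-flow pattern shown earlier, replacing the hard-coded `token_count > 6000` check.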

Compaction vs. retrieval

Compaction and retrieval serve different purposes and are complementary:
| Aspect | Context Compaction | Memory Retrieval |
|---|---|---|
| Input | Current conversation history | Query against stored memories |
| Scope | Single conversation | All memories across all conversations |
| Purpose | Reduce token usage for the current conversation | Bring relevant past knowledge into the current conversation |
| Output | Compressed version of the current conversation | Ranked memories from vector and graph stores |
| When to use | Conversation is too long for the LLM context | Agent needs knowledge from past interactions |
A typical production flow uses both:
  1. Retrieve relevant memories from past conversations
  2. Compact the current conversation if it is long
  3. Combine retrieved memories + compacted context + recent turns into the LLM prompt
# 1. Retrieve past memories
past_context = await sdk.user.context.fetch(
    user_id="user_123",
    customer_id="acme_corp"
)

# 2. Compact current conversation if needed
compacted = await sdk.conversation.context.get_compacted(
    conversation_id="conv_456",
    format="injection-ready"
)

# 3. Build the full prompt
system_prompt = f"""You are a helpful assistant.

What you know about this user from past conversations:
{past_context.to_prompt_string()}

Summary of the current conversation so far:
{compacted.text}

Continue the conversation naturally.
"""

Next steps

Memories & Context

See how compaction fits into the broader memory lifecycle.

Memory Architecture

Configure token budgets and retrieval settings in MACA.

SDK: Context Compaction

Full SDK reference for compaction methods and parameters.

Storage Infrastructure

Understand the storage engines that power memory retrieval.