How compaction works

Compaction analyzes the full conversation, identifies what is essential, and produces a compressed representation that preserves the information your agent needs to maintain coherent, personalized responses.

1. Analyze the conversation

The compaction engine reads the full conversation history, identifying facts, decisions, preferences, emotional shifts, and the current state of the discussion.

2. Extract essential information

Key information is extracted and categorized: facts that have been established, decisions that have been made, preferences that have been expressed, and the current topic and emotional tone.

3. Compress into target format

The extracted information is compressed into the target token budget using the selected strategy. A quality validation score is computed to ensure critical information is preserved.

4. Return structured or injection-ready output

The compacted context is returned either as structured typed fields (for programmatic use) or as a single string ready for injection into an LLM system prompt.

Compaction strategies

Synap provides four compaction strategies, each optimizing for a different balance between compression and detail retention:
| Strategy | Compression Ratio | Output Size | Best For |
|---|---|---|---|
| conservative | ~70% of original | Largest | Short conversations where high detail is needed. Preserves most of the original context with minimal information loss. |
| balanced | ~40% of original | Medium | General-purpose use. Good balance between compression and detail. Recommended for most applications. |
| aggressive | ~15% of original | Smallest | Long conversations or cost-sensitive applications. Preserves only the most critical facts and decisions. |
| adaptive | Cloud decides | Varies | Synap analyzes the conversation and selects the optimal strategy automatically. Recommended default. |
Compaction is lossy by design. Use the conservative strategy when you need maximum detail retention. For conversations where every nuance matters (legal, medical, financial), consider keeping full history and using compaction only for supplementary context.
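The fixed-ratio strategies in the table above can be used to estimate output size before compacting. The sketch below is illustrative only (the `STRATEGY_RATIOS` mapping and `estimated_output_tokens` helper are not part of the SDK); it simply applies the approximate ratios listed above:

```python
# Illustrative sketch, not SDK API: approximate compression ratios
# per fixed-ratio strategy, as listed in the table above.
STRATEGY_RATIOS = {
    "conservative": 0.70,
    "balanced": 0.40,
    "aggressive": 0.15,
}

def estimated_output_tokens(original_tokens: int, strategy: str) -> int:
    """Rough estimate of compacted size for a fixed-ratio strategy."""
    return round(original_tokens * STRATEGY_RATIOS[strategy])

print(estimated_output_tokens(10_000, "balanced"))  # -> 4000
```

An estimate like this can help you decide whether a strategy's output will fit your prompt's token budget before making the call.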

How adaptive strategy works

The adaptive strategy analyzes several signals to choose the optimal compression level:
  • Conversation length: Longer conversations get more aggressive compression
  • Information density: Conversations with many facts and decisions get less aggressive compression to preserve detail
  • Repetition: Conversations with redundant exchanges get more aggressive compression
  • Recency: Recent turns are weighted more heavily than older ones
  • Token budget: The target token count influences how aggressively the engine compresses
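To make these signals concrete, here is a hypothetical client-side heuristic in the same spirit. The real selection happens in Synap's cloud and weighs more signals than this; `pick_strategy` and its thresholds are invented for illustration:

```python
# Hypothetical sketch of adaptive-style selection; the actual adaptive
# strategy runs server-side and uses more signals than shown here.
def pick_strategy(token_count: int, target_tokens: int, fact_count: int) -> str:
    # How much of the original must survive to hit the budget.
    required_ratio = target_tokens / token_count
    # High information density biases toward gentler compression.
    if fact_count > 50 or required_ratio > 0.6:
        return "conservative"
    if required_ratio > 0.25:
        return "balanced"
    return "aggressive"

print(pick_strategy(12_000, 2_000, fact_count=10))  # -> "aggressive"
```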

What gets extracted

During compaction, the engine identifies and preserves five categories of essential information:

Facts

Factual statements established during the conversation. These are the foundation of compacted context — what has been confirmed, stated, or agreed upon.

“User is based in Portland. They work at Acme Corp as a senior engineer. They started in January.”

Decisions

Decisions made during the conversation. These capture what was agreed, chosen, or determined.

“User decided to proceed with Option B. Meeting scheduled for next Thursday.”

Preferences

Preferences expressed during the conversation, including communication style, topic interests, and behavioral patterns.

“User prefers concise responses. They like bullet-point summaries. They dislike overly formal language.”

Summary Narrative

A natural language summary of the conversation arc — what happened, in what order, and where things stand now.

“The user asked about migration options, discussed pricing, and chose the enterprise plan.”

Current State

The current topic, active questions, and unresolved threads. This ensures the agent knows where the conversation stands.

“Currently discussing: implementation timeline. Open question: when can the team start?”
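The five categories map naturally onto a typed container. The local class below is only an illustration of that shape (the SDK's actual result object, shown later in this page, may differ in field names and types):

```python
from dataclasses import dataclass

# Illustrative local model of the five extracted categories; the SDK's
# real result object may differ in field names and types.
@dataclass
class CompactedContext:
    facts: list[str]
    decisions: list[str]
    preferences: list[str]
    summary: str
    current_state: str

    def to_prompt_string(self) -> str:
        """Render the five categories as one injection-ready block."""
        return "\n".join([
            "Facts: " + "; ".join(self.facts),
            "Decisions: " + "; ".join(self.decisions),
            "Preferences: " + "; ".join(self.preferences),
            "Summary: " + self.summary,
            "Current state: " + self.current_state,
        ])

ctx = CompactedContext(
    facts=["User is based in Portland"],
    decisions=["Proceed with Option B"],
    preferences=["Prefers concise responses"],
    summary="User chose the enterprise plan.",
    current_state="Discussing implementation timeline",
)
print(ctx.to_prompt_string())
```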

Quality validation

Every compaction result includes quality metrics so you can verify that the compression preserved critical information:
| Field | Type | Description |
|---|---|---|
| validation_score | float (0.0-1.0) | Overall quality score. Higher means more information was preserved. Scores above 0.7 are generally considered good. |
| validation_passed | bool | Whether the compaction meets the minimum quality threshold. false if critical information was lost. |
| original_token_count | int | Token count of the original conversation. |
| compacted_token_count | int | Token count of the compacted output. |
| compression_ratio | float | Ratio of compacted to original (e.g., 0.35 means 35% of original size). |
| preserved_facts_count | int | Number of facts preserved in the compacted output. |
| preserved_decisions_count | int | Number of decisions preserved. |
Monitor validation_score in your application. If scores consistently fall below 0.6, consider switching to a less aggressive strategy or increasing your token budget. Low validation scores indicate that important context may be lost.
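One way to follow this advice is a rolling-average check over recent scores. This monitor class is a sketch of that idea (the class and its thresholds are assumptions, not part of the SDK):

```python
from collections import deque

# Sketch of the monitoring advice above: track a rolling average of
# validation_score and flag when it drifts below the 0.6 threshold.
class CompactionQualityMonitor:
    def __init__(self, window: int = 20, threshold: float = 0.6):
        self.scores = deque(maxlen=window)  # keep only the last `window` scores
        self.threshold = threshold

    def record(self, validation_score: float) -> bool:
        """Record a score; return True while the rolling average is healthy."""
        self.scores.append(validation_score)
        avg = sum(self.scores) / len(self.scores)
        return avg >= self.threshold

monitor = CompactionQualityMonitor()
print(monitor.record(0.8))  # True: average is 0.8
print(monitor.record(0.3))  # False: average drops to 0.55
```

When `record` returns False, your application might switch to a less aggressive strategy or raise the token budget, as suggested above.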

Output formats

Compacted context can be returned in two formats: structured, which returns typed fields you can access programmatically (best for applications that build custom LLM prompts), and injection-ready, which returns a single string suitable for direct insertion into an LLM system prompt. The structured format looks like this:
result = await sdk.conversation.context.compact(
    conversation_id="conv_123",
    strategy="adaptive",
    target_tokens=2000
)

# Access structured fields
print(f"Facts: {result.facts}")
print(f"Decisions: {result.decisions}")
print(f"Preferences: {result.preferences}")
print(f"Summary: {result.summary}")
print(f"Current state: {result.current_state}")
print(f"Quality: {result.validation_score}")

Code examples

Basic compaction

from synap import Synap

sdk = Synap(api_key="your_api_key")

# Compact a conversation
result = await sdk.conversation.context.compact(
    conversation_id="conv_123",
    strategy="adaptive",
    target_tokens=2000
)

print(f"Compressed {result.original_token_count} -> {result.compacted_token_count} tokens")
print(f"Compression ratio: {result.compression_ratio:.1%}")
print(f"Quality score: {result.validation_score:.2f}")
print(f"Validation passed: {result.validation_passed}")

Compaction with quality check

# Compact with a quality threshold
result = await sdk.conversation.context.compact(
    conversation_id="conv_123",
    strategy="balanced",
    target_tokens=3000
)

if not result.validation_passed:
    # Quality too low — retry with less aggressive compression
    result = await sdk.conversation.context.compact(
        conversation_id="conv_123",
        strategy="conservative",
        target_tokens=4000
    )

# Use the compacted context
context_for_llm = result.to_prompt_string()

Using compacted context in a conversation flow

# Check if compaction is needed
conversation = await sdk.conversation.get("conv_123")

if conversation.token_count > 6000:
    # Conversation is getting long — compact it
    compacted = await sdk.conversation.context.get_compacted(
        conversation_id="conv_123",
        format="injection-ready"
    )

    # Build the prompt with compacted history + recent turns
    system_prompt = f"""You are a helpful assistant.

Previous conversation context (compacted):
{compacted.text}

Recent messages follow. Continue naturally.
"""
else:
    # Conversation is short enough — use full history
    system_prompt = "You are a helpful assistant."

When to compact

Use these guidelines to determine when to trigger compaction:
  • Token threshold: Compact when the conversation exceeds 60-70% of your LLM’s context window. This leaves room for the system prompt, retrieved memories, and the model’s response.
  • Turn count: For most applications, compact after 15-20 turns. Long conversations rarely need every early turn.
  • Cost threshold: If you are cost-sensitive, compact whenever the estimated token cost exceeds your per-conversation budget.
  • Periodic: For very long-running sessions (e.g., all-day copilots), compact on a regular interval (every 10-15 minutes of active conversation).
  • Adaptive: Use the adaptive strategy and let Synap decide. This is the recommended approach for most applications.
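The token-threshold and turn-count guidelines above can be folded into a small gate function. This helper is a sketch under assumed numbers (a 128k context window and a 65% threshold), not an SDK method:

```python
# Sketch of the token-threshold and turn-count guidelines above.
# The 128k default window and 65% cutoff are assumptions; tune them
# to your model and application.
def should_compact(token_count: int, turn_count: int,
                   context_window: int = 128_000) -> bool:
    """Compact once usage passes ~65% of the context window or 20 turns."""
    if token_count > 0.65 * context_window:
        return True
    if turn_count > 20:
        return True
    return False

print(should_compact(token_count=90_000, turn_count=12))  # True (token threshold)
print(should_compact(token_count=5_000, turn_count=8))    # False
```

A gate like this fits naturally in the conversation-flow pattern shown earlier, replacing the hard-coded `token_count > 6000` check.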

Compaction vs. retrieval

Compaction and retrieval serve different purposes and are complementary:
| Aspect | Context Compaction | Memory Retrieval |
|---|---|---|
| Input | Current conversation history | Query against stored memories |
| Scope | Single conversation | All memories across all conversations |
| Purpose | Reduce token usage for the current conversation | Bring relevant past knowledge into the current conversation |
| Output | Compressed version of the current conversation | Ranked memories from vector and graph stores |
| When to use | Conversation is too long for the LLM context | Agent needs knowledge from past interactions |
A typical production flow uses both:
  1. Retrieve relevant memories from past conversations
  2. Compact the current conversation if it is long
  3. Combine retrieved memories + compacted context + recent turns into the LLM prompt
# 1. Retrieve past memories
past_context = await sdk.user.context.fetch(
    user_id="user_123",
    customer_id="acme_corp"
)

# 2. Compact current conversation if needed
compacted = await sdk.conversation.context.get_compacted(
    conversation_id="conv_456",
    format="injection-ready"
)

# 3. Build the full prompt
system_prompt = f"""You are a helpful assistant.

What you know about this user from past conversations:
{past_context.to_prompt_string()}

Summary of the current conversation so far:
{compacted.text}

Continue the conversation naturally.
"""

Next steps

Memories & Context

See how compaction fits into the broader memory lifecycle.

Memory Architecture

Configure token budgets and retrieval settings in MACA.

SDK: Context Compaction

Full SDK reference for compaction methods and parameters.

Storage Infrastructure

Understand the storage engines that power memory retrieval.