
Overview

As conversations grow, the amount of context you need to inject into your LLM’s prompt grows with them. Context compaction solves this by intelligently compressing conversation history while preserving the most important information — facts, decisions, preferences, and key narrative elements. Compaction is particularly valuable when:
  • Conversations exceed your LLM’s context window
  • You want to reduce token costs without losing critical context
  • You need to summarize long conversation histories into a concise prompt segment
Synap provides two ways to work with compacted context:
  • Get context for prompt: get_context_for_prompt(). Best for most integrations (one call, prompt-ready context).
  • Manual compaction control: compact() + get_compacted() + get_compaction_status(). Best for fine-grained control over strategy, token budgets, and timing.

Prerequisites: Recording Messages

Before calling any compaction method, the conversation must have messages recorded via sdk.conversation.record_message(). Calling compact() or get_context_for_prompt() on a conversation with no recorded messages returns HTTP 403.
import uuid

conversation_id = str(uuid.uuid4())

# Record messages before any compaction call
await sdk.conversation.record_message(
    conversation_id=conversation_id,
    role="user",
    content="I'd like to plan a trip to Japan in April."
)
await sdk.conversation.record_message(
    conversation_id=conversation_id,
    role="assistant",
    content="Great choice! April is cherry blossom season. Would you prefer Tokyo or Kyoto?"
)
conversation_id must be a valid UUID string. Non-UUID strings (e.g. "conv_abc123") will cause a ServiceUnavailableError on the backend. Use str(uuid.uuid4()) to generate one, or pass a UUID you already manage in your system.

Quick Start: Get Context for Your Prompt

For most use cases, get_context_for_prompt() is all you need. It returns the best available compacted context in a single call, pre-formatted for injection into your LLM’s system prompt.
result = await sdk.conversation.context.get_context_for_prompt(
    conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c"
)

if result.available:
    # Inject into your LLM system prompt
    system_prompt = f"""You are a helpful assistant.

## Conversation History
{result.formatted_context}"""
else:
    # No compacted context yet -- use context.fetch() or trigger compact()
    system_prompt = "You are a helpful assistant."
The SDK caches the result locally (5-minute TTL), so calling this method on every LLM turn is safe and fast.

ContextForPromptResponse

formatted_context
str
Pre-formatted compacted context, ready for direct injection into an LLM system prompt. None when no compacted context is available.
available
bool
required
Whether compacted context exists for this conversation. When False, use context.fetch() for retrieval-based context or trigger compact() to create a compaction.
is_stale
bool
required
True when the compacted context is older than 15 minutes. Stale context is still usable but may not reflect the latest conversation turns. Consider triggering a fresh compact() if staleness matters for your use case.
compression_ratio
float
Ratio of compacted to original tokens (e.g., 0.35 means 35% of original size).
validation_score
float
Quality score between 0.0 and 1.0 indicating how well the compacted context preserves the original information.
compaction_age_seconds
int
Seconds since the compaction was created. Use this to implement custom staleness thresholds.
quality_warning
bool
required
True when the validation score is below expected thresholds. When set, the compacted context may have lost important information.

Formatting Styles

The style parameter is accepted by the SDK ("structured", "narrative", "bullet_points") and will control output formatting in a future backend release. At this stage, all three values return the same raw message content — there is no formatting difference between them yet.
result = await sdk.conversation.context.get_context_for_prompt(
    conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c",
    style="structured"  # "narrative" and "bullet_points" return identical output for now
)
Style-based formatting differentiation is not yet active on the backend. You can safely pass any supported style value — it will take effect automatically once the backend support ships, with no code changes required on your side.

Handling Missing Context

When available is False, no compacted context exists yet. This happens for new conversations or conversations that haven’t been compacted. You have two options:
result = await sdk.conversation.context.get_context_for_prompt(
    conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c"
)

if not result.available:
    # Option 1: Fall back to retrieval-based context
    context = await sdk.conversation.context.fetch(
        conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c",
        search_query=[user_message],
        mode="fast"
    )

    # Option 2: Trigger compaction, then retry
    await sdk.conversation.context.compact(
        conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c",
        strategy="adaptive"
    )

Manual Compaction Control

Use these methods when you need fine-grained control over compaction — choosing a strategy, setting token budgets, polling for completion, or retrieving specific versions.

Triggering Compaction

Use sdk.conversation.context.compact() to explicitly trigger compaction with specific parameters.
compaction = await sdk.conversation.context.compact(
    conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c",
    strategy="adaptive",
    target_tokens=2000,
    force=False
)

print(f"Compaction ID: {compaction.compaction_id}")
print(f"Status: {compaction.status}")

Parameter Reference

conversation_id
str
required
The identifier of the conversation to compact.
strategy
str
The compaction strategy to apply. Controls how aggressively the context is compressed. See Strategies below. If omitted, "adaptive" is used.
target_tokens
int
The desired output size in tokens. When specified, the compactor adjusts its aggressiveness to hit this target as closely as possible. Takes priority over the strategy’s default compression level.
force
bool
default:"False"
Skip staleness checks. By default, Synap only re-compacts a conversation if new memories have been ingested since the last compaction. Set force=True to compact regardless.

Compaction Strategies

Synap offers four compaction strategies (adaptive, balanced, conservative, and aggressive) that control the tradeoff between compression ratio and information preservation.
adaptive (recommended). Dynamically adjusts compression based on the content: dense, fact-heavy conversations are compressed less aggressively, while repetitive or low-information conversations are compressed more aggressively.
  • Typical compression: 30-60% of original tokens
  • Preserves: all high-confidence facts, decisions, preferences, key narrative flow
  • Drops: repetitive information, low-value conversational filler, redundant context
compaction = await sdk.conversation.context.compact(
    conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c",
    strategy="adaptive"
)

Strategy Selection Guide

  • General-purpose, unsure what to use: adaptive
  • Very long conversations (100+ turns): aggressive with target_tokens
  • Important conversations, high-value context: conservative
  • Moderate conversations, cost optimization: balanced
  • Dynamic workloads with varying conversation lengths: adaptive
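The selection guide can be encoded directly in application code. A hypothetical selector (the function and its turn-count cutoffs are illustrative assumptions, not SDK behavior):

```python
def pick_strategy(turn_count: int, high_value: bool = False) -> str:
    """Map the strategy selection guide onto a strategy string for compact()."""
    if high_value:
        return "conservative"   # important, high-value context
    if turn_count >= 100:
        return "aggressive"     # very long conversations; pair with target_tokens
    if turn_count >= 30:
        return "balanced"       # moderate length, cost optimization
    return "adaptive"           # general-purpose default
```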

CompactionResponse

The compact() method returns a CompactionResponse with comprehensive details about the compaction result.
compacted_context
str
required
The compressed context as a single string, ready for injection into an LLM prompt.
original_token_count
int
required
The token count of the original, uncompacted context.
compacted_token_count
int
required
The token count of the compacted result.
compression_ratio
float
required
The ratio of compacted to original tokens (e.g., 0.35 means 35% of original size).
strategy_used
str
required
The strategy that was actually applied. May differ from the requested strategy if adaptive selected a different approach.
validation_score
float
required
A score between 0.0 and 1.0 indicating how well the compacted context preserves the original information. Scores above 0.8 are considered high-quality.
validation_passed
bool
required
Whether the compaction passed Synap’s quality validation threshold.
facts
List[str]
Key facts preserved in the compacted context.
decisions
List[str]
Decisions and action items preserved in the compacted context.
preferences
List[str]
User preferences preserved in the compacted context.
summary
str
A narrative summary of the conversation, if generated by the compaction strategy.
quality_warning
str
A warning message if the compaction quality is below expected thresholds. Present only when there are concerns about information loss.
Pay attention to the quality_warning field. When present, it indicates that the compaction may have lost important information. Consider using a less aggressive strategy or increasing target_tokens if quality warnings appear consistently.

Retrieving Compacted Context

Use sdk.conversation.context.get_compacted() to retrieve a previously compacted version of a conversation’s context without triggering a new compaction.
compacted = await sdk.conversation.context.get_compacted(
    conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c",
    version=None,            # None = latest version
    include_extractions=True,
    include_narrative=True,
    format="structured"
)

if compacted:
    print(f"Compacted context: {compacted.compacted_context[:200]}...")
    print(f"Compression ratio: {compacted.compression_ratio:.0%}")
else:
    print("No compaction exists for this conversation yet.")

Parameter Reference

conversation_id
str
required
The conversation to retrieve compacted context for.
version
int
Specific compaction version to retrieve. Each time a conversation is compacted, the version increments. Omit or pass None to get the latest version.
include_extractions
bool
default:"True"
Whether to include structured extractions (facts, decisions, preferences) in the response.
include_narrative
bool
default:"True"
Whether to include the narrative summary in the response.
format
str
default:"structured"
Output format. "structured" returns a full CompactionResponse object. "injection-ready" returns a pre-formatted string optimized for direct injection into an LLM system prompt.

Checking Compaction Status

Use sdk.conversation.context.get_compaction_status() to check the current compaction state of a conversation without retrieving the full compacted content.
status = await sdk.conversation.context.get_compaction_status(
    conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c"
)

print(f"Has compaction: {status['has_compaction']}")
print(f"Is stale: {status['is_stale']}")
print(f"Current version: {status['version']}")
print(f"Compression ratio: {status['compression_ratio']}")
print(f"Validation score: {status['validation_score']}")
print(f"Last compacted at: {status['last_compacted_at']}")
print(f"Memories since compaction: {status['memories_since_compaction']}")
The status response includes:
  • has_compaction (bool): whether any compaction exists for this conversation
  • is_stale (bool): whether new memories have been ingested since the last compaction
  • version (int): current compaction version number
  • compression_ratio (float): compacted-to-original token ratio of the current compaction
  • validation_score (float): quality score of the current compaction
  • last_compacted_at (datetime): when the last compaction was performed
  • memories_since_compaction (int): number of new memories ingested since the last compaction
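The status fields support a skip-if-fresh pattern. A small sketch over the status dict (the helper and the min_new_memories threshold are illustrative, not part of the SDK):

```python
def should_compact(status: dict, min_new_memories: int = 1) -> bool:
    """Return True when a (re-)compaction is worthwhile.

    Compact when nothing exists yet, or when the existing compaction is
    stale and enough new memories have accumulated to justify the work.
    """
    if not status.get("has_compaction"):
        return True
    return bool(status.get("is_stale")) and \
        status.get("memories_since_compaction", 0) >= min_new_memories
```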

Full Examples

Simple: Get Context for Prompt

The recommended approach for most integrations. One call per LLM turn.
async def chat_with_memory(conversation_id: str, user_message: str):
    """Chat using compacted memory context."""

    # Get compacted context (cached locally, safe to call every turn)
    compacted = await sdk.conversation.context.get_context_for_prompt(
        conversation_id=conversation_id,
        style="structured"
    )

    # Also fetch query-specific context for this message
    recent = await sdk.conversation.context.fetch(
        conversation_id=conversation_id,
        search_query=[user_message],
        max_results=5,
        mode="fast"
    )

    recent_facts = "\n".join(
        f"- {fact.content}" for fact in recent.facts if fact.confidence >= 0.8
    )

    # Build system prompt
    history_section = ""
    if compacted.available:
        history_section = f"""
## Conversation History (Compacted)
{compacted.formatted_context}
"""
        if compacted.is_stale:
            # Refresh so the next turn gets fresh context (awaited inline
            # here; wrap in asyncio.create_task() to run it in the background)
            await sdk.conversation.context.compact(
                conversation_id=conversation_id,
                strategy="adaptive"
            )

    system_prompt = f"""You are a helpful assistant with memory of past conversations.
{history_section}
## Recently Relevant Facts
{recent_facts if recent_facts else "None specifically relevant to this query."}

Use this context to personalize your responses."""

    # Pass to your LLM
    # response = await llm.generate(system_prompt, user_message)
    return system_prompt

Advanced: Manual Compaction Control

Use this approach when you need explicit control over compaction strategy and timing.
async def get_optimized_context(conversation_id: str, token_budget: int = 2000) -> str:
    """Get compacted context for a conversation, re-compacting if stale."""

    # Check current compaction status
    status = await sdk.conversation.context.get_compaction_status(
        conversation_id=conversation_id
    )

    if status["has_compaction"] and not status["is_stale"]:
        # Existing compaction is fresh, retrieve it
        compacted = await sdk.conversation.context.get_compacted(
            conversation_id=conversation_id,
            format="injection-ready"  # returns a prompt-ready string
        )
        if compacted:
            return compacted

    # Either no compaction exists or it is stale -- re-compact
    compaction = await sdk.conversation.context.compact(
        conversation_id=conversation_id,
        strategy="adaptive",
        target_tokens=token_budget
    )

    if compaction.quality_warning:
        print(f"Compaction quality warning: {compaction.quality_warning}")

    print(f"Compacted {compaction.original_token_count} tokens → "
          f"{compaction.compacted_token_count} tokens "
          f"({compaction.compression_ratio:.0%} ratio, "
          f"validation: {compaction.validation_score:.2f})")

    return compaction.compacted_context


async def chat_with_compacted_memory(conversation_id: str, user_message: str):
    """Chat using compacted memory context."""

    # Get optimized context within token budget
    memory_context = await get_optimized_context(
        conversation_id=conversation_id,
        token_budget=2000
    )

    # Also fetch recent high-relevance context for this specific query
    recent = await sdk.conversation.context.fetch(
        conversation_id=conversation_id,
        search_query=[user_message],
        max_results=5,
        mode="fast"
    )

    # Build system prompt with both compacted history and recent context
    recent_facts = "\n".join(
        f"- {fact.content}" for fact in recent.facts if fact.confidence >= 0.8
    )

    system_prompt = f"""You are a helpful assistant with memory of past conversations.

## Conversation History (Compacted)
{memory_context}

## Recently Relevant Facts
{recent_facts if recent_facts else "None specifically relevant to this query."}

Use this context to personalize your responses."""

    # Pass to your LLM
    # response = await llm.generate(system_prompt, user_message)
    return system_prompt

Best Practices

For most integrations, get_context_for_prompt() is the right choice. It returns prompt-ready context in a single call, handles local caching automatically, and includes staleness and quality metadata. Only reach for the manual methods (compact(), get_compacted(), get_compaction_status()) when you need explicit control over strategy or token budgets.
Use compacted context for broad historical context and conversation.context.fetch() for query-specific recent context. This gives your LLM both a comprehensive history and targeted relevant details. Both examples above demonstrate this pattern.
The adaptive strategy automatically selects the right compression level based on content density. It is the safest default for most applications and handles a wide range of conversation lengths and content types.
If your LLM has a 128k token context window and your system prompt uses ~2k tokens plus the user message, you might allocate 4-8k tokens for compacted memory context. Use target_tokens to enforce this budget.
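That budgeting arithmetic can be made explicit. A sketch, assuming a fixed-fraction heuristic (the function and the 5% default are illustrative, not an SDK API):

```python
def memory_token_budget(context_window: int,
                        system_overhead: int,
                        reply_reserve: int,
                        fraction: float = 0.05) -> int:
    """Pick a target_tokens value for compacted memory context.

    Takes a fixed fraction of the context window, capped by whatever
    remains after the system prompt and the space reserved for the
    model's reply.
    """
    available = context_window - system_overhead - reply_reserve
    return max(0, min(int(context_window * fraction), available))
```

With a 128k window, ~2k of system prompt, and 4k reserved for the reply, the 5% default lands at 6,400 tokens, inside the 4-8k range suggested above.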
Use get_compaction_status() to avoid unnecessary re-compaction. Only compact when is_stale is True, meaning new memories have been added since the last compaction. This saves processing time and API calls. If you are using get_context_for_prompt(), the is_stale field on the response serves the same purpose.
Track validation_score over time. Consistently low scores (below 0.7) may indicate that your conversations contain highly diverse topics that do not compress well. Consider switching to conservative strategy or increasing target_tokens.
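One way to track validation_score over time is a small rolling window. An illustrative sketch (the class and its defaults mirror the 0.7 guidance above but are not an SDK feature):

```python
from collections import deque

class ValidationTracker:
    """Track recent validation scores and flag sustained quality drops."""

    def __init__(self, window: int = 10, threshold: float = 0.7):
        self.scores: deque = deque(maxlen=window)
        self.threshold = threshold

    def record(self, score: float) -> None:
        self.scores.append(score)

    @property
    def degraded(self) -> bool:
        """True when the rolling average falls below the threshold."""
        if not self.scores:
            return False
        return sum(self.scores) / len(self.scores) < self.threshold
```

Record each compaction.validation_score and, when degraded flips to True, switch to the conservative strategy or raise target_tokens.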
When quality_warning is present, log it and consider falling back to a less aggressive strategy. You can implement an automatic fallback pattern:
compaction = await sdk.conversation.context.compact(
    conversation_id=conv_id,
    strategy="balanced"
)

if compaction.quality_warning:
    # Retry with less compression
    compaction = await sdk.conversation.context.compact(
        conversation_id=conv_id,
        strategy="conservative",
        force=True
    )

Next Steps

Retrieving Memories

Retrieve context to combine with compacted history.

Ingesting Memories

Ingest new data that triggers compaction staleness.

Context Compaction Concepts

Deep dive into compaction algorithms and architecture.

SDK Configuration

Configure timeouts and retries for compaction operations.