
Overview

As conversations grow, the amount of context you need to inject into your LLM’s prompt grows with them. Context compaction solves this by intelligently compressing conversation history while preserving the most important information — facts, decisions, preferences, and key narrative elements. Compaction is particularly valuable when:
  • Conversations exceed your LLM’s context window
  • You want to reduce token costs without losing critical context
  • You need to summarize long conversation histories into a concise prompt segment
Synap provides two ways to work with compacted context:
  • Get context for prompt: get_context_for_prompt(). Best for most integrations (one call, prompt-ready context).
  • Manual compaction control: compact() + get_compacted() + get_compaction_status(). Best for fine-grained control over strategy, token budgets, and timing.

Prerequisites: Recording Messages

Before calling any compaction method, the conversation must have messages recorded via sdk.conversation.record_message(). Calling compact() or get_context_for_prompt() on a conversation with no recorded messages returns HTTP 403.
import uuid

conversation_id = str(uuid.uuid4())

# Record messages before any compaction call
await sdk.conversation.record_message(
    conversation_id=conversation_id,
    role="user",
    content="I'd like to plan a trip to Japan in April."
)
await sdk.conversation.record_message(
    conversation_id=conversation_id,
    role="assistant",
    content="Great choice! April is cherry blossom season. Would you prefer Tokyo or Kyoto?"
)
conversation_id must be a valid UUID string. Non-UUID strings (e.g. "conv_abc123") will cause a ServiceUnavailableError on the backend. Use str(uuid.uuid4()) to generate one, or pass a UUID you already manage in your system.

Quick Start: Get Context for Your Prompt

For most use cases, get_context_for_prompt() is all you need. It returns the best available compacted context in a single call, pre-formatted for injection into your LLM’s system prompt.
result = await sdk.conversation.context.get_context_for_prompt(
    conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c"
)

if result.available:
    # Inject into your LLM system prompt
    system_prompt = f"""You are a helpful assistant.

## Conversation History
{result.formatted_context}"""
else:
    # No compacted context yet -- use context.fetch() or trigger compact()
    system_prompt = "You are a helpful assistant."
The SDK caches the result locally (5-minute TTL), so calling this method on every LLM turn is safe and fast.

ContextForPromptResponse

formatted_context
str
Pre-formatted compacted context, ready for direct injection into an LLM system prompt. None when no compacted context is available.
available
bool
required
Whether compacted context exists for this conversation. When False, use context.fetch() for retrieval-based context or trigger compact() to create a compaction.
is_stale
bool
required
True when the compacted context is older than 15 minutes. Stale context is still usable but may not reflect the latest conversation turns. Consider triggering a fresh compact() if staleness matters for your use case.
compression_ratio
float
Ratio of compacted to original tokens (e.g., 0.35 means 35% of original size).
validation_score
float
Quality score between 0.0 and 1.0 indicating how well the compacted context preserves the original information.
compaction_age_seconds
int
Seconds since the compaction was created. Use this to implement custom staleness thresholds.
quality_warning
bool
required
True when the validation score is below expected thresholds. When set, the compacted context may have lost important information.

Formatting Styles

The style parameter is accepted by the SDK ("structured", "narrative", "bullet_points") and will control output formatting in a future backend release. At this stage, all three values return the same raw message content — there is no formatting difference between them yet.
result = await sdk.conversation.context.get_context_for_prompt(
    conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c",
    style="structured"  # "narrative" and "bullet_points" return identical output for now
)
Style-based formatting differentiation is not yet active on the backend. You can safely pass any supported style value — it will take effect automatically once the backend support ships, with no code changes required on your side.

Handling Missing Context

When available is False, no compacted context exists yet. This happens for new conversations or conversations that haven’t been compacted. You have two options:
result = await sdk.conversation.context.get_context_for_prompt(
    conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c"
)

if not result.available:
    # Option 1: Fall back to retrieval-based context
    context = await sdk.conversation.context.fetch(
        conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c",
        search_query=[user_message],
        mode="fast"
    )

    # Option 2: Trigger compaction, then retry
    await sdk.conversation.context.compact(
        conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c",
        strategy="adaptive"
    )

Manual Compaction Control

Use these methods when you need fine-grained control over compaction — choosing a strategy, setting token budgets, polling for completion, or retrieving specific versions.

Triggering Compaction

Use sdk.conversation.context.compact() to explicitly trigger compaction with specific parameters.
compaction = await sdk.conversation.context.compact(
    conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c",
    strategy="adaptive",
    target_tokens=2000,
    force=False
)

print(f"Compaction ID: {compaction.compaction_id}")
print(f"Status: {compaction.status}")

Parameter Reference

conversation_id
str
required
The identifier of the conversation to compact.
strategy
str
The compaction strategy to apply. Controls how aggressively the context is compressed. See Strategies below. If omitted, "adaptive" is used.
target_tokens
int
The desired output size in tokens. When specified, the compactor adjusts its aggressiveness to hit this target as closely as possible. Takes priority over the strategy’s default compression level.
force
bool
default:"False"
Skip staleness checks. By default, Synap only re-compacts a conversation if new memories have been ingested since the last compaction. Set force=True to compact regardless.

Compaction Strategies

Synap offers four compaction strategies (adaptive, balanced, conservative, and aggressive) that control the tradeoff between compression ratio and information preservation.
adaptive (recommended). Dynamically adjusts compression based on the content: dense, fact-heavy conversations are compressed less aggressively, while repetitive or low-information conversations are compressed more aggressively.
  • Typical compression: 30-60% of original tokens
  • Preserves: all high-confidence facts, decisions, preferences, key narrative flow
  • Drops: repetitive information, low-value conversational filler, redundant context
compaction = await sdk.conversation.context.compact(
    conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c",
    strategy="adaptive"
)

Strategy Selection Guide

  • General-purpose, unsure what to use: adaptive
  • Very long conversations (100+ turns): aggressive with target_tokens
  • Important conversations, high-value context: conservative
  • Moderate conversations, cost optimization: balanced
  • Dynamic workloads with varying conversation lengths: adaptive
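The selection guide can be encoded directly in application code. A hypothetical selector (the function and its turn-count cutoffs are illustrative assumptions, not SDK behavior):

```python
def pick_strategy(turn_count: int, high_value: bool = False) -> str:
    """Map the strategy selection guide onto a strategy string for compact()."""
    if high_value:
        return "conservative"   # important, high-value context
    if turn_count >= 100:
        return "aggressive"     # very long conversations; pair with target_tokens
    if turn_count >= 30:
        return "balanced"       # moderate length, cost optimization
    return "adaptive"           # general-purpose default
```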

CompactionResponse

The compact() method returns a CompactionResponse with comprehensive details about the compaction result.
compacted_context
str
required
The compressed context as a single string, ready for injection into an LLM prompt.
original_token_count
int
required
The token count of the original, uncompacted context.
compacted_token_count
int
required
The token count of the compacted result.
compression_ratio
float
required
The ratio of compacted to original tokens (e.g., 0.35 means 35% of original size).
strategy_used
str
required
The strategy that was actually applied. May differ from the requested strategy if adaptive selected a different approach.
validation_score
float
required
A score between 0.0 and 1.0 indicating how well the compacted context preserves the original information. Scores above 0.8 are considered high-quality.
validation_passed
bool
required
Whether the compaction passed Synap’s quality validation threshold.
facts
List[str]
Key facts preserved in the compacted context.
decisions
List[str]
Decisions and action items preserved in the compacted context.
preferences
List[str]
User preferences preserved in the compacted context.
summary
str
A narrative summary of the conversation, if generated by the compaction strategy.
quality_warning
str
A warning message if the compaction quality is below expected thresholds. Present only when there are concerns about information loss.
Pay attention to the quality_warning field. When present, it indicates that the compaction may have lost important information. Consider using a less aggressive strategy or increasing target_tokens if quality warnings appear consistently.

Retrieving Compacted Context

Use sdk.conversation.context.get_compacted() to retrieve a previously compacted version of a conversation’s context without triggering a new compaction.
compacted = await sdk.conversation.context.get_compacted(
    conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c",
    version=None,            # None = latest version
    include_extractions=True,
    include_narrative=True,
    format="structured"
)

if compacted:
    print(f"Compacted context: {compacted.compacted_context[:200]}...")
    print(f"Compression ratio: {compacted.compression_ratio:.0%}")
else:
    print("No compaction exists for this conversation yet.")

Parameter Reference

conversation_id
str
required
The conversation to retrieve compacted context for.
version
int
Specific compaction version to retrieve. Each time a conversation is compacted, the version increments. Omit or pass None to get the latest version.
include_extractions
bool
default:"True"
Whether to include structured extractions (facts, decisions, preferences) in the response.
include_narrative
bool
default:"True"
Whether to include the narrative summary in the response.
format
str
default:"structured"
Output format. "structured" returns a full CompactionResponse object. "injection-ready" returns a pre-formatted string optimized for direct injection into an LLM system prompt.

Checking Compaction Status

Use sdk.conversation.context.get_compaction_status() to check the current compaction state of a conversation without retrieving the full compacted content.
status = await sdk.conversation.context.get_compaction_status(
    conversation_id="3f6b1a2c-4d5e-6f7a-8b9c-0d1e2f3a4b5c"
)

print(f"Has compaction: {status['has_compaction']}")
print(f"Is stale: {status['is_stale']}")
print(f"Current version: {status['version']}")
print(f"Compression ratio: {status['compression_ratio']}")
print(f"Validation score: {status['validation_score']}")
print(f"Last compacted at: {status['last_compacted_at']}")
print(f"Memories since compaction: {status['memories_since_compaction']}")
The status response includes:
  • has_compaction (bool): whether any compaction exists for this conversation
  • is_stale (bool): whether new memories have been ingested since the last compaction
  • version (int): current compaction version number
  • compression_ratio (float): compacted-to-original token ratio of the current compaction
  • validation_score (float): quality score of the current compaction
  • last_compacted_at (datetime): when the last compaction was performed
  • memories_since_compaction (int): number of new memories ingested since the last compaction
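The status fields support a skip-if-fresh pattern. A small sketch over the status dict (the helper and the min_new_memories threshold are illustrative, not part of the SDK):

```python
def should_compact(status: dict, min_new_memories: int = 1) -> bool:
    """Return True when a (re-)compaction is worthwhile.

    Compact when nothing exists yet, or when the existing compaction is
    stale and enough new memories have accumulated to justify the work.
    """
    if not status.get("has_compaction"):
        return True
    return bool(status.get("is_stale")) and \
        status.get("memories_since_compaction", 0) >= min_new_memories
```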

Full Examples

Simple: Get Context for Prompt

The recommended approach for most integrations. One call per LLM turn.
async def chat_with_memory(conversation_id: str, user_message: str):
    """Chat using compacted memory context."""

    # Get compacted context (cached locally, safe to call every turn)
    compacted = await sdk.conversation.context.get_context_for_prompt(
        conversation_id=conversation_id,
        style="structured"
    )

    # Also fetch query-specific context for this message
    recent = await sdk.conversation.context.fetch(
        conversation_id=conversation_id,
        search_query=[user_message],
        max_results=5,
        mode="fast"
    )

    recent_facts = "\n".join(
        f"- {fact.content}" for fact in recent.facts if fact.confidence >= 0.8
    )

    # Build system prompt
    history_section = ""
    if compacted.available:
        history_section = f"""
## Conversation History (Compacted)
{compacted.formatted_context}
"""
        if compacted.is_stale:
            # Refresh so the next turn gets fresh context (awaited inline
            # here; wrap in asyncio.create_task() to run it in the background)
            await sdk.conversation.context.compact(
                conversation_id=conversation_id,
                strategy="adaptive"
            )

    system_prompt = f"""You are a helpful assistant with memory of past conversations.
{history_section}
## Recently Relevant Facts
{recent_facts if recent_facts else "None specifically relevant to this query."}

Use this context to personalize your responses."""

    # Pass to your LLM
    # response = await llm.generate(system_prompt, user_message)
    return system_prompt

Advanced: Manual Compaction Control

Use this approach when you need explicit control over compaction strategy and timing.
async def get_optimized_context(conversation_id: str, token_budget: int = 2000) -> str:
    """Get compacted context for a conversation, re-compacting if stale."""

    # Check current compaction status
    status = await sdk.conversation.context.get_compaction_status(
        conversation_id=conversation_id
    )

    if status["has_compaction"] and not status["is_stale"]:
        # Existing compaction is fresh, retrieve it
        compacted = await sdk.conversation.context.get_compacted(
            conversation_id=conversation_id,
            format="injection-ready"  # returns a prompt-ready string
        )
        if compacted:
            return compacted

    # Either no compaction exists or it is stale -- re-compact
    compaction = await sdk.conversation.context.compact(
        conversation_id=conversation_id,
        strategy="adaptive",
        target_tokens=token_budget
    )

    if compaction.quality_warning:
        print(f"Compaction quality warning: {compaction.quality_warning}")

    print(f"Compacted {compaction.original_token_count} tokens → "
          f"{compaction.compacted_token_count} tokens "
          f"({compaction.compression_ratio:.0%} ratio, "
          f"validation: {compaction.validation_score:.2f})")

    return compaction.compacted_context


async def chat_with_compacted_memory(conversation_id: str, user_message: str):
    """Chat using compacted memory context."""

    # Get optimized context within token budget
    memory_context = await get_optimized_context(
        conversation_id=conversation_id,
        token_budget=2000
    )

    # Also fetch recent high-relevance context for this specific query
    recent = await sdk.conversation.context.fetch(
        conversation_id=conversation_id,
        search_query=[user_message],
        max_results=5,
        mode="fast"
    )

    # Build system prompt with both compacted history and recent context
    recent_facts = "\n".join(
        f"- {fact.content}" for fact in recent.facts if fact.confidence >= 0.8
    )

    system_prompt = f"""You are a helpful assistant with memory of past conversations.

## Conversation History (Compacted)
{memory_context}

## Recently Relevant Facts
{recent_facts if recent_facts else "None specifically relevant to this query."}

Use this context to personalize your responses."""

    # Pass to your LLM
    # response = await llm.generate(system_prompt, user_message)
    return system_prompt

Best Practices

For most integrations, get_context_for_prompt() is the right choice. It returns prompt-ready context in a single call, handles local caching automatically, and includes staleness and quality metadata. Only reach for the manual methods (compact(), get_compacted(), get_compaction_status()) when you need explicit control over strategy or token budgets.
Use compacted context for broad historical context and conversation.context.fetch() for query-specific recent context. This gives your LLM both a comprehensive history and targeted relevant details. Both examples above demonstrate this pattern.
The adaptive strategy automatically selects the right compression level based on content density. It is the safest default for most applications and handles a wide range of conversation lengths and content types.
If your LLM has a 128k token context window and your system prompt uses ~2k tokens plus the user message, you might allocate 4-8k tokens for compacted memory context. Use target_tokens to enforce this budget.
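That budgeting arithmetic can be made explicit. A sketch, assuming a fixed-fraction heuristic (the function and the 5% default are illustrative, not an SDK API):

```python
def memory_token_budget(context_window: int,
                        system_overhead: int,
                        reply_reserve: int,
                        fraction: float = 0.05) -> int:
    """Pick a target_tokens value for compacted memory context.

    Takes a fixed fraction of the context window, capped by whatever
    remains after the system prompt and the space reserved for the
    model's reply.
    """
    available = context_window - system_overhead - reply_reserve
    return max(0, min(int(context_window * fraction), available))
```

With a 128k window, ~2k of system prompt, and 4k reserved for the reply, the 5% default lands at 6,400 tokens, inside the 4-8k range suggested above.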
Use get_compaction_status() to avoid unnecessary re-compaction. Only compact when is_stale is True, meaning new memories have been added since the last compaction. This saves processing time and API calls. If you are using get_context_for_prompt(), the is_stale field on the response serves the same purpose.
Track validation_score over time. Consistently low scores (below 0.7) may indicate that your conversations contain highly diverse topics that do not compress well. Consider switching to conservative strategy or increasing target_tokens.
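One way to track validation_score over time is a small rolling window. An illustrative sketch (the class and its defaults mirror the 0.7 guidance above but are not an SDK feature):

```python
from collections import deque

class ValidationTracker:
    """Track recent validation scores and flag sustained quality drops."""

    def __init__(self, window: int = 10, threshold: float = 0.7):
        self.scores: deque = deque(maxlen=window)
        self.threshold = threshold

    def record(self, score: float) -> None:
        self.scores.append(score)

    @property
    def degraded(self) -> bool:
        """True when the rolling average falls below the threshold."""
        if not self.scores:
            return False
        return sum(self.scores) / len(self.scores) < self.threshold
```

Record each compaction.validation_score and, when degraded flips to True, switch to the conservative strategy or raise target_tokens.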
When quality_warning is present, log it and consider falling back to a less aggressive strategy. You can implement an automatic fallback pattern:
compaction = await sdk.conversation.context.compact(
    conversation_id=conv_id,
    strategy="balanced"
)

if compaction.quality_warning:
    # Retry with less compression
    compaction = await sdk.conversation.context.compact(
        conversation_id=conv_id,
        strategy="conservative",
        force=True
    )

Next Steps

Retrieving Memories

Retrieve context to combine with compacted history.

Ingesting Memories

Ingest new data that triggers compaction staleness.

Context Compaction Concepts

Deep dive into compaction algorithms and architecture.

SDK Configuration

Configure timeouts and retries for compaction operations.