Overview
As conversations grow, the amount of context you need to inject into your LLM’s prompt grows with them. Context compaction solves this by intelligently compressing conversation history while preserving the most important information: facts, decisions, preferences, and key narrative elements. Compaction is particularly valuable when:

- Conversations exceed your LLM’s context window
- You want to reduce token costs without losing critical context
- You need to summarize long conversation histories into a concise prompt segment
| Approach | Method | Best for |
|---|---|---|
| Get context for prompt | get_context_for_prompt() | Most integrations — one call, prompt-ready context |
| Manual compaction control | compact() + get_compacted() + get_compaction_status() | Fine-grained control over strategy, token budgets, and timing |
Prerequisites: Recording Messages
Before calling any compaction method, the conversation must have messages recorded via sdk.conversation.record_message(). Calling compact() or get_context_for_prompt() on a conversation with no recorded messages returns HTTP 403.
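A minimal sketch of this prerequisite step. The keyword arguments to record_message are assumptions based on this guide; check your SDK version for the exact signature.

```python
import uuid

def record_turn(sdk, conversation_id: str, role: str, content: str):
    # Record one conversation turn so later compaction calls have
    # messages to work with (calling compact() first returns HTTP 403).
    # The role/content keyword names are assumed, not confirmed.
    return sdk.conversation.record_message(
        conversation_id=conversation_id,
        role=role,
        content=content,
    )

# conversation_id must be a valid UUID string, not an arbitrary slug.
conversation_id = str(uuid.uuid4())
```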
conversation_id must be a valid UUID string. Non-UUID strings (e.g. "conv_abc123") cause a ServiceUnavailableError on the backend. Use str(uuid.uuid4()) to generate one, or pass a UUID you already manage in your system.

Quick Start: Get Context for Your Prompt
For most use cases, get_context_for_prompt() is all you need. It returns the best available compacted context in a single call, pre-formatted for injection into your LLM’s system prompt.
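A sketch of the one-call flow. The method path and the attribute holding the compacted text are assumptions based on this guide:

```python
def build_system_prompt(sdk, conversation_id: str, base_prompt: str) -> str:
    # One call per LLM turn: returns prompt-ready compacted context.
    ctx = sdk.conversation.context.get_context_for_prompt(
        conversation_id=conversation_id,
        style="structured",  # "narrative" / "bullet_points" also accepted
    )
    if not ctx.available:
        # No compaction exists yet; use the base prompt alone.
        return base_prompt
    # The name of the attribute carrying the compacted text is assumed.
    return f"{base_prompt}\n\nConversation memory:\n{ctx.context}"
```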
ContextForPromptResponse
- Compacted context string: pre-formatted compacted context, ready for direct injection into an LLM system prompt. None when no compacted context is available.
- available: whether compacted context exists for this conversation. When False, use context.fetch() for retrieval-based context or trigger compact() to create a compaction.
- is_stale: True when the compacted context is older than 15 minutes. Stale context is still usable but may not reflect the latest conversation turns. Consider triggering a fresh compact() if staleness matters for your use case.
- compression_ratio: ratio of compacted to original tokens (e.g., 0.35 means 35% of original size).
- validation_score: quality score between 0.0 and 1.0 indicating how well the compacted context preserves the original information.
- Age in seconds: seconds since the compaction was created. Use this to implement custom staleness thresholds.
- Low-quality flag: True when the validation score is below expected thresholds. When set, the compacted context may have lost important information.

Formatting Styles
The style parameter is accepted by the SDK ("structured", "narrative", "bullet_points") and will control output formatting in a future backend release. At this stage, all three values return the same raw message content; there is no formatting difference between them yet.
Style-based formatting differentiation is not yet active on the backend. You can safely pass any supported style value — it will take effect automatically once the backend support ships, with no code changes required on your side.
Handling Missing Context
When available is False, no compacted context exists yet. This happens for new conversations or conversations that haven’t been compacted. You have two options: trigger compact() to create a compaction, or fall back to context.fetch() for retrieval-based context.
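A sketch of both options. The fetch() parameters shown here are hypothetical:

```python
def context_or_fallback(sdk, conversation_id: str, query: str) -> str:
    # Try compacted context first; fall back when none exists yet.
    ctx = sdk.conversation.context.get_context_for_prompt(
        conversation_id=conversation_id)
    if ctx.available:
        return ctx.context  # attribute name assumed
    try:
        # Option 1: create a compaction on the spot.
        return sdk.conversation.context.compact(
            conversation_id=conversation_id).context
    except Exception:
        # Option 2: retrieval-based context for this query
        # (fetch() keyword names are assumed).
        return sdk.conversation.context.fetch(
            conversation_id=conversation_id, query=query)
```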
Manual Compaction Control
Use these methods when you need fine-grained control over compaction — choosing a strategy, setting token budgets, polling for completion, or retrieving specific versions.

Triggering Compaction
Use sdk.conversation.context.compact() to explicitly trigger compaction with specific parameters.
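For example, a sketch with illustrative values:

```python
def trigger_compaction(sdk, conversation_id: str):
    # Explicitly compact with a chosen strategy and a token budget.
    return sdk.conversation.context.compact(
        conversation_id=conversation_id,
        strategy="balanced",   # "adaptive" is used if omitted
        target_tokens=6000,    # aim for ~6k tokens of output
        force=False,           # respect staleness checks (default)
    )
```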
Parameter Reference
- conversation_id: the identifier of the conversation to compact.
- strategy: the compaction strategy to apply. Controls how aggressively the context is compressed. See Strategies below. If omitted, "adaptive" is used.
- target_tokens: the desired output size in tokens. When specified, the compactor adjusts its aggressiveness to hit this target as closely as possible. Takes priority over the strategy’s default compression level.
- force: skip staleness checks. By default, Synap only re-compacts a conversation if new memories have been ingested since the last compaction. Set force=True to compact regardless.

Compaction Strategies
Synap offers four compaction strategies that control the tradeoff between compression ratio and information preservation.

- adaptive
- aggressive
- balanced
- conservative
adaptive (recommended) dynamically adjusts compression based on the content. Dense, fact-heavy conversations are compressed less aggressively; repetitive or low-information conversations are compressed more aggressively.
- Typical compression: 30-60% of original tokens
- Preserves: all high-confidence facts, decisions, preferences, key narrative flow
- Drops: repetitive information, low-value conversational filler, redundant context
Strategy Selection Guide
| Scenario | Recommended Strategy |
|---|---|
| General-purpose, unsure what to use | adaptive |
| Very long conversations (100+ turns) | aggressive with target_tokens |
| Important conversations, high-value context | conservative |
| Moderate conversations, cost optimization | balanced |
| Dynamic workload with varying conversation lengths | adaptive |
CompactionResponse
The compact() method returns a CompactionResponse with comprehensive details about the compaction result.
- Compacted context: the compressed context as a single string, ready for injection into an LLM prompt.
- Original token count: the token count of the original, uncompacted context.
- Compacted token count: the token count of the compacted result.
- compression_ratio: the ratio of compacted to original tokens (e.g., 0.35 means 35% of original size).
- Applied strategy: the strategy that was actually applied. May differ from the requested strategy if adaptive selected a different approach.
- validation_score: a score between 0.0 and 1.0 indicating how well the compacted context preserves the original information. Scores above 0.8 are considered high-quality.
- Validation passed: whether the compaction passed Synap’s quality validation threshold.
- Facts: key facts preserved in the compacted context.
- Decisions: decisions and action items preserved in the compacted context.
- Preferences: user preferences preserved in the compacted context.
- Narrative summary: a narrative summary of the conversation, if generated by the compaction strategy.
- quality_warning: a warning message if the compaction quality is below expected thresholds. Present only when there are concerns about information loss.
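A sketch of inspecting the result for logging. Only compression_ratio, validation_score, and quality_warning are attribute names this guide uses elsewhere; treat them as assumptions against your SDK release:

```python
def summarize_result(result) -> str:
    # Build a one-line log summary of a CompactionResponse.
    line = (f"compressed to {result.compression_ratio:.0%} of original, "
            f"validation_score={result.validation_score:.2f}")
    if getattr(result, "quality_warning", None):
        # Present only when there are information-loss concerns.
        line += f" | warning: {result.quality_warning}"
    return line
```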
Retrieving Compacted Context
Use sdk.conversation.context.get_compacted() to retrieve a previously compacted version of a conversation’s context without triggering a new compaction.
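For example, a sketch fetching the latest version as a prompt-ready string; the keyword names mirror this guide's parameter descriptions but are not confirmed against a specific SDK release:

```python
def latest_prompt_ready(sdk, conversation_id: str):
    # Fetch the newest compacted version as an injection-ready string.
    return sdk.conversation.context.get_compacted(
        conversation_id=conversation_id,
        version=None,              # None/omitted = latest version
        format="injection-ready",  # or "structured" for the full object
    )
```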
Parameter Reference
- conversation_id: the conversation to retrieve compacted context for.
- version: specific compaction version to retrieve. Each time a conversation is compacted, the version increments. Omit or pass None to get the latest version.
- Include extractions: whether to include structured extractions (facts, decisions, preferences) in the response.
- Include summary: whether to include the narrative summary in the response.
- format: output format. "structured" returns a full CompactionResponse object. "injection-ready" returns a pre-formatted string optimized for direct injection into an LLM system prompt.

Checking Compaction Status
Use sdk.conversation.context.get_compaction_status() to check the current compaction state of a conversation without retrieving the full compacted content.
| Field | Type | Description |
|---|---|---|
| has_compaction | bool | Whether any compaction exists for this conversation |
| is_stale | bool | Whether new memories have been ingested since the last compaction |
| version | int | Current compaction version number |
| compression_ratio | float | Ratio of the current compaction |
| validation_score | float | Quality score of the current compaction |
| last_compacted_at | datetime | When the last compaction was performed |
| memories_since_compaction | int | Number of new memories ingested since last compaction |
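The fields above support a cheap freshness check before re-compacting. A sketch, with the method path assumed as in the rest of this guide:

```python
def refresh_if_stale(sdk, conversation_id: str):
    # Re-compact only when new memories arrived since the last compaction.
    status = sdk.conversation.context.get_compaction_status(
        conversation_id=conversation_id)
    if status.has_compaction and not status.is_stale:
        return None  # current compaction is still fresh
    return sdk.conversation.context.compact(conversation_id=conversation_id)
```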
Full Examples
Simple: Get Context for Prompt
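A sketch of this flow. Method paths and response attribute names are assumptions based on this guide, and the fetch() fallback parameters are hypothetical:

```python
def answer_turn(sdk, llm, conversation_id: str, user_message: str) -> str:
    # 1. Get prompt-ready compacted history, if any exists.
    ctx = sdk.conversation.context.get_context_for_prompt(
        conversation_id=conversation_id)
    if ctx.available:
        memory = ctx.context  # attribute name assumed
    else:
        # Fall back to retrieval-based context for this query.
        memory = sdk.conversation.context.fetch(
            conversation_id=conversation_id, query=user_message)
    # 2. Call your LLM with the memory injected into the system prompt.
    reply = llm(system=f"You are helpful.\n\nMemory:\n{memory}",
                user=user_message)
    # 3. Record both turns so future compactions see them.
    sdk.conversation.record_message(conversation_id=conversation_id,
                                    role="user", content=user_message)
    sdk.conversation.record_message(conversation_id=conversation_id,
                                    role="assistant", content=reply)
    return reply
```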
The recommended approach for most integrations. One call per LLM turn.

Advanced: Manual Compaction Control
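A sketch of the manual flow, combining a status check, explicit compaction, and live retrieval. All signatures and attribute names here are assumptions based on this guide:

```python
def build_context(sdk, conversation_id: str, query: str) -> str:
    ctx_api = sdk.conversation.context
    # Re-compact only when needed, with an explicit token budget.
    status = ctx_api.get_compaction_status(conversation_id=conversation_id)
    if not status.has_compaction or status.is_stale:
        result = ctx_api.compact(conversation_id=conversation_id,
                                 strategy="adaptive", target_tokens=6000)
        history = result.context  # attribute name assumed
    else:
        history = ctx_api.get_compacted(conversation_id=conversation_id,
                                        format="injection-ready")
    # Combine broad history with query-specific recent context.
    recent = ctx_api.fetch(conversation_id=conversation_id, query=query)
    return f"History:\n{history}\n\nRelevant details:\n{recent}"
```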
Use this approach when you need explicit control over compaction strategy and timing.

Best Practices
Start with get_context_for_prompt()
For most integrations, get_context_for_prompt() is the right choice. It returns prompt-ready context in a single call, handles local caching automatically, and includes staleness and quality metadata. Only reach for the manual methods (compact(), get_compacted(), get_compaction_status()) when you need explicit control over strategy or token budgets.
Combine compacted context with live retrieval
Use compacted context for broad historical context and conversation.context.fetch() for query-specific recent context. This gives your LLM both a comprehensive history and targeted relevant details. Both examples above demonstrate this pattern.
Use adaptive strategy as default
The adaptive strategy automatically selects the right compression level based on content density. It is the safest default for most applications and handles a wide range of conversation lengths and content types.
Set target_tokens based on your LLM's context window
If your LLM has a 128k-token context window and your system prompt uses ~2k tokens plus the user message, you might allocate 4-8k tokens for compacted memory context. Use target_tokens to enforce this budget.
Check is_stale before re-compacting
Use get_compaction_status() to avoid unnecessary re-compaction. Only compact when is_stale is True, meaning new memories have been added since the last compaction. This saves processing time and API calls. If you are using get_context_for_prompt(), the is_stale field on the response serves the same purpose.
Monitor validation_score
Track validation_score over time. Consistently low scores (below 0.7) may indicate that your conversations contain highly diverse topics that do not compress well. Consider switching to the conservative strategy or increasing target_tokens.
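A sketch of acting on low scores by retrying with a gentler strategy. The 0.7 threshold is illustrative and the compact() signature is assumed:

```python
def compact_with_fallback(sdk, conversation_id: str, min_score: float = 0.7):
    ctx_api = sdk.conversation.context
    result = ctx_api.compact(conversation_id=conversation_id,
                             strategy="adaptive")
    if result.validation_score < min_score or getattr(result, "quality_warning", None):
        # Quality concerns: retry with the least aggressive strategy.
        result = ctx_api.compact(conversation_id=conversation_id,
                                 strategy="conservative", force=True)
    return result
```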
Handle quality_warning gracefully
When quality_warning is present, log it and consider falling back to a less aggressive strategy, for example by re-running compact() with conservative whenever the warning appears.

Next Steps
Retrieving Memories
Retrieve context to combine with compacted history.
Ingesting Memories
Ingest new data that triggers compaction staleness.
Context Compaction Concepts
Deep dive into compaction algorithms and architecture.
SDK Configuration
Configure timeouts and retries for compaction operations.