Documentation Index
Fetch the complete documentation index at: https://docs.maximem.ai/llms.txt
Use this file to discover all available pages before exploring further.
How short-term context builds
Every conversational turn adds to the short-term context. A “turn” consists of a user message and the corresponding assistant response. As the conversation progresses, the context window grows:The context window problem
Short-term context cannot grow indefinitely. There are three practical constraints:Token limits
Every LLM has a maximum context window. Once the conversation history exceeds this limit, older content must be dropped or compressed. Even with large context windows (100K+ tokens), filling them entirely with conversation history leaves little room for retrieved long-term memories and system instructions.
Cost scaling
LLM costs scale with input token count. A conversation with 50,000 tokens of history costs significantly more per turn than one with 2,000 tokens. For high-volume applications, unbounded context growth can make costs unsustainable.
Quality degradation
Research shows that LLMs pay less attention to content in the middle of long contexts (the “lost in the middle” phenomenon). Extremely long conversation histories can actually degrade response quality as the model struggles to identify the most relevant parts.
When compaction kicks in
Synap monitors the short-term context and triggers compaction when configurable thresholds are exceeded. You can configure these thresholds through the Memory Architecture Configuration:| Threshold | What it measures | Default | Configurable via |
|---|---|---|---|
| Token count | Total tokens in the conversation history | Varies by model | MACA retrieval.budget.max_tokens |
| Turn count | Number of conversational turns | Configurable | MACA retrieval settings |
| Cost threshold | Accumulated cost of context per conversation | Configurable | MACA retrieval settings |
What happens during compaction
Compaction is not simply truncating old messages. Synap performs intelligent compression that preserves essential information while reducing token count.Analyze the conversation
Synap examines the full conversation history to identify key information: facts established, decisions made, preferences expressed, action items created, and the current topic of discussion.
Extract essential information
Critical information is extracted into structured summaries. This includes:
- Active facts: information that is still relevant to the conversation
- Decisions made: choices or conclusions reached during the discussion
- Current state: what the user is currently working on or asking about
- Pending items: unanswered questions or open action items
- User preferences: communication style, format preferences expressed during the conversation
Compress the history
The original verbose conversation history is replaced with a compressed representation. Early turns that have been fully resolved are condensed into summaries. Recent turns (typically the last 3-5) are preserved verbatim to maintain immediate conversational flow.
Persist to long-term memory
Information extracted during compaction that has lasting value — facts, preferences, episodes, temporal events — is also routed through the ingestion pipeline to become long-term memories. This ensures that valuable knowledge from conversations is not lost when the short-term context is compressed.
Compaction in practice
Here is a simplified example of how compaction transforms a conversation:- Before compaction
- After compaction
Relationship to long-term memory
Short-term and long-term memory work together. During a conversation:- Long-term memories are retrieved at the start of each turn to provide background knowledge
- Short-term context provides immediate conversational continuity
- During compaction, extracted knowledge is routed to long-term storage
- After the conversation ends, remaining short-term context can be ingested as a conversation document, producing additional long-term memories
Next steps
Context Compaction
Technical deep dive into the compaction algorithm and configuration options.
Long-term Context
How persistent knowledge is stored and retrieved across sessions.
SDK: Context Compaction
Implement and configure context compaction in your application.
Memories & Context
Return to the overview of memories and context in Synap.