Short-term context is like a person’s working memory during a meeting. They remember everything discussed so far in the meeting, but once the meeting ends, only the important takeaways persist as long-term memories.

How short-term context builds

Every conversational turn adds to the short-term context. A “turn” consists of a user message and the corresponding assistant response. As the conversation progresses, the context window grows:
Turn 1:  User: "What's our current API rate limit?"
         Assistant: "Your current rate limit is 1,000 requests per minute."
         Context size: ~50 tokens

Turn 2:  User: "Can we increase that for our enterprise plan?"
         Assistant: "Yes, enterprise plans support up to 10,000 req/min..."
         Context size: ~120 tokens

Turn 3:  User: "What about burst handling?"
         Assistant: "Burst allowances provide a 2x multiplier for 30 seconds..."
         Context size: ~200 tokens

  ...

Turn 25: Context size: ~4,000 tokens
Turn 50: Context size: ~8,000 tokens
Each turn is appended to the running conversation history. Your agent sees the full history on every subsequent turn, which is what gives it the ability to reference earlier parts of the conversation (“as I mentioned earlier…” or “going back to your question about rate limits…”).
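The accumulation described above can be sketched in a few lines. This is an illustrative model, not Synap's SDK: the helper names and the 4-characters-per-token estimate are assumptions for the sketch.

```python
# Hypothetical sketch: a conversation history that grows by one turn
# (user message + assistant response) at a time, with a rough token
# estimate. Real systems use a proper tokenizer; ~4 chars/token is
# only a ballpark heuristic.
history = []

def add_turn(history, user_msg, assistant_msg):
    """Append one conversational turn to the running history."""
    history.append({"role": "user", "content": user_msg})
    history.append({"role": "assistant", "content": assistant_msg})

def estimate_tokens(history):
    """Crude token estimate: total characters divided by four."""
    chars = sum(len(m["content"]) for m in history)
    return chars // 4

add_turn(history,
         "What's our current API rate limit?",
         "Your current rate limit is 1,000 requests per minute.")
print(len(history), estimate_tokens(history))
```

Because the full `history` list is sent on every turn, the model can resolve references like "as I mentioned earlier" — and this is also why the token count keeps climbing.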

The context window problem

Short-term context cannot grow indefinitely. There are three practical constraints:

Token limits

Every LLM has a maximum context window. Once the conversation history exceeds this limit, older content must be dropped or compressed. Even with large context windows (100K+ tokens), filling them entirely with conversation history leaves little room for retrieved long-term memories and system instructions.

Cost scaling

LLM costs scale with input token count. A conversation with 50,000 tokens of history costs significantly more per turn than one with 2,000 tokens. For high-volume applications, unbounded context growth can make costs unsustainable.
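The linear relationship makes the cost gap easy to quantify. The price below is a made-up example rate, not Synap's or any provider's actual pricing:

```python
# Illustrative arithmetic: input cost scales linearly with history size.
PRICE_PER_1K_INPUT_TOKENS = 0.003  # assumed example rate in USD

def turn_cost(history_tokens):
    """Input cost of one turn that resends the whole history."""
    return history_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

print(turn_cost(2_000))   # short, compacted history
print(turn_cost(50_000))  # unbounded history: 25x the input cost per turn
```

At thousands of turns per day, that 25x multiplier is the difference between a manageable bill and an unsustainable one.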

Quality degradation

Research shows that LLMs pay less attention to content in the middle of long contexts (the “lost in the middle” phenomenon). Extremely long conversation histories can actually degrade response quality as the model struggles to identify the most relevant parts.
This is where context compaction becomes essential.

When compaction kicks in

Synap monitors the short-term context and automatically triggers compaction when the conversation grows too large for efficient processing. You can influence when compaction occurs through the Memory Architecture Configuration by adjusting context budget settings. When the threshold is reached, Synap compresses the conversation so your agent can continue without losing important context.
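Conceptually, the trigger is a budget check. The names `context_budget` and `threshold` below are illustrative stand-ins; consult the Memory Architecture Configuration reference for Synap's actual setting names and defaults.

```python
# Hypothetical sketch of a budget-triggered compaction check.
# Both parameter values are assumptions for illustration only.
def should_compact(history_tokens, context_budget=8_000, threshold=0.8):
    """Trigger compaction once the history fills 80% of the budget."""
    return history_tokens >= context_budget * threshold

print(should_compact(4_000))  # False: comfortably under budget
print(should_compact(7_000))  # True: time to compact
```

Compacting before the hard token limit, rather than at it, leaves headroom for retrieved long-term memories and system instructions on the next turn.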

What happens during compaction

Compaction is not simply truncating old messages. Synap intelligently compresses the conversation while preserving what matters — key facts, decisions, user preferences, and the current topic of discussion. Recent messages are kept intact so the conversational flow is not disrupted. Information with lasting value that is identified during compaction is automatically persisted to long-term memory, so it remains available in future sessions.
Compaction is a lossy process. While Synap preserves the most important information, some conversational nuance from early turns is inevitably lost. For applications where complete conversation fidelity is required, consider storing full transcripts separately and relying on Synap for the semantic layer.
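A minimal sketch of the idea — keep recent messages verbatim, replace older ones with a summary — looks like this. The `summarize` step stands in for Synap's model-driven compression; this is not its actual algorithm.

```python
# Minimal sketch of compaction: recent messages stay intact, older
# messages collapse into a single summary entry. summarize() is a
# placeholder for an LLM-driven step that would extract key facts,
# decisions, and user preferences.
def summarize(messages):
    # Stand-in for model-driven compression.
    return f"{len(messages)} earlier messages condensed."

def compact(history, keep_recent=4):
    """Replace all but the last keep_recent messages with a summary."""
    old, recent = history[:-keep_recent], history[-keep_recent:]
    if not old:
        return history  # nothing to compact yet
    summary = {"role": "system",
               "content": "Summary of earlier turns: " + summarize(old)}
    return [summary] + recent

history = [{"role": "user", "content": f"msg {i}"} for i in range(10)]
compacted = compact(history)
print(len(compacted))  # 5: one summary plus four recent messages
```

Note what the sketch makes concrete: the summary is lossy by construction, which is why applications needing full fidelity should archive transcripts separately.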


Relationship to long-term memory

Short-term and long-term memory work together seamlessly. When your agent retrieves context from Synap, it receives both relevant long-term memories from past sessions and the current conversation history. During compaction, valuable knowledge from the current conversation is automatically saved for future sessions. This means your agent always has the full picture — what it knows from past interactions and what has been discussed in the current session — without any extra effort on your part.
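The combined view the agent receives can be pictured as one assembled context. The function below is a hypothetical illustration of that structure; `retrieve` calls and formatting in the real Synap SDK will differ.

```python
# Hypothetical sketch: merging retrieved long-term memories with the
# current conversation history into a single context block.
def build_context(long_term_memories, history):
    """Prepend past-session memories to the current session's turns."""
    memory_block = "\n".join(f"- {m}" for m in long_term_memories)
    lines = [f"Relevant memories from past sessions:\n{memory_block}"]
    for msg in history:
        lines.append(f"{msg['role']}: {msg['content']}")
    return "\n".join(lines)

ctx = build_context(
    ["User is on the enterprise plan (10,000 req/min limit)."],
    [{"role": "user", "content": "What about burst handling?"}],
)
print(ctx)
```

Here the agent can answer the burst-handling question in light of a fact learned in an earlier session, even though that fact was never mentioned in the current conversation.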

Next steps

Context Compaction

Technical deep dive into the compaction algorithm and configuration options.

Long-term Context

How persistent knowledge is stored and retrieved across sessions.

SDK: Context Compaction

Implement and configure context compaction in your application.

Memories & Context

Return to the overview of memories and context in Synap.