Short-term Context - Maximem Synap

Short-term context is like a person’s working memory during a meeting. They remember everything discussed so far in the meeting, but once the meeting ends, only the important takeaways persist as long-term memories.

How short-term context builds

Every conversational turn adds to the short-term context. A “turn” consists of a user message and the corresponding assistant response. As the conversation progresses, the context window grows:

Turn 1:  User: "What's our current API rate limit?"
         Assistant: "Your current rate limit is 1,000 requests per minute."
         Context size: ~50 tokens

Turn 2:  User: "Can we increase that for our enterprise plan?"
         Assistant: "Yes, enterprise plans support up to 10,000 req/min..."
         Context size: ~120 tokens

Turn 3:  User: "What about burst handling?"
         Assistant: "Burst allowances provide a 2x multiplier for 30 seconds..."
         Context size: ~200 tokens

  ...

Turn 25: Context size: ~4,000 tokens
Turn 50: Context size: ~8,000 tokens

Each turn is appended to the running conversation history. Your agent sees the full history on every subsequent turn, which is what gives it the ability to reference earlier parts of the conversation (“as I mentioned earlier…” or “going back to your question about rate limits…”).
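The append-and-replay behavior described above can be sketched in a few lines. This is an illustrative model, not Synap's implementation: the `Turn` and `ShortTermContext` names are hypothetical, and the token estimate assumes a rough four-characters-per-token heuristic rather than a real tokenizer.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One conversational turn: a user message plus the assistant reply."""
    user: str
    assistant: str


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    # A real system would use the model's own tokenizer.
    return max(1, len(text) // 4)


@dataclass
class ShortTermContext:
    """Hypothetical container for the running conversation history."""
    turns: list[Turn] = field(default_factory=list)

    def append(self, user: str, assistant: str) -> None:
        # Each new turn is added to the end of the history.
        self.turns.append(Turn(user, assistant))

    def token_count(self) -> int:
        # The whole history is counted, because the whole history is
        # what the agent sees on the next turn.
        return sum(
            estimate_tokens(t.user) + estimate_tokens(t.assistant)
            for t in self.turns
        )


ctx = ShortTermContext()
ctx.append("What's our current API rate limit?",
           "Your current rate limit is 1,000 requests per minute.")
size_after_one = ctx.token_count()
ctx.append("Can we increase that for our enterprise plan?",
           "Yes, enterprise plans support up to 10,000 req/min.")
size_after_two = ctx.token_count()
```

The size only ever goes up as turns are appended, which is the growth pattern shown in the turn listing above.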

The context window problem

Short-term context cannot grow indefinitely. There are three practical constraints:

Token limits

Every LLM has a maximum context window. Once the conversation history exceeds this limit, older content must be dropped or compressed. Even with large context windows (100K+ tokens), filling them entirely with conversation history leaves little room for retrieved long-term memories and system instructions.
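One way to reason about this constraint is as a token budget: whatever is reserved for system instructions, retrieved memories, and the model's reply is unavailable to conversation history. The function name and all numbers below are assumptions chosen for illustration, not Synap defaults.

```python
def history_budget(context_window: int,
                   system_tokens: int,
                   memory_tokens: int,
                   response_tokens: int) -> int:
    """Tokens left for conversation history after other reservations."""
    reserved = system_tokens + memory_tokens + response_tokens
    return max(0, context_window - reserved)


# Illustrative numbers (assumptions): a 100K-token window with space set
# aside for system instructions, retrieved memories, and the reply.
budget = history_budget(
    context_window=100_000,
    system_tokens=2_000,
    memory_tokens=8_000,
    response_tokens=4_000,
)
print(budget)  # 86000
```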

Cost scaling

LLM costs scale with input token count. A conversation with 50,000 tokens of history sends 25 times as many input tokens per turn as one with 2,000 tokens, and input cost rises roughly in proportion. For high-volume applications, unbounded context growth can make costs unsustainable.
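Because the full history is resent on every turn, the total input tokens billed over a conversation grow much faster than the history itself. The sketch below works through that arithmetic using a flat 160 tokens per turn (an assumption derived from the ~8,000 tokens at turn 50 shown earlier). The simple cap is a crude stand-in for compaction, not a model of how Synap actually compresses history.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total history tokens sent across a conversation when the full
    history is resent on every turn (system prompt not counted)."""
    # Simplification: the input at turn k is counted as k * tokens_per_turn.
    return sum(k * tokens_per_turn for k in range(1, turns + 1))


def cumulative_with_cap(turns: int, tokens_per_turn: int, cap: int) -> int:
    """Same total, but history sent per turn never exceeds `cap` tokens."""
    return sum(min(k * tokens_per_turn, cap) for k in range(1, turns + 1))


unbounded = cumulative_input_tokens(50, 160)
capped = cumulative_with_cap(50, 160, cap=1_600)
print(unbounded)  # 204000
print(capped)     # 72800
```

Under these assumptions, bounding the history cuts total input tokens over 50 turns by roughly two thirds, even though the final turn's history is only five times smaller.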

Quality degradation

Research on long-context models has found that LLMs tend to use content in the middle of a long context less reliably than content near the beginning or end (the “lost in the middle” phenomenon). Extremely long conversation histories can therefore degrade response quality, because the model struggles to identify the most relevant parts.

This is where context compaction becomes essential.

When compaction kicks in

Synap monitors the short-term context and triggers compaction when configurable thresholds are exceeded. You can configure these thresholds through the Memory Architecture Configuration:

Threshold	What it measures	Default	Configurable via
Token count	Total tokens in the conversation history	Varies by model	Memory Architecture Configuration
Turn count	Number of conversational turns	Configurable	Memory Architecture Configuration
Cost threshold	Accumulated cost of context per conversation	Configurable	Memory Architecture Configuration

When any threshold is exceeded, Synap initiates the compaction process.
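The "any threshold" rule can be sketched as a simple check. The `CompactionThresholds` class, its field names, and the example limits are hypothetical and do not reflect the actual Memory Architecture Configuration schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompactionThresholds:
    """Hypothetical threshold settings; None disables a check."""
    max_tokens: Optional[int] = None
    max_turns: Optional[int] = None
    max_cost_usd: Optional[float] = None


def should_compact(tokens: int, turns: int, cost_usd: float,
                   limits: CompactionThresholds) -> bool:
    """Return True when any configured threshold is exceeded."""
    checks = [
        (tokens, limits.max_tokens),
        (turns, limits.max_turns),
        (cost_usd, limits.max_cost_usd),
    ]
    return any(limit is not None and value > limit
               for value, limit in checks)


# Example limits are assumptions for illustration only.
limits = CompactionThresholds(max_tokens=6_000, max_turns=40)
under = should_compact(tokens=4_000, turns=25, cost_usd=0.10, limits=limits)
over = should_compact(tokens=8_000, turns=50, cost_usd=0.20, limits=limits)
```

Here `under` is False because no limit is crossed, while `over` is True because both the token and turn limits are exceeded. A single exceeded limit would be enough.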

What happens during compaction

Compaction is not simply truncating old messages. Synap performs intelligent compression that preserves essential information while reducing token count.

Analyze the conversation

Synap examines the full conversation history to identify key information: facts established, decisions made, preferences expressed, action items created, and the current topic of discussion.

Extract essential information

Critical information is extracted into structured summaries. This includes:

Active facts: information that is still relevant to the conversation
Decisions made: choices or conclusions reached during the discussion
Current state: what the user is currently working on or asking about
Pending items: unanswered questions or open action items
User preferences: communication style and format preferences expressed during the conversation

Compress the history

The original verbose conversation history is replaced with a compressed representation. Early turns that have been fully resolved are condensed into summaries. Recent turns (typically the last 3-5) are preserved verbatim to maintain immediate conversational flow.

Persist to long-term memory

Information extracted during compaction that has lasting value — facts, preferences, episodes, temporal events — is also routed through the ingestion pipeline to become long-term memories. This ensures that valuable knowledge from conversations is not lost when the short-term context is compressed.

Compaction is a lossy process. While Synap preserves the most important information, some conversational nuance and detail from early turns is inevitably lost. For applications where complete conversation fidelity is required, consider storing full transcripts separately and relying on Synap for the semantic layer.
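The compress step described above (summarize older turns, keep recent ones verbatim) can be sketched as follows. This is a minimal illustration under stated assumptions, not Synap's algorithm: the `compact` function and `toy_summarizer` are hypothetical, and the summarizer is a placeholder for whatever extraction logic produces the structured summary.

```python
from typing import Callable


def compact(history: list[str],
            summarize: Callable[[list[str]], str],
            keep_recent: int = 3) -> list[str]:
    """Replace older turns with one summary; keep recent turns verbatim."""
    if len(history) <= keep_recent:
        return list(history)  # nothing old enough to compress
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    summary = summarize(older)
    return [f"[Conversation Summary] {summary}"] + recent


def toy_summarizer(turns: list[str]) -> str:
    # Placeholder: a real summarizer would extract facts, decisions,
    # pending items, and preferences, typically with an LLM call.
    return f"{len(turns)} earlier turns condensed"


history = [f"Turn {i}" for i in range(1, 21)]
compacted = compact(history, toy_summarizer, keep_recent=3)
```

With 20 turns and `keep_recent=3`, the result is one summary entry followed by turns 18 to 20 unchanged. Anything the summarizer leaves out is gone from the short-term context, which is why the process is lossy.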

Compaction in practice

Here is a simplified example of how compaction transforms a conversation:

Before compaction

Turn 1: User asks about API rate limits
Turn 2: Assistant explains 1,000 req/min default
Turn 3: User asks about enterprise pricing
Turn 4: Assistant provides pricing tiers
Turn 5: User asks about SSO support
Turn 6: Assistant confirms SAML and OIDC support
Turn 7: User asks about data residency
Turn 8: Assistant explains EU and US region options
Turn 9: User asks about migration from competitor
Turn 10: Assistant outlines migration process
...
Turn 20: User asks a follow-up about migration timeline

Context size: ~6,000 tokens

After compaction

[Conversation Summary]
- User is evaluating the platform for enterprise adoption
- Confirmed: enterprise rate limit 10K req/min, SAML+OIDC SSO
- Confirmed: EU and US data residency options available
- User is planning migration from a competitor product
- Migration process: export → transform → bulk import (2-4 weeks)
- Current pricing discussion: Enterprise tier at $X/month

[Recent turns preserved verbatim]
Turn 18: User: "How long does the data migration typically take?"
Turn 19: Assistant: "Migration typically takes 2-4 weeks..."
Turn 20: User: "Can we run both systems in parallel during migration?"

Context size: ~1,500 tokens

The compressed version retains the essential information while reducing context size by approximately 75% (from ~6,000 to ~1,500 tokens). The most recent turns are preserved verbatim for conversational continuity.

Relationship to long-term memory

Short-term and long-term memory work together. During a conversation:

Long-term memories are retrieved at the start of each turn to provide background knowledge
Short-term context provides immediate conversational continuity
During compaction, extracted knowledge is routed to long-term storage
After the conversation ends, remaining short-term context can be ingested as a conversation document, producing additional long-term memories

# During a conversation, both context types work together.
# Assumes `sdk` is an already-initialized Synap SDK client and that this
# call runs inside an async function.
context = await sdk.user.context.fetch(
    user_id="user_alice",
    customer_id="acme_corp",
    search_query=["migration timeline"],
    conversation_id="conv_abc123",  # Links to short-term context
)

# context includes:
# - Long-term memories: "Alice is evaluating enterprise plan" (from past sessions)
# - Short-term context: "Currently discussing migration from competitor" (this session)

This dual-memory architecture mirrors how human memory works: we draw on both what we remember from past experiences (long-term) and what we are currently thinking about (short-term) to formulate responses.

Next steps

Context Compaction

Technical deep dive into the compaction algorithm and configuration options.

Long-term Context

How persistent knowledge is stored and retrieved across sessions.

SDK: Context Compaction

Implement and configure context compaction in your application.

Memories & Context

Return to the overview of memories and context in Synap.
