Building a memory-enabled agent follows a consistent pattern: retrieve context, generate a response, and ingest the conversation. This cycle repeats for every interaction, gradually building a richer memory that makes each subsequent response more informed and personalized. This page walks through the full pattern, from architecture to production-ready code.
This page focuses on the integration pattern — how your agent uses Synap during conversations. For details on the ingestion pipeline itself, see Runtime Ingestion. For retrieval configuration, see Fast Mode and Accurate Mode.
The core pattern for a memory-enabled agent is a three-phase cycle: Retrieve, Generate, Ingest.
1
User sends a message
Your application receives a message from the user through whatever channel you support — a chat widget, API endpoint, mobile app, voice interface, or other integration.
2
Agent retrieves relevant context from Synap
Before calling the LLM, the agent queries Synap for memories relevant to the current message. This retrieval considers the user’s history, their organization’s shared knowledge, and any client-scoped information.
3
Agent assembles the prompt
The agent assembles the full prompt for the LLM: a system prompt, the retrieved memories as context, the recent conversation history, and the current user message. The retrieved context bridges the gap between what the LLM knows (nothing about this user) and what it needs to know.
4
Agent generates a response
The agent calls the LLM (OpenAI, Anthropic, or any provider) with the assembled prompt. The LLM generates a response that is informed by the user’s history and organizational context.
5
Agent delivers the response
The generated response is sent back to the user through your application’s interface.
6
Agent ingests the conversation turn
After the response is delivered, the agent sends the full conversation turn (user message + agent response) to Synap for ingestion. This is asynchronous and non-blocking — the user does not wait for ingestion to complete.
When the user sends the next message, the cycle begins again. This time, the retrieval step may return memories from the conversation turn that was just ingested, creating a continuously improving feedback loop.
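The cycle above can be condensed into a single handler. This is a minimal sketch: `retrieve_context`, `generate`, and `ingest_turn` are hypothetical stand-ins for the Synap retrieval call, your LLM call, and the Synap ingestion call shown in the real examples later on this page.

```python
# A minimal sketch of the retrieve -> generate -> ingest cycle.
# retrieve_context, generate, and ingest_turn are hypothetical
# stand-ins for the Synap retrieval call, your LLM call, and the
# Synap ingestion call.

def retrieve_context(user_id: str, query: str) -> str:
    # Stand-in for sdk.conversation.context.fetch(...)
    return "User prefers concise answers."

def generate(system_prompt: str, context: str,
             history: list, message: str) -> str:
    # Stand-in for the LLM call with the assembled prompt
    return f"[reply informed by: {context}]"

def ingest_turn(user_id: str, user_message: str, response: str) -> None:
    # Stand-in for sdk.memories.create(...);
    # asynchronous and fire-and-forget in production
    pass

def handle_message(user_id: str, history: list, message: str) -> str:
    context = retrieve_context(user_id, message)        # 1. Retrieve
    response = generate("You are a helpful assistant.",
                        context, history, message)      # 2. Generate
    ingest_turn(user_id, message, response)             # 3. Ingest
    history += [{"role": "user", "content": message},
                {"role": "assistant", "content": response}]
    return response
```

On the next call to `handle_message`, the retrieval step can surface memories produced by the previous turn's ingestion, which is the feedback loop described above.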
The most critical part of the integration is how you structure the retrieved context within your LLM prompt. The retrieved memories need to be clearly separated from the system instructions and the conversation history so the LLM can use them effectively.
[System instructions] - Your agent's persona, capabilities, and behavioral guidelines
[Retrieved context from Synap] - Relevant facts, preferences, and historical context - Clearly labeled as "context from memory"
[Conversation history] - Recent messages in the current session
[Current user message] - The message being responded to
```python
def build_prompt(
    system_instructions: str,
    context,
    conversation_history: list,
    user_message: str,
):
    """Build the full prompt with retrieved memories injected."""
    messages = [
        {
            "role": "system",
            "content": (
                f"{system_instructions}\n\n"
                "## Context from memory\n"
                "The following information has been retrieved from previous conversations "
                "and documents. Use it to personalize your response and maintain continuity "
                "across interactions. If the context is not relevant to the current question, "
                "do not force it into your response.\n\n"
                f"{context.formatted_context}"
            )
        }
    ]

    # Add recent conversation history
    for msg in conversation_history:
        messages.append({"role": msg["role"], "content": msg["content"]})

    # Add the current user message
    messages.append({"role": "user", "content": user_message})

    return messages
```
Include a brief instruction telling the LLM how to use the retrieved context. Phrases like “Use this to personalize your response” and “If the context is not relevant, do not force it” help the LLM apply memories appropriately without hallucinating connections.
There are three common strategies for when to ingest conversation data, each with different tradeoffs:
After every turn
At conversation end
Hybrid approach
Ingest each conversation turn (user message + agent response) immediately after the response is delivered. This is the most common pattern.

Pros:
Memories are available for retrieval within the same conversation session
No risk of data loss if the session ends unexpectedly
Fine-grained temporal resolution
Cons:
Higher API call volume
Each turn is ingested independently, without full conversation context
```python
# After each response
await sdk.memories.create(
    document=f"User: {user_message}\nAssistant: {response}",
    document_type="ai-chat-conversation",
    user_id=user_id,
    customer_id=customer_id,
    mode="fast"
)
```
Accumulate the full conversation and ingest it as a single document when the session ends.

Pros:
Fewer API calls
Full conversation context available for extraction — better entity resolution and relationship mapping
More efficient for long-range mode processing
Cons:
Memories from this session are not available during the session itself
Risk of data loss if the session terminates unexpectedly (crash, timeout)
Requires session lifecycle management
```python
# At conversation end
full_transcript = "\n".join(
    f"{'User' if msg['role'] == 'user' else 'Assistant'}: {msg['content']}"
    for msg in conversation_history
)

await sdk.memories.create(
    document=full_transcript,
    document_type="ai-chat-conversation",
    user_id=user_id,
    customer_id=customer_id,
    mode="long-range"
)
```
Ingest each turn in fast mode for immediate availability, then ingest the full conversation in long-range mode at session end for deeper extraction.

Pros:
Immediate availability of basic memories
Deep extraction from the full conversation context
Resilient to unexpected session termination
Cons:
Higher API call volume and processing cost
Requires deduplication logic (use document_id to handle overlapping content)
```python
# During conversation: fast mode per turn
await sdk.memories.create(
    document=f"User: {user_message}\nAssistant: {response}",
    document_type="ai-chat-conversation",
    document_id=f"turn_{session_id}_{turn_number}",
    user_id=user_id,
    customer_id=customer_id,
    mode="fast"
)

# At conversation end: long-range mode for the full transcript
await sdk.memories.create(
    document=full_transcript,
    document_type="ai-chat-conversation",
    document_id=f"session_{session_id}_full",
    user_id=user_id,
    customer_id=customer_id,
    mode="long-range"
)
```
If your agent uses streaming responses (returning tokens as they are generated), ingest the conversation turn only after the full response has been assembled. Do not ingest partial responses.
```python
async def handle_streaming_message(user_message: str, user_id: str, customer_id: str):
    """Handle a streaming response with post-stream ingestion."""
    context = await sdk.conversation.context.fetch(
        user_id=user_id,
        customer_id=customer_id,
        query=user_message,
        mode="fast"
    )

    messages = build_prompt(SYSTEM_PROMPT, context, conversation_history, user_message)

    # Stream the response to the user
    full_response = ""
    async for chunk in await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    ):
        token = chunk.choices[0].delta.content or ""
        full_response += token
        yield token  # Stream to user in real-time

    # Ingest AFTER the full response is complete
    await sdk.memories.create(
        document=f"User: {user_message}\nAssistant: {full_response}",
        document_type="ai-chat-conversation",
        user_id=user_id,
        customer_id=customer_id,
        mode="fast"
    )
```
Never ingest a partial or streaming response. Synap expects complete content to produce high-quality memories. Partial content produces fragmented, low-quality results.
Retrieval is on the critical path of your agent’s response time. The user is waiting while your agent fetches context from Synap. Choosing the right retrieval mode helps balance speed and quality.
For most real-time conversational agents, use fast mode for retrieval. The latency is imperceptible to users and adds minimal overhead to the overall response time. Reserve accurate mode for use cases where retrieval quality matters more than speed — for example, generating end-of-day summaries or answering complex analytical questions.
```python
# Fast retrieval for real-time chat (recommended default)
context = await sdk.conversation.context.fetch(
    user_id=user_id,
    customer_id=customer_id,
    query=user_message,
    mode="fast"
)

# Accurate retrieval for complex queries
context = await sdk.conversation.context.fetch(
    user_id=user_id,
    customer_id=customer_id,
    query="Summarize everything we know about Project Atlas, "
          "including all team members and key decisions",
    mode="accurate"
)
```
Multiple agents can share a single Synap Instance, each ingesting and retrieving from the same memory store. This enables sophisticated multi-agent architectures where specialized agents handle different aspects of a user’s needs while sharing a unified memory.
```python
# Sales agent ingests a conversation
await sdk.memories.create(
    document="User: We're evaluating your enterprise plan for 500 users.\n"
             "Assistant: Great! The enterprise plan includes SSO, priority support, "
             "and custom integrations. I'll send over a detailed proposal.",
    document_type="ai-chat-conversation",
    user_id="user_123",
    customer_id="acme_corp",
    mode="fast"
)

# Support agent later retrieves this context automatically
context = await sdk.conversation.context.fetch(
    user_id="user_123",
    customer_id="acme_corp",
    query="I have a question about setting up SSO"
)
# context includes: "User is evaluating enterprise plan for 500 users, interested in SSO"
```
In this pattern, the support agent automatically knows about the sales conversation because both agents share the same Instance and scope. The support agent can provide more informed assistance without the user having to repeat context.
Multi-agent memory sharing is controlled by scope. If both agents use the same user_id and customer_id, they share memories for that user and organization. If they operate on the same Instance, they share Instance-level memories. See Memory Scopes for details.
This complete example shows a production-ready memory-enabled chatbot using OpenAI, with context retrieval, response generation, and conversation ingestion:
```python
import asyncio

from openai import AsyncOpenAI
from synap import Synap

sdk = Synap(api_key="synap_api_key")
openai_client = AsyncOpenAI(api_key="openai_api_key")

SYSTEM_PROMPT = """You are a helpful customer success assistant. You remember previous
conversations and use that context to provide personalized, continuity-aware responses.

Guidelines:
- Reference relevant past interactions naturally, without being repetitive.
- If you recall a user's preference, apply it without asking again.
- If the retrieved context contradicts something the user just said, trust the user's
  current statement (people change their minds).
- Never fabricate memories. If you don't have context, say so honestly."""


async def chat(
    user_message: str,
    user_id: str,
    customer_id: str,
    conversation_history: list[dict]
) -> str:
    """Handle a single chat turn with full memory integration."""

    # 1. Retrieve relevant context from Synap
    context = await sdk.conversation.context.fetch(
        user_id=user_id,
        customer_id=customer_id,
        query=user_message,
        mode="fast"
    )

    # 2. Build the prompt with memory context
    messages = [
        {
            "role": "system",
            "content": (
                f"{SYSTEM_PROMPT}\n\n"
                "## Retrieved context from memory\n"
                "Use the following information from previous interactions to inform "
                "your response. Do not reference this section directly -- integrate "
                "the knowledge naturally.\n\n"
                f"{context.formatted_context if context.formatted_context else 'No relevant context found.'}"
            )
        }
    ]

    # Add conversation history (recent turns in this session)
    for msg in conversation_history[-10:]:  # Keep last 10 turns for context window
        messages.append({"role": msg["role"], "content": msg["content"]})

    # Add the current user message
    messages.append({"role": "user", "content": user_message})

    # 3. Generate the response
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        temperature=0.7,
        max_tokens=1024
    )
    assistant_message = response.choices[0].message.content

    # 4. Ingest the conversation turn (non-blocking, fire-and-forget)
    try:
        await sdk.memories.create(
            document=f"User: {user_message}\nAssistant: {assistant_message}",
            document_type="ai-chat-conversation",
            user_id=user_id,
            customer_id=customer_id,
            mode="fast"
        )
    except Exception as e:
        # Log but do not fail the response
        print(f"Warning: ingestion failed: {e}")

    return assistant_message


# Usage
async def main():
    conversation_history = []
    while True:
        user_input = input("You: ")
        if user_input.lower() in ("quit", "exit"):
            break

        response = await chat(
            user_message=user_input,
            user_id="user_123",
            customer_id="acme_corp",
            conversation_history=conversation_history
        )

        # Update local conversation history
        conversation_history.append({"role": "user", "content": user_input})
        conversation_history.append({"role": "assistant", "content": response})

        print(f"Assistant: {response}")


if __name__ == "__main__":
    asyncio.run(main())
```
Always call retrieval, even for simple queries
Even if you think the current query does not need historical context, always call retrieval. Synap’s ranking ensures that irrelevant context is not returned, so the overhead is minimal. Skipping retrieval means your agent cannot benefit from accumulated memory.
Keep system prompt instructions about memory concise
The LLM does not need to know how Synap works internally. Simply tell it to use the retrieved context naturally and to not fabricate memories. Over-engineering the memory instructions can cause the LLM to behave unnaturally.
Separate retrieval latency from generation latency
Log retrieval and generation latencies independently. If your agent feels slow, retrieval in fast mode is rarely the bottleneck — LLM generation is usually the dominant factor. Measuring both independently helps you optimize the right layer.
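One lightweight way to measure the two independently is a small timing wrapper. This is a sketch; `fetch_context` and `call_llm` in the usage comments stand for your own retrieval and generation helpers:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.latency")

def timed(label: str, fn, *args, **kwargs):
    """Run fn, log its wall-clock latency under label, and return (result, ms)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    log.info("%s took %.1f ms", label, elapsed_ms)
    return result, elapsed_ms

# Usage inside a chat turn (fetch_context and call_llm are your own wrappers):
# context, retrieval_ms = timed("retrieval", fetch_context, user_message)
# reply, generation_ms = timed("generation", call_llm, messages)
```

Logging the two numbers separately per turn makes it obvious which layer to optimize when overall response time creeps up.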
Use conversation_id for multi-turn tracking
When your application supports multi-turn conversations, pass a consistent conversation_id alongside the user_id. This helps Synap group related turns for better contextual understanding during retrieval.
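A sketch of keeping the id stable across turns, assuming `sdk.memories.create` accepts a `conversation_id` keyword as described; the id format here is purely illustrative:

```python
import uuid

# Generate one conversation_id when the session starts and
# reuse it for every turn in that session.
conversation_id = f"conv_{uuid.uuid4().hex}"

def turn_payload(user_message: str, response: str, user_id: str) -> dict:
    """Build the ingestion payload for one turn, keeping conversation_id stable."""
    return {
        "document": f"User: {user_message}\nAssistant: {response}",
        "document_type": "ai-chat-conversation",
        "user_id": user_id,
        "conversation_id": conversation_id,  # same value for every turn this session
        "mode": "fast",
    }

# Then, per turn: await sdk.memories.create(**turn_payload(...))
```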
Gracefully degrade when Synap is unavailable
Your agent should still function if Synap is temporarily unreachable. Skip the retrieval step and generate a response without memory context. The user experience degrades (no personalization) but does not break entirely.
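One way to sketch this fallback is a wrapper around the retrieval call that returns `None` instead of raising when Synap cannot be reached; the caller then builds the prompt without a memory section:

```python
async def fetch_context_safe(sdk, user_id: str, customer_id: str, query: str):
    """Retrieve context, falling back to None if Synap is unreachable."""
    try:
        return await sdk.conversation.context.fetch(
            user_id=user_id,
            customer_id=customer_id,
            query=query,
            mode="fast"
        )
    except Exception as exc:
        # Degrade gracefully: respond without memory rather than failing the turn
        print(f"Warning: retrieval failed, continuing without memory: {exc}")
        return None
```

Pair this with the fire-and-forget ingestion error handling shown in the complete example above, and a Synap outage costs you personalization for a few turns instead of taking the agent down.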