This page focuses on the integration pattern — how your agent uses Synap during conversations. For details on the ingestion pipeline itself, see Runtime Ingestion. For retrieval configuration, see Fast Mode and Accurate Mode.

The agent loop

The core pattern for a memory-enabled agent is a three-phase cycle: Retrieve, Generate, Ingest.
[Diagram: the agent loop. The user sends a message; the agent retrieves context from Synap, generates a response with the LLM, delivers it to the user, ingests the conversation turn into Synap, and repeats.]
1. User sends a message

Your application receives a message from the user through whatever channel you support — a chat widget, API endpoint, mobile app, voice interface, or other integration.
2. Agent retrieves relevant context from Synap

Before calling the LLM, the agent queries Synap for memories relevant to the current message. This retrieval considers the user’s history, their organization’s shared knowledge, and any client-scoped information.
context = await sdk.conversation.context.fetch(
    user_id="user_123",
    customer_id="acme_corp",
    query=user_message,
    mode="fast"
)
3. Agent builds the prompt

The agent assembles the full prompt for the LLM: a system prompt, the retrieved memories as context, the recent conversation history, and the current user message. The retrieved context bridges the gap between what the LLM knows (nothing about this user) and what it needs to know.
4. Agent generates a response

The agent calls the LLM (OpenAI, Anthropic, or any provider) with the assembled prompt. The LLM generates a response that is informed by the user’s history and organizational context.
5. Agent delivers the response

The generated response is sent back to the user through your application’s interface.
6. Agent ingests the conversation turn

After the response is delivered, the agent sends the full conversation turn (user message + agent response) to Synap for ingestion. This is asynchronous and non-blocking — the user does not wait for ingestion to complete.
await sdk.memories.create(
    document=f"User: {user_message}\nAssistant: {assistant_response}",
    document_type="ai-chat-conversation",
    user_id="user_123",
    customer_id="acme_corp",
    mode="fast"
)
7. Loop repeats

When the user sends the next message, the cycle begins again. This time, the retrieval step may return memories from the conversation turn that was just ingested, creating a continuously improving feedback loop.
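Stripped of provider details, one turn of this cycle is three awaited calls. The sketch below is illustrative; retrieve_context, generate, and ingest_turn are hypothetical stand-ins for the Synap and LLM calls shown in the steps above:

```python
from typing import Awaitable, Callable

async def agent_turn(
    user_message: str,
    retrieve_context: Callable[[str], Awaitable[str]],
    generate: Callable[[str, str], Awaitable[str]],
    ingest_turn: Callable[[str, str], Awaitable[None]],
) -> str:
    """One pass through the Retrieve -> Generate -> Ingest cycle."""
    context = await retrieve_context(user_message)    # step 2: retrieve from Synap
    response = await generate(user_message, context)  # step 4: generate with the LLM
    await ingest_turn(user_message, response)         # step 6: ingest the turn
    return response
```

Keeping the three phases behind narrow interfaces like this also makes each one easy to stub out in tests.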

System prompt injection

The most critical part of the integration is how you structure the retrieved context within your LLM prompt. The retrieved memories need to be clearly separated from the system instructions and the conversation history so the LLM can use them effectively.
[System instructions]
  - Your agent's persona, capabilities, and behavioral guidelines

[Retrieved context from Synap]
  - Relevant facts, preferences, and historical context
  - Clearly labeled as "context from memory"

[Conversation history]
  - Recent messages in the current session

[Current user message]
  - The message being responded to

Implementation

def build_prompt(system_instructions: str, context, conversation_history: list, user_message: str):
    """Build the full prompt with retrieved memories injected."""

    messages = [
        {
            "role": "system",
            "content": (
                f"{system_instructions}\n\n"
                "## Context from memory\n"
                "The following information has been retrieved from previous conversations "
                "and documents. Use it to personalize your response and maintain continuity "
                "across interactions. If the context is not relevant to the current question, "
                "do not force it into your response.\n\n"
                f"{context.formatted_context}"
            )
        }
    ]

    # Add recent conversation history
    for msg in conversation_history:
        messages.append({
            "role": msg["role"],
            "content": msg["content"]
        })

    # Add the current user message
    messages.append({
        "role": "user",
        "content": user_message
    })

    return messages
Include a brief instruction telling the LLM how to use the retrieved context. Phrases like “Use this to personalize your response” and “If the context is not relevant, do not force it” help the LLM apply memories appropriately without hallucinating connections.
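Because prompt-assembly bugs surface as confusing model behavior rather than errors, a lightweight guard on the assembled messages list can help. validate_messages is an illustrative helper, not part of any SDK:

```python
def validate_messages(messages: list[dict]) -> None:
    """Lightweight sanity checks on an assembled prompt before the LLM call."""
    assert messages, "prompt must not be empty"
    assert messages[0]["role"] == "system", "system message must come first"
    assert messages[-1]["role"] == "user", "current user message must come last"
    assert all(m["content"] for m in messages), "no empty message bodies"
```

Calling it right before the LLM request turns an assembly mistake into an immediate AssertionError instead of a degraded response.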

When to ingest

There are several strategies for when to ingest conversation data, each with different tradeoffs. The most common is per-turn ingestion: ingest each conversation turn (user message + agent response) immediately after the response is delivered.

Pros:
  • Memories are available for retrieval within the same conversation session
  • No risk of data loss if the session ends unexpectedly
  • Fine-grained temporal resolution

Cons:
  • Higher API call volume
  • Each turn is ingested independently, without full conversation context
# After each response
await sdk.memories.create(
    document=f"User: {user_message}\nAssistant: {response}",
    document_type="ai-chat-conversation",
    user_id=user_id,
    customer_id=customer_id,
    mode="fast"
)
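If you want per-turn ingestion fully off the response path, one option is to schedule it as a background task instead of awaiting it inline. This is a sketch, not SDK guidance; ingest_in_background is a hypothetical helper:

```python
import asyncio

def _log_failure(task: asyncio.Task) -> None:
    # Surface ingestion errors in logs instead of silently dropping them.
    if not task.cancelled() and task.exception() is not None:
        print(f"Warning: ingestion failed: {task.exception()}")

def ingest_in_background(coro) -> asyncio.Task:
    """Schedule an ingestion coroutine without blocking the response path."""
    task = asyncio.create_task(coro)
    task.add_done_callback(_log_failure)
    return task

# Usage: ingest_in_background(sdk.memories.create(...))
```

Hold a reference to the returned task (for example in a module-level set) so it is not garbage-collected before it finishes.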

Streaming considerations

If your agent uses streaming responses (returning tokens as they are generated), ingest the conversation turn only after the full response has been assembled. Do not ingest partial responses.
async def handle_streaming_message(user_message: str, user_id: str, customer_id: str):
    """Handle a streaming response with post-stream ingestion."""

    context = await sdk.conversation.context.fetch(
        user_id=user_id,
        customer_id=customer_id,
        query=user_message,
        mode="fast"
    )

    messages = build_prompt(SYSTEM_PROMPT, context, conversation_history, user_message)

    # Stream the response to the user
    full_response = ""
    async for chunk in await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        stream=True
    ):
        if not chunk.choices:
            continue  # some providers send a final chunk with no choices
        token = chunk.choices[0].delta.content or ""
        full_response += token
        yield token  # Stream to user in real-time

    # Ingest AFTER the full response is complete
    await sdk.memories.create(
        document=f"User: {user_message}\nAssistant: {full_response}",
        document_type="ai-chat-conversation",
        user_id=user_id,
        customer_id=customer_id,
        mode="fast"
    )
Never ingest a partial or streaming response. Synap expects complete content to produce high-quality memories. Partial content produces fragmented, low-quality results.
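A cheap guard before the final sdk.memories.create call makes this rule hard to violate. should_ingest is illustrative, not part of the SDK:

```python
def should_ingest(full_response: str, stream_completed: bool) -> bool:
    """Ingest only complete, non-empty responses; never partial streams."""
    return stream_completed and bool(full_response.strip())
```

Pass stream_completed=False when the client disconnects or the stream is aborted mid-generation.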

Retrieval performance

Retrieval is on the critical path of your agent’s response time. The user is waiting while your agent fetches context from Synap. Choosing the right retrieval mode helps balance speed and quality.
Mode       Speed                                          Best for
fast       Low latency, imperceptible to users            Real-time conversations, single-topic queries
accurate   Moderate latency, noticeable but acceptable    Complex queries, relationship-aware context, analytical summaries
For most real-time conversational agents, use fast mode for retrieval. The latency is imperceptible to users and adds minimal overhead to the overall response time. Reserve accurate mode for use cases where retrieval quality matters more than speed — for example, generating end-of-day summaries or answering complex analytical questions.
# Fast retrieval for real-time chat (recommended default)
context = await sdk.conversation.context.fetch(
    user_id=user_id,
    customer_id=customer_id,
    query=user_message,
    mode="fast"
)

# Accurate retrieval for complex queries
context = await sdk.conversation.context.fetch(
    user_id=user_id,
    customer_id=customer_id,
    query="Summarize everything we know about Project Atlas, including all team members and key decisions",
    mode="accurate"
)
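If a single agent handles both real-time chat and analytical requests, you can route the mode per query. The keyword heuristic below is purely illustrative (pick_mode and its marker list are not part of the SDK); a classifier or an explicit request flag would work just as well:

```python
# Hypothetical markers suggesting a broad, analytical query.
ANALYTICAL_MARKERS = ("summarize", "everything", "report", "compare", "history")

def pick_mode(query: str) -> str:
    """Route analytical-sounding queries to accurate mode, the rest to fast."""
    q = query.lower()
    return "accurate" if any(marker in q for marker in ANALYTICAL_MARKERS) else "fast"
```

The chosen string can then be passed straight through as the mode argument of the context fetch.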

Multi-agent scenarios

Multiple agents can share a single Synap Instance, each ingesting and retrieving from the same memory store. This enables sophisticated multi-agent architectures where specialized agents handle different aspects of a user’s needs while sharing a unified memory.

Shared memory across agents

# Sales agent ingests a conversation
await sdk.memories.create(
    document="User: We're evaluating your enterprise plan for 500 users.\n"
             "Assistant: Great! The enterprise plan includes SSO, priority support, "
             "and custom integrations. I'll send over a detailed proposal.",
    document_type="ai-chat-conversation",
    user_id="user_123",
    customer_id="acme_corp",
    mode="fast"
)

# Support agent later retrieves this context automatically
context = await sdk.conversation.context.fetch(
    user_id="user_123",
    customer_id="acme_corp",
    query="I have a question about setting up SSO"
)
# context includes: "User is evaluating enterprise plan for 500 users, interested in SSO"
In this pattern, the support agent automatically knows about the sales conversation because both agents share the same Instance and scope. The support agent can provide more informed assistance without the user having to repeat context.
Multi-agent memory sharing is controlled by scope. If both agents use the same user_id and customer_id, they share memories for that user and organization. If they operate on the same Instance, they share Instance-level memories. See Memory Scopes for details.

Full example: a memory-enabled chatbot

This complete example shows a production-ready memory-enabled chatbot using OpenAI, with context retrieval, response generation, and conversation ingestion:
import asyncio
from synap import Synap
from openai import AsyncOpenAI

sdk = Synap(api_key="synap_api_key")
openai_client = AsyncOpenAI(api_key="openai_api_key")

SYSTEM_PROMPT = """You are a helpful customer success assistant. You remember previous
conversations and use that context to provide personalized, continuity-aware responses.

Guidelines:
- Reference relevant past interactions naturally, without being repetitive.
- If you recall a user's preference, apply it without asking again.
- If the retrieved context contradicts something the user just said, trust the
  user's current statement (people change their minds).
- Never fabricate memories. If you don't have context, say so honestly."""


async def chat(
    user_message: str,
    user_id: str,
    customer_id: str,
    conversation_history: list[dict]
) -> str:
    """Handle a single chat turn with full memory integration."""

    # 1. Retrieve relevant context from Synap
    context = await sdk.conversation.context.fetch(
        user_id=user_id,
        customer_id=customer_id,
        query=user_message,
        mode="fast"
    )

    # 2. Build the prompt with memory context
    messages = [
        {
            "role": "system",
            "content": (
                f"{SYSTEM_PROMPT}\n\n"
                "## Retrieved context from memory\n"
                "Use the following information from previous interactions to inform "
                "your response. Do not reference this section directly -- integrate "
                "the knowledge naturally.\n\n"
                f"{context.formatted_context if context.formatted_context else 'No relevant context found.'}"
            )
        }
    ]

    # Add conversation history (recent turns in this session)
    for msg in conversation_history[-10:]:  # Keep last 10 turns for context window
        messages.append({"role": msg["role"], "content": msg["content"]})

    # Add the current user message
    messages.append({"role": "user", "content": user_message})

    # 3. Generate the response
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        temperature=0.7,
        max_tokens=1024
    )

    assistant_message = response.choices[0].message.content

    # 4. Ingest the conversation turn (errors are logged, never raised to the user)
    try:
        await sdk.memories.create(
            document=f"User: {user_message}\nAssistant: {assistant_message}",
            document_type="ai-chat-conversation",
            user_id=user_id,
            customer_id=customer_id,
            mode="fast"
        )
    except Exception as e:
        # Log but do not fail the response
        print(f"Warning: ingestion failed: {e}")

    return assistant_message


# Usage
async def main():
    conversation_history = []

    while True:
        user_input = input("You: ")
        if user_input.lower() in ("quit", "exit"):
            break

        response = await chat(
            user_message=user_input,
            user_id="user_123",
            customer_id="acme_corp",
            conversation_history=conversation_history
        )

        # Update local conversation history
        conversation_history.append({"role": "user", "content": user_input})
        conversation_history.append({"role": "assistant", "content": response})

        print(f"Assistant: {response}")

if __name__ == "__main__":
    asyncio.run(main())

Best practices

• Always call retrieval, even if you think the current query does not need historical context. Synap's ranking ensures that irrelevant context is not returned, so the overhead is minimal; skipping retrieval means your agent cannot benefit from accumulated memory.
• Keep memory instructions simple. The LLM does not need to know how Synap works internally; tell it to use the retrieved context naturally and not to fabricate memories. Over-engineering the memory instructions can cause the LLM to behave unnaturally.
• Log retrieval and generation latencies independently. If your agent feels slow, retrieval in fast mode is rarely the bottleneck; LLM generation is usually the dominant factor. Measuring both separately helps you optimize the right layer.
• Pass a consistent conversation_id alongside the user_id when your application supports multi-turn conversations. This helps Synap group related turns for better contextual understanding during retrieval.
• Degrade gracefully if Synap is temporarily unreachable: skip the retrieval step and generate a response without memory context. The user experience degrades (no personalization) but does not break entirely.
try:
    context = await sdk.conversation.context.fetch(...)
except Exception:
    context = None  # Proceed without memory context
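To measure retrieval and generation independently, a small awaitable wrapper is enough. timed is an illustrative helper, not part of the SDK:

```python
import time

async def timed(label: str, coro):
    """Await a coroutine and print its wall-clock latency for that stage."""
    start = time.perf_counter()
    result = await coro
    print(f"{label}: {(time.perf_counter() - start) * 1000:.1f} ms")
    return result

# Usage:
#   context = await timed("retrieval", sdk.conversation.context.fetch(...))
#   response = await timed("generation", openai_client.chat.completions.create(...))
```

In production you would emit these as metrics rather than prints, but the split per stage is the important part.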

Next steps

Context Fetch SDK

Detailed SDK reference for the context retrieval methods.

Ingestion SDK

Full reference for ingestion methods and parameters.

Fast Mode

Understand fast mode for latency-sensitive retrieval and ingestion.

Memory Scopes

How scopes affect what your agent can retrieve.