Unlike bootstrap ingestion, which handles bulk historical data, runtime ingestion is designed for low-latency, non-blocking operation. Your agent never waits for ingestion to complete before responding to the user.
Runtime ingestion is asynchronous. The SDK call returns immediately with an ingestion_id. The content is processed in the background, and memories become available for retrieval shortly after.

How it works

The runtime ingestion flow integrates naturally into your agent’s conversation loop:
1. Your agent receives a user message

The user sends a message to your agent through your application's interface: a chat widget, API, mobile app, or other channel.

2. Your agent retrieves context from Synap

Before generating a response, the agent calls Synap to fetch relevant memories and context. This step is covered in detail in Agent Interactions.

3. Your agent generates and delivers a response

The agent calls an LLM with the retrieved context and conversation history, generates a response, and sends it to the user.

4. Your agent ingests the conversation turn

After the response is delivered, the agent calls sdk.memories.create() to ingest the conversation turn. This call returns immediately; it does not block the user experience.

5. Synap processes in the background

Synap analyzes the content and stores relevant memories. These memories become available for future retrieval queries.

The SDK call

The core SDK method for runtime ingestion is sdk.memories.create():
from synap import Synap

sdk = Synap(api_key="your_api_key")

result = await sdk.memories.create(
    document="User: Can you remind me what we decided about the migration timeline?\n"
             "Assistant: In our last conversation, you and the team agreed to begin the "
             "database migration on March 15th, with a two-week buffer for testing.",
    document_type="ai-chat-conversation",
    user_id="user_123",
    customer_id="acme_corp",
    mode="fast"
)

print(f"Ingestion ID: {result.ingestion_id}")
# Returns immediately -- processing happens in the background

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| document | Yes | The text content to ingest. For conversations, include both user and assistant messages with speaker labels. |
| document_type | Yes | The type of content being ingested. Determines how Synap processes the document. |
| user_id | No | The user this content belongs to. Determines user-scope storage. |
| customer_id | No | The customer organization. Determines customer-scope storage. |
| mode | No | Ingestion mode: fast or long-range. Defaults to fast for runtime ingestion. |
| document_id | No | Unique identifier for idempotency. Prevents duplicate ingestion on retry. |
| document_created_at | No | Timestamp override. Defaults to the current time for runtime ingestion (usually correct). |
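When ingesting a conversation that happened in the past (for example, replaying a missed turn), document_created_at lets you preserve the original timestamp instead of the default "now". A minimal sketch of building such a timestamp; the exact accepted format is an assumption here, so check the SDK reference:

```python
from datetime import datetime, timezone

# When the conversation actually occurred (UTC). Passing this as
# document_created_at keeps replayed memories in chronological order.
occurred_at = datetime(2024, 3, 1, 14, 30, tzinfo=timezone.utc)

# An ISO 8601 string is a common wire format for timestamp fields.
document_created_at = occurred_at.isoformat()
print(document_created_at)  # 2024-03-01T14:30:00+00:00
```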

Document types

Synap supports a range of document types, each with specialized extraction logic:
| Document Type | Description |
| --- | --- |
| ai-chat-conversation | Chat conversations between a user and an AI agent. The most common type for runtime ingestion. |
| document | General text documents, articles, or notes. |
| email | Email content, including headers and body. |
| pdf | PDF document content (text extracted). |
| image | Image descriptions or OCR-extracted text. |
| audio | Transcribed audio content. |
| meeting-transcript | Meeting transcriptions with multiple speakers. |
For most agent applications, you will use ai-chat-conversation almost exclusively during runtime. Other document types are more common in bootstrap ingestion or specialized workflows.
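For multi-speaker types such as meeting-transcript, the same speaker-labeling principle applies as for chat conversations: prefix each line with the speaker's name so statements can be attributed correctly. A sketch of assembling such a document (the transcript content and the commented-out SDK call are illustrative):

```python
# Hypothetical transcript turns; speaker labels help attribute
# statements, just as "User:"/"Assistant:" labels do for chat.
turns = [
    ("Alice", "Let's lock the migration date."),
    ("Bob", "March 15th works, with a two-week testing buffer."),
]
document = "\n".join(f"{speaker}: {text}" for speaker, text in turns)

# Then ingest with the matching document type, e.g.:
# await sdk.memories.create(document=document,
#                           document_type="meeting-transcript",
#                           customer_id="acme_corp", mode="long-range")
print(document)
```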

Ingestion modes

Runtime ingestion supports two processing modes that control the depth of extraction.

Fast mode is optimized for speed. It is the best choice for real-time chat logging, where low latency matters more than extraction depth.
await sdk.memories.create(
    document="User: What time is the standup?\nAssistant: Daily standup is at 9:30 AM Pacific.",
    document_type="ai-chat-conversation",
    user_id="user_123",
    customer_id="acme_corp",
    mode="fast"
)
Use fast mode for: routine conversations, high-throughput workloads, non-critical context.

Long-range mode performs deeper extraction at the cost of additional processing time. Reserve it for high-value content where extraction depth matters more than latency.

For a detailed comparison of the two modes, see Fast Mode.

Scoping

The user_id and customer_id parameters determine where the memory is stored in the scope hierarchy. This directly affects who can retrieve the memory later.
# User-scoped: only visible when retrieving for this specific user
await sdk.memories.create(
    document="User: I prefer bullet points over paragraphs.\nAssistant: Got it!",
    document_type="ai-chat-conversation",
    user_id="user_123",
    customer_id="acme_corp",
    mode="fast"
)

# Customer-scoped: visible to all users in the organization
await sdk.memories.create(
    document="Company handbook update: All PTO requests must be submitted two weeks in advance.",
    document_type="document",
    customer_id="acme_corp",
    mode="long-range"
)

# Client-scoped: visible to all users across all customers
await sdk.memories.create(
    document="Product changelog: Version 2.5 adds support for custom webhooks.",
    document_type="document",
    mode="fast"
)
For a full explanation of the scope hierarchy, see Memory Scopes.

The typical agent loop

Here is how runtime ingestion fits into a standard agent conversation loop:
from synap import Synap
from openai import AsyncOpenAI

sdk = Synap(api_key="synap_api_key")
openai_client = AsyncOpenAI(api_key="openai_api_key")

async def handle_message(user_message: str, user_id: str, customer_id: str):
    """Handle a single user message with memory-enabled context."""

    # Step 1: Retrieve relevant context from Synap
    context = await sdk.conversation.context.fetch(
        user_id=user_id,
        customer_id=customer_id,
        query=user_message,
        mode="fast"
    )

    # Step 2: Build the prompt with retrieved memories
    system_prompt = (
        "You are a helpful assistant. Use the following context from previous "
        "conversations to inform your response:\n\n"
        f"{context.formatted_context}\n\n"
        "If the context is not relevant, respond based on your general knowledge."
    )

    # Step 3: Generate the response
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ]
    )

    assistant_message = response.choices[0].message.content

    # Step 4: Ingest the conversation turn (non-blocking)
    await sdk.memories.create(
        document=f"User: {user_message}\nAssistant: {assistant_message}",
        document_type="ai-chat-conversation",
        user_id=user_id,
        customer_id=customer_id,
        mode="fast"
    )

    return assistant_message
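The example above awaits the ingestion call before returning. Since the call returns as soon as the document is accepted, that is usually fine; if you want to remove even that round trip from the response path, you can schedule ingestion as a background task. A sketch using asyncio.create_task, where the ingest coroutine is a stand-in for the real sdk.memories.create call:

```python
import asyncio

events = []  # records ordering, to show the reply is not blocked

async def ingest(document: str) -> None:
    # Stand-in for: await sdk.memories.create(document=document, ...)
    await asyncio.sleep(0.05)  # simulate network latency
    events.append("ingested")

async def handle_message(user_message: str) -> str:
    assistant_message = f"Echo: {user_message}"  # placeholder for the LLM call
    # Schedule ingestion in the background instead of awaiting it inline.
    # In production, keep a reference to the task (and handle its errors)
    # so it is not garbage-collected before it finishes.
    asyncio.create_task(
        ingest(f"User: {user_message}\nAssistant: {assistant_message}")
    )
    events.append("replied")
    return assistant_message  # returns before ingestion completes

async def main() -> str:
    reply = await handle_message("hi")
    await asyncio.sleep(0.1)  # let the background task finish before exit
    return reply

print(asyncio.run(main()))  # Echo: hi
print(events)               # ['replied', 'ingested']
```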
For a more detailed walkthrough of this pattern, see Agent Interactions.

Best practices

Always format conversations with clear speaker labels (User: and Assistant:). Synap uses these labels to correctly attribute statements to each participant, which improves the quality of stored memories.
# Good: clear speaker labels
document = "User: I need the report by Friday.\nAssistant: I'll have it ready by Thursday evening."

# Bad: no speaker context
document = "I need the report by Friday. I'll have it ready by Thursday evening."
Ensure that user_id and customer_id are consistent across all ingestion calls for the same user and organization. Inconsistent IDs fragment the memory store, creating isolated pockets of context that cannot be retrieved together. Derive these IDs from your application’s auth system.
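One way to enforce consistency is a single helper that maps your auth layer's token claims to Synap IDs, so every ingestion and retrieval call site uses the same derivation. A sketch, where the claim names (sub, org_id) are assumptions about your auth system:

```python
def synap_ids(claims: dict) -> tuple[str, str]:
    """Derive (user_id, customer_id) from auth token claims.

    Centralizing this mapping ensures every ingestion and retrieval
    call uses identical IDs for the same user and organization.
    """
    return f"user_{claims['sub']}", claims["org_id"]

user_id, customer_id = synap_ids({"sub": "123", "org_id": "acme_corp"})
print(user_id, customer_id)  # user_123 acme_corp
```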
For standard conversational turns, fast mode provides the best balance of speed and extraction quality. Reserve long-range mode for high-value conversations where deep extraction justifies the additional processing time.
Always ingest the conversation turn after your agent has responded, not before. This ensures the ingested content includes both the user message and the agent’s response, providing complete context for future retrieval.
Runtime ingestion should never block or crash your agent. Wrap ingestion calls in error handling and log failures for later investigation. A missed ingestion is recoverable; a crashed agent is not.
try:
    await sdk.memories.create(
        document=conversation_turn,
        document_type="ai-chat-conversation",
        user_id=user_id,
        customer_id=customer_id,
        mode="fast"
    )
except Exception as e:
    logger.warning(f"Ingestion failed, will retry: {e}")
    # Queue for retry or log for manual follow-up
If you implement retry logic for failed ingestion calls, always set a document_id to prevent duplicate memories. Derive the ID from your conversation or message identifier.
await sdk.memories.create(
    document=conversation_turn,
    document_type="ai-chat-conversation",
    document_id=f"turn_{conversation_id}_{turn_number}",
    user_id=user_id,
    customer_id=customer_id,
    mode="fast"
)
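Combining the two practices above, a retry wrapper can safely re-attempt failed ingestion, because a stable document_id makes repeated calls idempotent. A sketch with exponential backoff; the flaky stand-in below simulates transient network failures in place of a real sdk.memories.create call:

```python
import asyncio

async def ingest_with_retry(create, max_attempts: int = 3, base_delay: float = 0.01):
    """Retry an ingestion call with exponential backoff.

    `create` is a zero-argument coroutine function wrapping
    sdk.memories.create(..., document_id=...); the stable document_id
    makes repeated attempts safe.
    """
    for attempt in range(max_attempts):
        try:
            return await create()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            await asyncio.sleep(base_delay * 2 ** attempt)

# Demo: a stand-in that fails twice, then succeeds.
calls = {"n": 0}

async def flaky_create():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ingestion_abc"

print(asyncio.run(ingest_with_retry(flaky_create)))  # ingestion_abc
```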

Next steps

Bootstrap Ingestion

Load historical data in bulk before or alongside runtime operation.

Agent Interactions

The full retrieve-generate-ingest loop for memory-enabled agents.

SDK Ingestion Guide

Detailed SDK reference for all ingestion methods and parameters.

Fast Mode

Understand the tradeoffs of fast vs long-range ingestion modes.