How Ingestion Works - Maximem Synap

Your application produces content all the time: live conversations, support tickets, product docs, CRM records. Ingestion is the process that turns that raw content into structured, retrievable memory. There are two paths into Synap, and they share the same underlying pipeline:

Runtime ingestion feeds content in as it is generated during live agent interactions, one turn at a time.
Bootstrap ingestion loads pre-existing data in bulk: historical conversations, documentation, migrations from another system.

Because both paths feed the same pipeline, they differ only in how you call them and in which defaults make sense for each.

The ingestion pipeline

Every document you send, via either path, flows through the same stages before it becomes a memory you can retrieve:

Categorization

The pipeline reads the document’s document_type and selects the appropriate extraction logic. A chat conversation, an email, and a PDF are each processed differently.

Extraction

Synap analyzes the content to pull out the things worth remembering: facts, decisions, preferences, action items, and the entities involved. The depth of this step depends on the ingestion mode (see below).

Chunking

Larger documents are segmented into coherent units so that retrieval can return precise, relevant passages rather than whole documents.

Entity resolution

Extracted entities (people, projects, organizations) are matched against entities already in the store, so references to “Sarah” or “Project Atlas” link to the same entity across many documents. See Entity Resolution.

Storage

The resulting structured memories are written to the correct scope, indexed for both vector and graph retrieval, and made available to future queries.

Ingestion is asynchronous on both paths. The SDK call returns quickly with an identifier, and processing continues in the background. Memories become available for retrieval once the pipeline finishes. Fast mode completes sooner than long-range mode.

Ingestion modes

A single parameter, mode, controls how deeply the extraction stage analyzes each document. The same two modes are available on both ingestion paths.

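fast performs lightweight chunking, entity extraction, and vector embedding. Best when speed matters more than extraction depth: routine conversational turns, high-throughput logging, non-critical context.

long-range performs deep entity resolution, relationship mapping, and preference detection. Best when extraction depth matters more than speed, as with historical data processed in the background.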

The ingestion mode you pick here is distinct from the retrieval mode you pick when querying. For how depth maps to query behavior (fast = vector + graph; accurate = vector + graph + LLM subquery decomposition + reranking), see Retrieval Modes.

Document types

document_type tells the pipeline what kind of content it is looking at and which extraction logic to apply. Both paths accept the same set:

Document Type	Description	Extraction Focus
`ai-chat-conversation`	Chat between a user and an AI agent. The most common type for runtime ingestion.	Speaker identification, preference detection, decision tracking, entity extraction
`document`	General text documents, articles, or notes.	Topic extraction, entity extraction, key facts
`email`	Email content including headers and body.	Sender/recipient extraction, action items, references
`pdf`	PDF document content (text extracted).	Structured content extraction, section awareness
`image`	Image descriptions or OCR-extracted text.	Visual entity extraction, scene understanding
`audio`	Transcribed audio content.	Speaker diarization, topic segmentation
`meeting-transcript`	Meeting transcriptions with multiple speakers.	Action items, decisions, attendee tracking, topic flow

During runtime you will use ai-chat-conversation almost exclusively. The other types show up most often in bootstrap loads and specialized pipelines.
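Because an unrecognized document_type would select the wrong extraction logic (or be rejected), it can help to check the value on your side before calling the SDK. The following is a minimal sketch, not part of the Synap SDK: the names DocumentType, VALID_DOCUMENT_TYPES, and check_document_type are hypothetical, and the set simply mirrors the table above.

```python
from typing import Literal, get_args

# The seven document types listed in the table above.
DocumentType = Literal[
    "ai-chat-conversation",
    "document",
    "email",
    "pdf",
    "image",
    "audio",
    "meeting-transcript",
]
VALID_DOCUMENT_TYPES = frozenset(get_args(DocumentType))


def check_document_type(value: str) -> str:
    """Return value unchanged if it is a listed document type, else raise."""
    if value not in VALID_DOCUMENT_TYPES:
        raise ValueError(
            f"Unknown document_type {value!r}; "
            f"expected one of {sorted(VALID_DOCUMENT_TYPES)}"
        )
    return value
```

Calling check_document_type("email") returns the string unchanged, while a typo such as "e-mail" raises a ValueError before any network call is made.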

Scoping every document

user_id and customer_id determine where a memory is stored in the scope hierarchy, which in turn controls who can retrieve it later. On B2C agents, customer_id is resolved automatically and you only pass user_id; on B2B agents you pass both. The same scoping rules apply on both ingestion paths. See Memory Scopes for the full hierarchy.
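One way to keep the B2C and B2B rule in a single place is a small helper that builds the scoping arguments. This is a sketch under assumptions: scope_kwargs and its b2b flag are hypothetical names for your own application code, not SDK features.

```python
from typing import Dict, Optional


def scope_kwargs(
    user_id: str, customer_id: Optional[str] = None, *, b2b: bool
) -> Dict[str, str]:
    """Build the scoping arguments for an ingestion call.

    B2C: only user_id is passed (customer_id is resolved automatically).
    B2B: both user_id and customer_id are passed.
    """
    if b2b:
        if not customer_id:
            raise ValueError("B2B agents must pass customer_id")
        return {"user_id": user_id, "customer_id": customer_id}
    # On B2C, drop any customer_id so the service resolves it itself.
    return {"user_id": user_id}
```

The result can be unpacked into either ingestion call, for example sdk.memories.create(..., **scope_kwargs("user_123", "acme_corp", b2b=True)).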

Runtime ingestion

Runtime ingestion feeds content into Synap as it is generated during live agent interactions. This is the primary ingestion path for most applications. After each conversation turn (or at the end of a conversation) your application calls sdk.memories.create() to send the turn through the pipeline. The call returns immediately, so your agent never waits for ingestion before responding to the user; fast is the natural default here.

Your agent receives a user message

Through a chat widget, API, mobile app, or other channel.

Your agent retrieves context from Synap

Before responding, the agent fetches relevant memories. See Context End to End.

Your agent generates and delivers a response

It calls an LLM with the retrieved context and conversation history, then replies to the user.

Your agent ingests the turn

After the response is delivered, the agent calls sdk.memories.create(). This returns immediately and does not block the user experience.

Synap processes in the background

The pipeline extracts, resolves, and stores structured memories, which become available for future retrieval.

The SDK call

# Run this inside an async function: the call below uses await.
from maximem_synap import MaximemSynapSDK

sdk = MaximemSynapSDK(api_key="your_api_key")

result = await sdk.memories.create(
    document="User: Can you remind me what we decided about the migration timeline?\n"
             "Assistant: In our last conversation, you and the team agreed to begin the "
             "database migration on March 15th, with a two-week buffer for testing.",
    document_type="ai-chat-conversation",
    user_id="user_123",
    customer_id="acme_corp",  # B2B only, auto-resolved on B2C
    mode="fast",
)

print(f"Ingestion ID: {result.ingestion_id}")
# Returns immediately, processing happens in the background

Parameter	Required	Description
`document`	Yes	The text to ingest. For conversations, include both user and assistant messages with speaker labels.
`document_type`	Yes	The type of content. Determines how the pipeline processes the document.
`user_id`	No	The user this content belongs to. Determines user-scope storage.
`customer_id`	No	The customer organization (B2B only; auto-resolved on B2C). Determines customer-scope storage.
`mode`	No	`fast` or `long-range`. Defaults to `long-range`; runtime callers typically pass `fast`.
`document_id`	No	Unique identifier for idempotency. Prevents duplicate ingestion on retry.
`document_created_at`	No	Timestamp override. Defaults to the current time for runtime ingestion (usually correct).

Fitting it into the agent loop

import uuid
from maximem_synap import MaximemSynapSDK
from openai import AsyncOpenAI

sdk = MaximemSynapSDK(api_key="synap_api_key")
openai_client = AsyncOpenAI(api_key="openai_api_key")

# conversation_id must be a valid UUID
conversation_id = str(uuid.uuid4())

async def handle_message(user_message: str, user_id: str, customer_id: str, conversation_id: str):
    """Handle a single user message with memory-enabled context."""

    # 1. Retrieve relevant context from Synap
    context = await sdk.conversation.context.fetch(
        conversation_id=conversation_id,
        user_id=user_id,
        customer_id=customer_id,  # B2B only, auto-resolved on B2C
        search_query=[user_message],
        mode="fast",
    )

    # 2. Build the prompt with retrieved memories
    system_prompt = (
        "You are a helpful assistant. Use the following context from previous "
        "conversations to inform your response:\n\n"
        f"{context.formatted_context}\n\n"
        "If the context is not relevant, respond based on your general knowledge."
    )

    # 3. Generate the response
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    )
    assistant_message = response.choices[0].message.content

    # 4. Ingest the conversation turn (non-blocking)
    await sdk.memories.create(
        document=f"User: {user_message}\nAssistant: {assistant_message}",
        document_type="ai-chat-conversation",
        user_id=user_id,
        customer_id=customer_id,
        mode="fast",
    )

    return assistant_message

For the full retrieve-generate-ingest pattern, see Context End to End.

Include speaker labels in conversation content

Always format conversations with clear User: and Assistant: labels. The pipeline uses them to identify who said what, which is critical for accurate preference detection and entity attribution.

# Good: clear speaker labels
document = "User: I need the report by Friday.\nAssistant: I'll have it ready by Thursday evening."

# Bad: no speaker context
document = "I need the report by Friday. I'll have it ready by Thursday evening."
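If your application stores turns as role/content messages, the labeled format can be produced mechanically. The sketch below assumes an OpenAI-style list of dicts with "role" and "content" keys; format_transcript and SPEAKER_LABELS are hypothetical helper names.

```python
from typing import Iterable, Mapping

# Map chat roles to the speaker labels the pipeline expects.
SPEAKER_LABELS = {"user": "User", "assistant": "Assistant"}


def format_transcript(messages: Iterable[Mapping[str, str]]) -> str:
    """Render role/content messages as 'User: ...' / 'Assistant: ...' lines."""
    lines = []
    for message in messages:
        label = SPEAKER_LABELS.get(message["role"])
        if label is None:
            continue  # skip system or tool messages: they have no speaker label
        lines.append(f"{label}: {message['content'].strip()}")
    return "\n".join(lines)


document = format_transcript(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "I need the report by Friday."},
        {"role": "assistant", "content": "I'll have it ready by Thursday evening."},
    ]
)
```

Here document ends up identical to the "Good" example above; the system message is dropped because it belongs to neither speaker.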

Use consistent user and customer IDs

Keep user_id (and customer_id on B2B) consistent across all calls for the same user and organization. Inconsistent IDs fragment the store into isolated pockets of context that cannot be retrieved together. Derive them from your auth system.

Ingest after the response is delivered

Ingest the turn after your agent has responded, not before, so the ingested content includes both the user message and the agent’s reply: complete context for future retrieval.

Handle ingestion errors gracefully

Runtime ingestion should never block or crash your agent. Wrap calls in error handling and log failures. A missed ingestion is recoverable; a crashed agent is not.

import logging

logger = logging.getLogger(__name__)

try:
    await sdk.memories.create(
        document=conversation_turn,
        document_type="ai-chat-conversation",
        user_id=user_id,
        customer_id=customer_id,
        mode="fast",
    )
except Exception as e:
    # Never let a failed ingestion take down the agent: log it and move on
    logger.warning("Ingestion failed, queueing for retry: %s", e)
    # Queue for retry or log for manual follow-up

Set document_id for idempotent retries

If you retry failed ingestion calls, set a document_id to prevent duplicate memories. Derive it from your conversation or message identifier.
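A minimal sketch of the two ideas together: a document_id derived deterministically from the conversation and turn, and a retry loop that reuses that same ID on every attempt. The helper names, the ID format, and the backoff values are assumptions; ingest stands for any async callable wrapping sdk.memories.create(..., document_id=document_id).

```python
import asyncio
from typing import Awaitable, Callable


def turn_document_id(conversation_id: str, turn_index: int) -> str:
    """Derive a stable ID: the same turn always maps to the same document_id."""
    return f"{conversation_id}:turn-{turn_index}"


async def ingest_with_retry(
    ingest: Callable[[str], Awaitable[object]],
    document_id: str,
    attempts: int = 3,
    base_delay: float = 0.5,
) -> object:
    """Call ingest(document_id), retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return await ingest(document_id)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller log or queue it
            await asyncio.sleep(base_delay * 2**attempt)
```

Because every attempt carries the same document_id, a retry after a request that actually succeeded server-side should not create a duplicate memory.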

Bootstrap ingestion

Bootstrap ingestion loads pre-existing data into Synap in bulk. Before your agent goes live (or alongside live operation) you often need to seed it with historical context: past conversations, product documentation, knowledge base articles, customer records. You call sdk.memories.batch_create() with many documents at once. Because this data is historical and processed in the background, long-range is the natural default.

Use bootstrap ingestion whenever you need to load a significant volume of existing data:

Migrating from another system: a custom memory solution, a competing product, or an in-house knowledge base.
Loading historical conversations: past chat logs, support tickets, or email threads.
Seeding product documentation: docs, FAQs, help center articles, internal wikis.
Backfilling customer data: CRM records, customer profiles, organizational context.
Populating shared knowledge: company policies, SOPs, reference material at customer or client scope.

Bootstrap loads run at BOOTSTRAP priority in the ingestion queue, below real-time ingestion but above maintenance tasks, so your live agent keeps operating normally while historical data is processed. You do not need to finish bootstrapping before going live: the two paths can run simultaneously, with real-time turns taking precedence over queued bootstrap documents.
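To make the ordering concrete, here is a toy model of a three-level priority queue. It illustrates the concept only and is not Synap's implementation: the class, the REALTIME and MAINTENANCE names, and the numeric values are all assumptions; only the BOOTSTRAP name and its relative position come from the description above.

```python
import heapq
from enum import IntEnum
from itertools import count


class Priority(IntEnum):
    REALTIME = 0     # assumed name: live turns
    BOOTSTRAP = 1    # bulk loads: below real-time, above maintenance
    MAINTENANCE = 2  # assumed name: background upkeep


class IngestionQueue:
    """Toy queue: lowest priority value first, FIFO within a priority."""

    def __init__(self) -> None:
        self._heap = []
        self._sequence = count()  # tie-breaker that preserves arrival order

    def put(self, priority: Priority, job: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._sequence), job))

    def get(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

In this model a live turn that arrives after thousands of bootstrap documents are queued is still taken first, which is the behavior the priority levels are meant to provide.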

The batch ingestion method

sdk.memories.batch_create() accepts multiple documents in a single call, reducing per-call overhead and enabling server-side throughput optimizations. Each document is a CreateMemoryRequest supporting the same fields as sdk.memories.create().

from maximem_synap import MaximemSynapSDK, CreateMemoryRequest

sdk = MaximemSynapSDK(api_key="your_api_key")

result = await sdk.memories.batch_create(
    documents=[
        CreateMemoryRequest(
            document="User: How do I reset my password?\nAssistant: Go to Settings > Security > Reset Password.",
            document_type="ai-chat-conversation",
            document_id="migration_001",
            document_created_at="2024-03-15T10:30:00Z",
            user_id="user_456",
            customer_id="acme_corp",  # B2B only, auto-resolved on B2C
            mode="long-range",
        ),
        CreateMemoryRequest(
            document="User: What integrations do you support?\nAssistant: We support Slack, Jira, and GitHub.",
            document_type="ai-chat-conversation",
            document_id="migration_002",
            document_created_at="2024-03-16T14:20:00Z",
            user_id="user_789",
            customer_id="acme_corp",
            mode="long-range",
        ),
    ],
    fail_fast=False,
)

print(f"Succeeded: {result.succeeded}")
print(f"Failed: {result.failed}")

Parameter	Required	Description
`documents`	Yes	List of `CreateMemoryRequest` objects to ingest (max 100 per call).
`fail_fast`	No	If `True`, the whole batch fails on the first error. If `False` (default), errors are collected and returned alongside successful results.

Each CreateMemoryRequest supports the same fields as a single create call, including document_id for idempotency and document_created_at for temporal accuracy.
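Since a single call accepts at most 100 documents, larger loads have to be split on the client side. A minimal sketch of that split, using a hypothetical batches helper that works on any sequence (including a list of CreateMemoryRequest objects):

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

MAX_BATCH_SIZE = 100  # per-call limit stated in the parameter table above


def batches(items: Sequence[T], size: int = MAX_BATCH_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of items, each at most size long."""
    if not 1 <= size <= MAX_BATCH_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_BATCH_SIZE}")
    for start in range(0, len(items), size):
        yield items[start : start + size]
```

Each yielded slice can then be passed as the documents argument of one batch_create() call.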

Key considerations

Preserve original timestamps

Always set document_created_at to the document’s original creation time. Without it, Synap defaults to the ingestion time, which distorts temporal ordering. If a user later asks “What did we discuss last March?”, accurate timestamps are essential for correct retrieval.
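Source systems often store timestamps as naive or local datetimes, so it is worth normalizing them before setting document_created_at. The sketch below converts a datetime to the UTC "Z" form used in the examples above; to_iso_utc is a hypothetical helper, and treating naive values as UTC is an assumption you should check against your source data.

```python
from datetime import datetime, timezone


def to_iso_utc(ts: datetime, assume_tz: timezone = timezone.utc) -> str:
    """Format ts as an ISO 8601 UTC string such as '2024-03-15T10:30:00Z'."""
    if ts.tzinfo is None:
        # Naive datetime: attach the assumed zone instead of guessing silently.
        ts = ts.replace(tzinfo=assume_tz)
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, a row created at 16:00 in a UTC+05:30 database is emitted as 10:30:00Z, so temporal ordering stays correct across sources in different time zones.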

Use document IDs for idempotency

Assign a unique document_id to every document, ideally derived from your source system’s primary key (e.g. migration_{source_id}). If a batch is interrupted, you can safely retry it: already-ingested documents are skipped, which prevents duplicates. Source-derived IDs also make it easy to trace memories back to their origin.

Use long-range mode for historical data

Bootstrap data benefits from thorough extraction. long-range (the default for batch) performs deep entity resolution, relationship mapping, and preference detection. The extra processing time is fine for background loads.

Validate and organize before loading

Clean your historical data first: drop empty conversations, strip PII you should not store, ensure timestamps are ISO 8601, and confirm each document carries the correct scope. Incorrect scoping is difficult to fix later: you would have to re-ingest. Start with a small test batch and verify scoping, timestamps, and entity resolution before the full load.
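Some of these checks are easy to automate before the load. The following sketch flags empty content, unparseable timestamps, and missing user scope; the record keys (content, created_at, user_id) are assumptions about your source schema, and PII stripping is deliberately left out because it depends on your data. Note that datetime.fromisoformat accepts only a subset of ISO 8601 on Python versions before 3.11.

```python
from datetime import datetime
from typing import List, Mapping


def validate_record(record: Mapping[str, object]) -> List[str]:
    """Return a list of problems found in one historical record (empty if clean)."""
    problems = []
    if not str(record.get("content") or "").strip():
        problems.append("empty content")
    try:
        # Accept a trailing 'Z' by rewriting it as an explicit UTC offset.
        datetime.fromisoformat(str(record.get("created_at")).replace("Z", "+00:00"))
    except ValueError:
        problems.append("created_at is not ISO 8601")
    if not record.get("user_id"):
        problems.append("missing user_id")
    return problems
```

Running this over the small test batch first, and loading only records that return an empty list, catches most formatting problems before they require a re-ingest.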

Monitor ingestion progress

For large loads, track progress with sdk.memories.status():

status = await sdk.memories.status(ingestion_id=result.results[0].ingestion_id)

print(f"Status: {status.status}")              # queued | processing | completed | failed
print(f"Memories created: {status.memories_created}")
print(f"Error: {status.error_message}")
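To wait for a document to finish rather than check once, you can poll until the status is terminal. This is a sketch under assumptions: wait_for_ingestion is a hypothetical helper, the interval and timeout values are arbitrary, and fetch_status stands for any async callable that returns the status string, for example a thin wrapper around sdk.memories.status() that returns its status field.

```python
import asyncio
from typing import Awaitable, Callable

# Terminal values from the status list shown above.
TERMINAL_STATUSES = {"completed", "failed"}


async def wait_for_ingestion(
    fetch_status: Callable[[str], Awaitable[str]],
    ingestion_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
) -> str:
    """Poll until the ingestion is completed or failed, or raise on timeout."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = await fetch_status(ingestion_id)
        if status in TERMINAL_STATUSES:
            return status
        if loop.time() >= deadline:
            raise TimeoutError(f"{ingestion_id} still {status!r} after {timeout}s")
        await asyncio.sleep(interval)
```

For large loads, polling a sample of ingestion IDs is usually enough; polling every document individually adds request volume of its own.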

Keep concurrency modest. While the API accepts many parallel batch requests, excessive concurrency causes queue backpressure and slows processing for all ingestion types. Add a short delay between batches during the initial bulk load.
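One common way to cap parallelism is an asyncio.Semaphore around each batch request. The sketch below is generic: run_bounded is a hypothetical helper, worker stands for an async callable that submits one batch (for example via sdk.memories.batch_create()), and the default limit of 2 is an arbitrary starting point rather than a documented recommendation.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def run_bounded(
    jobs: Iterable[object],
    worker: Callable[[object], Awaitable[object]],
    max_concurrency: int = 2,
    pause: float = 0.0,
) -> List[object]:
    """Run worker(job) for every job with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(job: object) -> object:
        async with semaphore:
            result = await worker(job)
            # Optional short delay before the slot is released to the next job.
            await asyncio.sleep(pause)
            return result

    # return_exceptions=True: one failed batch does not cancel the others.
    return await asyncio.gather(*(guarded(job) for job in jobs), return_exceptions=True)
```

Results come back in the same order as the jobs, with exceptions returned in place of results, so failed batches can be identified and retried by position.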

Full example: loading historical conversations

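This loads historical conversations from a database and sends them to Synap in batches of 100, with per-batch error handling and progress reporting. The query fetches all rows up front, so for very large tables you would page through the results instead: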

import asyncio
from maximem_synap import MaximemSynapSDK, CreateMemoryRequest

sdk = MaximemSynapSDK(api_key="your_api_key")

BATCH_SIZE = 100

async def load_historical_conversations(db_connection):
    """Load historical conversations from a database into Synap."""

    conversations = await db_connection.fetch(
        "SELECT id, content, user_id, customer_id, created_at "
        "FROM conversations ORDER BY created_at ASC"
    )

    total = len(conversations)
    processed = failed = 0
    ingestion_ids = []

    for i in range(0, total, BATCH_SIZE):
        batch = conversations[i : i + BATCH_SIZE]

        documents = [
            CreateMemoryRequest(
                document=conv["content"],
                document_type="ai-chat-conversation",
                document_id=f"migration_{conv['id']}",
                document_created_at=conv["created_at"].isoformat(),
                user_id=conv["user_id"],
                customer_id=conv["customer_id"],  # B2B only, auto-resolved on B2C
                mode="long-range",
            )
            for conv in batch
        ]

        try:
            result = await sdk.memories.batch_create(documents=documents, fail_fast=False)
            processed += result.succeeded
            failed += result.failed
            ingestion_ids.extend(r.ingestion_id for r in result.results)
            print(f"Progress: {processed}/{total} ingested, {failed} failed")

            for r in result.results:
                if r.status == "failed":
                    print(f"  Error: {r.error_message}")

        except Exception as e:
            print(f"Batch request failed: {e}")
            # Safe to retry, document_id ensures idempotency
            failed += len(batch)

        # Brief pause between batches to avoid queue backpressure
        await asyncio.sleep(1)

    print(f"\nBootstrap complete: {processed} ingested, {failed} failed")
    return ingestion_ids

Runtime vs bootstrap

Both paths share the pipeline, modes, document types, and scoping rules above. They differ in how you invoke them and which defaults fit:

	Runtime	Bootstrap
Trigger	Live agent interactions, per turn	Bulk / backfill loads
Method	`sdk.memories.create()`	`sdk.memories.batch_create()`
Default mode	`long-range`, but typically called with `fast`	`long-range`
Throughput	One document per call, non-blocking	Many documents per call, `BOOTSTRAP`-priority queue
Best for	Real-time conversation logging	Historical data, migrations, seeding docs and knowledge

Next steps

Retrieval Modes

How fast and accurate retrieval differ, and why long-range is the bootstrap default.

Context End to End

The full retrieve-generate-ingest loop for memory-enabled agents.

Memory Types

What the pipeline produces: the kinds of structured memory Synap stores.

Entity Resolution

How extracted entities are matched and linked across documents.

