Overview
Ingestion is how you feed data into Synap’s memory system. Every conversation, document, email, or transcript you send through `sdk.memories.create()` enters the ingestion pipeline, where it is categorized and chunked, entities are extracted and resolved, and the result is persisted across Synap’s vector and graph storage engines.
Ingestion is asynchronous. When you call create(), Synap immediately returns an ingestion_id that you can use to poll the processing status. This design allows high-throughput workloads without blocking your application.
Creating a Memory
Use `sdk.memories.create()` to send a single document into the ingestion pipeline.
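A minimal call might look like the following sketch. The package name, client constructor, and API key setup are assumptions (this page does not show them); the parameters follow the reference below.

```python
from synap import Synap  # hypothetical import; adjust to your actual SDK setup

sdk = Synap(api_key="YOUR_API_KEY")

result = sdk.memories.create(
    content="User: I prefer window seats.\nAssistant: Noted, window seats it is.",
    document_type="ai-chat-conversation",
    user_id="user-123",
    mode="fast",
)
# The call returns immediately; poll sdk.memories.status() with result.ingestion_id.
```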
Parameter Reference
| Parameter | Description |
|---|---|
| `content` | The raw content to ingest. For conversations, include the full transcript with speaker labels. For documents, include the full text content. |
| `document_type` | The type of content being ingested. Determines which extraction and chunking strategies are applied. See Document Types below. |
| `document_id` | An optional idempotency key. If you submit the same `document_id` twice, the second submission updates the existing memory rather than creating a duplicate. Useful for re-ingesting edited documents. |
| `document_created_at` | The original creation timestamp of the document. If omitted, defaults to the current time. Providing the true creation time improves temporal reasoning during retrieval. |
| `user_id` | The end-user identifier. Scopes the resulting memory to this user. Memories created with a `user_id` are retrievable via `sdk.user.context.fetch()` and `sdk.conversation.context.fetch()` when the same user is queried. |
| `customer_id` | The customer (organization/tenant) identifier. Scopes the resulting memory to this customer. Use this when multiple users belong to the same customer and should share certain memories. |
| `mode` | The ingestion mode. Controls the depth of extraction and processing. See Ingest Modes below. |
| `metadata` | Arbitrary key-value metadata attached to the memory. Metadata is stored but not indexed for retrieval. Use it for your own bookkeeping (session IDs, agent versions, source systems, etc.). |
Response
| Field | Description |
|---|---|
| `ingestion_id` | Unique identifier for this ingestion job. Use with `sdk.memories.status()` to track progress. |
| `status` | Initial status of the ingestion job. Always `"queued"` on creation. |
| `memory_ids` | The IDs of the memories that will be created. Available immediately so you can reference them before processing completes. |
Document Types
The `document_type` parameter tells the ingestion pipeline which extraction and chunking strategies to apply.
| Document Type | Description | Optimized For |
|---|---|---|
| `ai-chat-conversation` | Multi-turn AI assistant conversations | Speaker turns, intent extraction, preference detection |
| `document` | General text documents | Paragraph chunking, topic extraction |
| `email` | Email messages and threads | Sender/recipient extraction, action items, thread context |
| `pdf` | PDF document content (text extracted) | Section-aware chunking, header/footer handling |
| `image` | Image descriptions or OCR text | Entity extraction from visual content descriptions |
| `audio` | Audio transcriptions | Speaker diarization awareness, temporal markers |
| `meeting-transcript` | Meeting transcription content | Multi-speaker extraction, action items, decisions |
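For example, an audio recording is ingested as its transcript text rather than the audio file (an illustrative sketch; transcription happens upstream of Synap):

```python
transcript = (
    "[00:00:05] Alice: Let's move the launch to June.\n"
    "[00:00:09] Bob: Agreed, June works for the whole team."
)

result = sdk.memories.create(
    content=transcript,      # transcript text, never raw audio bytes
    document_type="audio",
    mode="long-range",
)
```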
For `image` and `audio` types, you provide the text content (description, transcript, or OCR output), not the raw binary file. Media processing and transcription should be handled upstream of Synap.
Ingest Modes
Synap offers two ingestion modes that trade off processing depth against throughput.
fast
Optimized for speed. Performs basic chunking, lightweight entity extraction, and vector embedding. Skips deep relationship mapping and advanced categorization.
- Processing time: seconds
- Best for: high-throughput pipelines, real-time chat logging, non-critical data
long-range
Optimized for quality. Runs the full extraction pipeline including deep entity resolution, relationship mapping, preference detection, emotional analysis, and graph storage.
- Processing time: seconds to minutes
- Best for: conversations, documents where deep extraction matters, building long-term user profiles
Code Examples
Ingesting a Conversation
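A sketch of conversation ingestion, with a speaker label on every turn (identifiers and values are illustrative):

```python
transcript = (
    "User: Can you book me a flight to Lisbon next Friday?\n"
    "Assistant: Sure, any seating preference?\n"
    "User: Window seat, please. I always pick the window."
)

result = sdk.memories.create(
    content=transcript,
    document_type="ai-chat-conversation",
    user_id="user-123",
    mode="long-range",  # deep extraction for long-term user profiles
)
```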
Ingesting a Document
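A document is submitted the same way, ideally with an idempotency key and its true creation time. This is a sketch; the ISO 8601 timestamp string is an assumption about the expected format.

```python
with open("q3_strategy.md") as f:
    text = f.read()

result = sdk.memories.create(
    content=text,
    document_type="document",
    document_id="q3-strategy-doc",               # idempotency key for re-ingestion
    document_created_at="2024-07-01T09:00:00Z",  # original authoring time
    metadata={"source_system": "wiki"},          # bookkeeping only, not indexed
)
```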
Ingesting with User and Customer Scoping
When both `user_id` and `customer_id` are provided, the memory is accessible at both scopes. This is useful when a user’s conversation may contain information relevant to the broader organization.
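For example (a sketch with illustrative identifiers):

```python
result = sdk.memories.create(
    content="User: Our team standardized on the enterprise plan.",
    document_type="ai-chat-conversation",
    user_id="user-123",       # personal scope
    customer_id="acme-corp",  # shared organization scope
    mode="long-range",
)
```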
Batch Ingestion
For bulk workloads, use `sdk.memories.batch_create()` to submit multiple documents in a single request.
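A sketch of a batch submission; the documents-as-a-list-of-dicts shape is an assumption based on the single-document parameters:

```python
result = sdk.memories.batch_create(
    documents=[
        {
            "content": "User: Please reset my password.",
            "document_type": "ai-chat-conversation",
            "user_id": "user-123",
        },
        {
            "content": "Quarterly report: revenue grew 12% in Q3.",
            "document_type": "document",
        },
    ],
    mode="fast",
    fail_fast=False,  # default behavior; see below
)
```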
The fail_fast Option
Controls batch behavior on validation errors.
- `False` (default): Process all documents. Invalid documents are rejected individually while valid ones proceed.
- `True`: Abort the entire batch if any document fails validation. No documents are ingested.
Checking Ingestion Status
Ingestion is asynchronous. Use `sdk.memories.status()` to poll the processing state of a submitted document.
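A simple polling loop might look like this sketch (the keyword argument and response field names are assumptions consistent with this page):

```python
import time

ingestion_id = result.ingestion_id  # returned by create() or batch_create()

while True:
    status = sdk.memories.status(ingestion_id=ingestion_id)
    if status.status not in ("queued", "processing"):
        break
    time.sleep(2)  # fast mode finishes in seconds; long-range may take minutes

if status.status == "failed":
    print(status.error_message)
```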
Ingestion Statuses
| Status | Description |
|---|---|
| `queued` | Document accepted and waiting for processing |
| `processing` | Actively being processed through the ingestion pipeline |
| `completed` | Successfully ingested. Memories are available for retrieval |
| `failed` | Processing failed. Check `error_message` for details |
| `partial_success` | Some extractions succeeded but others failed. Partial results are available |
The `partial_success` status typically occurs with large documents where some chunks process successfully while others encounter extraction errors. The successfully processed portions are still available for retrieval.
Updating Memories
Update an existing memory using `sdk.memories.update()`. This is useful when the source document has been edited or when you want to append new information.
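An update call might look like this sketch; the `memory_id` and `merge_strategy` parameter names are assumptions based on this page’s naming:

```python
result = sdk.memories.update(
    memory_id="mem-abc123",
    content="User: Actually, make that an aisle seat instead.",
    merge_strategy="append",  # keep prior extractions, extract only the new turns
)
```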
Merge Strategies
replace
Completely replaces the existing memory content with the new document. Previous extractions are discarded and re-extracted from the new content. Use when the document has been fully rewritten.
append
Adds the new content to the end of the existing memory. Previous extractions are preserved, and new extractions are generated only from the appended content. Use when adding new turns to a conversation.
smart-merge
Intelligently merges the new content with the existing memory. The pipeline detects overlapping sections, deduplicates extractions, and reconciles conflicting information by preferring the newer version. Use when the document has been partially edited.
Deleting Memories
Remove a memory and all its associated extractions permanently.
Best Practices
Include speaker labels in conversations
Always prefix conversation turns with speaker labels (`User:`, `Assistant:`, or actual names). The ingestion pipeline uses these labels to correctly attribute preferences, facts, and intents to the right participant.
Set user_id and customer_id consistently
Use stable, deterministic identifiers for `user_id` and `customer_id`. These IDs form the basis of scoped retrieval. Inconsistent IDs fragment the user’s memory across multiple scopes.
Use document_id for idempotency
When re-ingesting content that may have been submitted before (e.g., webhook retries), always provide a `document_id`. This prevents duplicate memories and ensures updates are applied cleanly.
Provide document_created_at for historical data
When backfilling historical conversations or documents, set `document_created_at` to the original timestamp. This enables accurate temporal reasoning (“What did the user say last month?”).
Choose the right mode for the workload
Reserve `long-range` mode for content where deep understanding matters (user conversations, strategic documents). Use `fast` mode for high-volume, lower-priority data (automated logs, bulk imports).
Batch when possible
For bulk ingestion (backfills, migrations), use `batch_create()` with `fail_fast=False`. This reduces HTTP overhead and allows the pipeline to optimize scheduling.
Next Steps
Context Fetch
Query the memories you have ingested for contextual retrieval.
Entity Resolution
Understand how entities are automatically resolved during ingestion.
Context Compaction
Compress long conversations to reduce token costs.
SDK Configuration
Configure SDK behavior, timeouts, and retry policies.