Bootstrap ingestion uses the BOOTSTRAP priority in the ingestion queue, which ensures that bulk loads never block real-time ingestion. Your live agent continues to operate normally while historical data is processed in the background.

When to use bootstrap ingestion

Bootstrap ingestion is the right choice whenever you need to load a significant volume of existing data into Synap:
  • Migrating from another system: Moving from a custom memory solution, a competing product, or an in-house knowledge base to Synap.
  • Loading historical conversations: Importing past chat logs, support tickets, or email threads so your agent has context about previous interactions.
  • Seeding product documentation: Ingesting your product docs, FAQs, help center articles, and internal wikis to give your agent comprehensive product knowledge.
  • Backfilling customer data: Loading CRM records, customer profiles, and organizational context for existing customers.
  • Populating shared knowledge: Ingesting company policies, SOPs, and reference material at customer or client scope.
You do not need to finish bootstrap ingestion before your agent goes live. Bootstrap and runtime ingestion can run simultaneously. The BOOTSTRAP priority queue ensures they do not compete for resources.

The batch API

The batch API accepts multiple documents in a single request, reducing HTTP overhead and enabling server-side optimizations for throughput.

Endpoint

POST /v1/memories/batch

Request body

{
  "documents": [
    {
      "document": "Full text content of the first document...",
      "document_type": "ai-chat-conversation",
      "document_id": "conv_2024_001",
      "document_created_at": "2024-03-15T10:30:00Z",
      "user_id": "user_123",
      "customer_id": "acme_corp",
      "mode": "long-range"
    },
    {
      "document": "Full text content of the second document...",
      "document_type": "document",
      "document_id": "doc_kb_042",
      "document_created_at": "2024-01-20T08:00:00Z",
      "customer_id": "acme_corp",
      "mode": "long-range"
    }
  ],
  "fail_fast": false
}

Key parameters

Parameter | Required | Description
documents | Yes | Array of document objects to ingest (max 100 per request)
fail_fast | No | If true, the entire batch fails on the first error. If false (the default), errors are collected and returned alongside successful results
document_id | No | Unique identifier for idempotency. Retrying with the same document_id will not create duplicates
document_created_at | No | Original creation timestamp of the document. Critical for temporal accuracy
mode | No | Ingestion mode: fast or long-range. Defaults to long-range for batch requests
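If you are calling the endpoint directly rather than through the SDK, it can help to assemble and validate the request body client-side before POSTing it. The sketch below is illustrative: build_batch_payload is a hypothetical helper, not part of the Synap SDK, and it enforces only the 100-document limit described above.

```python
import json

MAX_DOCS_PER_REQUEST = 100  # per the documents limit in the table above

def build_batch_payload(documents, fail_fast=False):
    """Assemble a JSON request body for POST /v1/memories/batch,
    rejecting batches over the 100-document limit."""
    if len(documents) > MAX_DOCS_PER_REQUEST:
        raise ValueError(
            f"batch too large: {len(documents)} > {MAX_DOCS_PER_REQUEST}"
        )
    return json.dumps({"documents": documents, "fail_fast": fail_fast})

payload = build_batch_payload([
    {
        "document": "Full text content of the first document...",
        "document_type": "document",
        "document_id": "doc_kb_042",
        "document_created_at": "2024-01-20T08:00:00Z",
        "customer_id": "acme_corp",
        "mode": "long-range",
    }
])
```

Splitting a larger corpus into 100-document chunks and calling this once per chunk mirrors what the SDK's batch_create does for you.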

SDK usage

The Python SDK provides a convenient wrapper around the batch API:
from synap import Synap

sdk = Synap(api_key="your_api_key")

result = await sdk.memories.batch_create(
    documents=[
        {
            "document": "User: How do I reset my password?\nAssistant: Go to Settings > Security > Reset Password.",
            "document_type": "ai-chat-conversation",
            "document_id": "conv_2024_001",
            "document_created_at": "2024-03-15T10:30:00Z",
            "user_id": "user_456",
            "customer_id": "acme_corp",
            "mode": "long-range"
        },
        {
            "document": "User: What integrations do you support?\nAssistant: We support Slack, Jira, and GitHub.",
            "document_type": "ai-chat-conversation",
            "document_id": "conv_2024_002",
            "document_created_at": "2024-03-16T14:20:00Z",
            "user_id": "user_789",
            "customer_id": "acme_corp",
            "mode": "long-range"
        }
    ],
    fail_fast=False
)

print(f"Ingestion ID: {result.ingestion_id}")
print(f"Accepted: {result.accepted_count}")
print(f"Rejected: {result.rejected_count}")

Key considerations

1. Preserve original timestamps

Always set document_created_at to the original creation timestamp of the document. Without this, Synap defaults to the ingestion time, which distorts temporal ordering. If a user asks “What did we discuss last March?”, accurate timestamps are essential for correct retrieval.
{
    "document": "Conversation from last year...",
    "document_created_at": "2024-03-15T10:30:00Z"  # Original timestamp
}
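Source systems rarely store timestamps in a single format. A minimal normalization sketch (an illustrative helper, not part of the SDK) that converts epoch seconds, datetime objects, and ISO-ish strings to the ISO 8601 UTC form shown above:

```python
from datetime import datetime, timezone

def to_iso8601(value):
    """Normalize epoch seconds, datetime objects, or ISO-ish strings
    to an ISO 8601 UTC string like 2024-03-15T10:30:00Z."""
    if isinstance(value, (int, float)):      # Unix epoch seconds
        dt = datetime.fromtimestamp(value, tz=timezone.utc)
    elif isinstance(value, datetime):        # already a datetime
        dt = value
    else:                                    # ISO-ish string, with or without Z
        dt = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
    if dt.tzinfo is None:                    # treat naive timestamps as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Run your source timestamps through a function like this before setting document_created_at, so that temporal queries sort correctly.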
2. Use document IDs for idempotency

Assign a unique document_id to every document. If a batch request is interrupted or times out, you can safely retry the entire batch. Documents with IDs that have already been ingested will be skipped, preventing duplicates.
{
    "document": "...",
    "document_id": "conv_2024_001"  # Will not create a duplicate on retry
}
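A simple way to generate these IDs, sketched below under the assumption that your source system has a primary key (with a content hash as a fallback when it does not); make_document_id is an illustrative helper, not an SDK function:

```python
import hashlib

def make_document_id(source_id=None, content=""):
    """Derive a stable document_id: prefer the source system's primary key,
    fall back to a content hash when no stable key exists."""
    if source_id is not None:
        return f"migration_{source_id}"
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
    return f"content_{digest}"
```

Because the ID is a pure function of the source record, retrying any batch reproduces the same IDs and cannot create duplicates.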
3. Use long-range mode for historical data

Bootstrap data typically benefits from thorough extraction. Use long-range mode (the default for batch) to perform deep entity resolution, relationship mapping, and preference detection. The extra processing time is acceptable for batch loads since they run in the background.
4. Monitor ingestion progress

For large batch loads, monitor progress using the status endpoint:
status = await sdk.memories.status(ingestion_id=result.ingestion_id)

print(f"State: {status.state}")         # queued | processing | completed | failed
print(f"Progress: {status.progress}%")
print(f"Documents processed: {status.processed_count}")
print(f"Errors: {status.error_count}")

Performance and throughput

The batch API is optimized for high-throughput ingestion:
Aspect | Detail
Max documents per request | 100
Max document size | 100 KB per document
Queue priority | BOOTSTRAP — processes below real-time but above maintenance tasks
Concurrency | Multiple batch requests can run in parallel
Processing order | Documents within a batch are processed in submission order
Idempotency | Safe to retry — duplicates are detected by document_id
Avoid sending more than 10 concurrent batch requests. While the API accepts them, excessive concurrency can lead to queue backpressure and increased processing latency for all ingestion types.
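One way to enforce this cap is an asyncio.Semaphore around your batch calls. In the sketch below, send_batch is a stand-in for sdk.memories.batch_create (it just reports the batch size); the semaphore guarantees that no more than 10 requests are in flight at once regardless of how many batches you launch:

```python
import asyncio

MAX_CONCURRENT_BATCHES = 10  # stay under the recommended concurrency cap

async def send_batch(batch):
    """Stand-in for sdk.memories.batch_create; replace with the real call."""
    await asyncio.sleep(0.01)
    return len(batch)

async def send_all(batches):
    """Submit all batches, capped at MAX_CONCURRENT_BATCHES in flight."""
    sem = asyncio.Semaphore(MAX_CONCURRENT_BATCHES)

    async def bounded(batch):
        async with sem:  # blocks while 10 requests are already in flight
            return await send_batch(batch)

    # gather preserves submission order in the results list
    return await asyncio.gather(*(bounded(b) for b in batches))

results = asyncio.run(send_all([[1, 2], [3], [4, 5, 6]]))
```

This pattern lets you fire off every batch up front while still respecting the backpressure guidance above.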

Full example: loading historical conversations

This example loads 1000 historical conversations from a database into Synap, handling pagination, error recovery, and progress monitoring:
import asyncio
from synap import Synap

sdk = Synap(api_key="your_api_key")

BATCH_SIZE = 100

async def load_historical_conversations(db_connection):
    """Load historical conversations from a database into Synap."""

    # Fetch conversations from your existing database
    conversations = await db_connection.fetch(
        "SELECT id, content, user_id, customer_id, created_at "
        "FROM conversations ORDER BY created_at ASC"
    )

    total = len(conversations)
    processed = 0
    failed = 0
    ingestion_ids = []

    # Process in batches of 100
    for i in range(0, total, BATCH_SIZE):
        batch = conversations[i : i + BATCH_SIZE]

        documents = [
            {
                "document": conv["content"],
                "document_type": "ai-chat-conversation",
                "document_id": f"migration_{conv['id']}",
                "document_created_at": conv["created_at"].isoformat(),
                "user_id": conv["user_id"],
                "customer_id": conv["customer_id"],
                "mode": "long-range"
            }
            for conv in batch
        ]

        try:
            result = await sdk.memories.batch_create(
                documents=documents,
                fail_fast=False
            )

            processed += result.accepted_count
            failed += result.rejected_count
            ingestion_ids.append(result.ingestion_id)

            print(f"Progress: {processed}/{total} ingested, {failed} failed")

            # Log any individual document errors
            if result.errors:
                for error in result.errors:
                    print(f"  Error: {error.document_id} - {error.message}")

        except Exception as e:
            print(f"Batch request failed: {e}")
            # Safe to retry this batch because document_id ensures idempotency
            failed += len(batch)

        # Rate limit: wait between batches to avoid overwhelming the API
        await asyncio.sleep(1)

    print(f"\nBootstrap complete: {processed} ingested, {failed} failed")
    return ingestion_ids


async def monitor_progress(ingestion_ids):
    """Monitor ingestion progress until all batches complete."""
    for ingestion_id in ingestion_ids:
        while True:
            status = await sdk.memories.status(ingestion_id=ingestion_id)
            if status.state in ("completed", "failed"):
                print(f"{ingestion_id}: {status.state} "
                      f"({status.processed_count} processed, {status.error_count} errors)")
                break
            await asyncio.sleep(5)

Best practices

  • Tolerate partial failures: Set fail_fast=False (the default) so that one malformed document does not prevent the rest of the batch from being ingested. After each batch, inspect the errors array in the response to identify and address individual failures.
  • Pace your requests: While the batch API is designed for throughput, adding a short delay between requests (1-2 seconds) prevents queue backpressure. This is especially important during the initial bulk load, when you may be sending hundreds of batch requests.
  • Scope documents correctly: When loading data for multiple customers, ensure that each document includes the correct customer_id. Incorrect scoping during bootstrap is difficult to fix later — you would need to re-ingest the affected documents with the correct scope.
  • Clean data before ingestion: Remove empty conversations, strip personally identifiable information that should not be stored, and ensure that timestamps are in ISO 8601 format. Prevention is far easier than remediation after ingestion.
  • Trace IDs to the source: Derive document_id from your source system's primary key (e.g., migration_{source_id}). This ensures idempotency during retries and makes it easy to trace ingested memories back to their source records.
  • Pilot with a small batch: Before loading thousands of documents, ingest a small batch (10-20 documents) and verify that scoping, timestamps, and entity resolution work as expected. Then proceed with the full load.
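The cleaning step can be sketched as a pre-ingestion filter. Everything below is an illustrative example, not part of the SDK: the regexes are deliberately naive (a real PII pass needs a proper tool), and clean_document simply drops empty or badly timestamped records and redacts email addresses:

```python
import re

# Rough ISO 8601 shape check, e.g. 2024-03-15T10:30:00Z
ISO_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$"
)
# Naive email matcher for illustration only
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def clean_document(doc):
    """Return a cleaned copy of the document dict, or None to drop it."""
    text = (doc.get("document") or "").strip()
    if not text:
        return None  # drop empty conversations
    if not ISO_RE.match(doc.get("document_created_at", "")):
        return None  # reject malformed timestamps rather than ingest them
    cleaned = dict(doc)
    cleaned["document"] = EMAIL_RE.sub("[redacted-email]", text)
    return cleaned
```

Running every record through a filter like this before batch_create means the batch API sees only well-formed documents, and your error arrays stay short.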

Next steps

  • Runtime Ingestion: Learn how to ingest data in real time as your agent operates.
  • Memory API Reference: Full API reference for the batch and single-document ingestion endpoints.
  • SDK Ingestion Guide: Detailed SDK guide for ingestion methods and options.
  • Accurate Mode: Understand long-range mode and why it is the default for bootstrap ingestion.