Bootstrap ingestion is the process of loading pre-existing data into Synap in bulk. Before your agent starts handling live conversations, you often need to seed it with historical context: past conversations, product documentation, knowledge base articles, customer records, and other reference material. Bootstrap ingestion is designed specifically for this use case, providing high-throughput batch processing without interfering with real-time operations.
Bootstrap ingestion uses the BOOTSTRAP priority in the ingestion queue, which ensures that bulk loads never block real-time ingestion. Your live agent continues to operate normally while historical data is processed in the background.
Bootstrap ingestion is the right choice whenever you need to load a significant volume of existing data into Synap:
Migrating from another system: Moving from a custom memory solution, a competing product, or an in-house knowledge base to Synap.
Loading historical conversations: Importing past chat logs, support tickets, or email threads so your agent has context about previous interactions.
Seeding product documentation: Ingesting your product docs, FAQs, help center articles, and internal wikis to give your agent comprehensive product knowledge.
Backfilling customer data: Loading CRM records, customer profiles, and organizational context for existing customers.
Populating shared knowledge: Ingesting company policies, SOPs, and reference material at customer or client scope.
You do not need to finish bootstrap ingestion before your agent goes live. Bootstrap and runtime ingestion can run simultaneously. The BOOTSTRAP priority queue ensures they do not compete for resources.
1. Preserve original timestamps
Always set document_created_at to the original creation timestamp of the document. Without this, Synap defaults to the ingestion time, which distorts temporal ordering. If a user asks “What did we discuss last March?”, accurate timestamps are essential for correct retrieval.
```json
{
  "document": "Conversation from last year...",
  "document_created_at": "2024-03-15T10:30:00Z"  # Original timestamp
}
```
2. Use document IDs for idempotency
Assign a unique document_id to every document. If a batch request is interrupted or times out, you can safely retry the entire batch. Documents with IDs that have already been ingested will be skipped, preventing duplicates.
```json
{
  "document": "...",
  "document_id": "conv_2024_001"  # Will not create a duplicate on retry
}
```
3. Use long-range mode for historical data
Bootstrap data typically benefits from thorough extraction. Use long-range mode (the default for batch) to perform deep entity resolution, relationship mapping, and preference detection. The extra processing time is acceptable for batch loads since they run in the background.
4. Monitor ingestion progress
For large batch loads, monitor progress using the status endpoint.
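The exact response shape of the status endpoint is not shown in this section, so the sketch below assumes a JSON body with `status`, `processed`, and `total` fields (hypothetical names; adjust to the real API). The fetcher is injected as a callable so it can wrap whatever HTTP client you use:

```python
import time

def wait_for_batch(fetch_status, poll_interval=5.0, max_polls=720):
    """Poll a batch-status fetcher until the batch finishes.

    fetch_status is any callable returning a dict such as
    {"status": "processing", "processed": 40, "total": 100};
    in practice it would wrap an HTTP GET against the status endpoint.
    """
    for _ in range(max_polls):
        status = fetch_status()
        print(f"{status.get('processed', '?')}/{status.get('total', '?')} documents processed")
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("batch did not finish within the polling window")
```

A polling interval of a few seconds is usually enough; batch loads run in the background, so there is no need to poll aggressively.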
The batch API is optimized for high-throughput ingestion:
| Aspect | Detail |
| --- | --- |
| Max documents per request | 100 |
| Max document size | 100 KB per document |
| Queue priority | BOOTSTRAP — processes below real-time but above maintenance tasks |
| Concurrency | Multiple batch requests can run in parallel |
| Processing order | Documents within a batch are processed in submission order |
| Idempotency | Safe to retry — duplicates are detected by document_id |
Avoid sending more than 10 concurrent batch requests. While the API accepts them, excessive concurrency can lead to queue backpressure and increased processing latency for all ingestion types.
Set fail_fast=False (the default) so that one malformed document does not prevent the rest of the batch from being ingested. After each batch, inspect the errors array in the response to identify and address individual failures.
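As a sketch of that inspection step, the helpers below assume the batch response carries an `errors` array of `{"document_id": ..., "error": ...}` entries (field names are assumptions; match them to the actual response):

```python
def collect_batch_failures(response):
    """Map each failed document_id to its error message."""
    return {e["document_id"]: e["error"] for e in response.get("errors", [])}

def retry_candidates(documents, response):
    """Return only the documents whose ingestion failed, ready to fix and resubmit."""
    failed_ids = set(collect_batch_failures(response))
    return [doc for doc in documents if doc["document_id"] in failed_ids]
```

Because document_id makes retries idempotent, resubmitting only the failed documents (after fixing them) is safe even if some were partially processed.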
Rate limit your batch requests
While the batch API is designed for throughput, adding a short delay between requests (1-2 seconds) prevents queue backpressure. This is especially important during the initial bulk load when you may be sending hundreds of batch requests.
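A minimal sketch of a paced bulk load, combining the 100-document request limit with a delay between submissions (the submit callable is injected and would wrap your actual batch-ingest request):

```python
import time

MAX_DOCS_PER_REQUEST = 100  # batch API limit

def chunk_documents(documents, size=MAX_DOCS_PER_REQUEST):
    """Split a document list into batches that fit the per-request limit."""
    return [documents[i:i + size] for i in range(0, len(documents), size)]

def submit_all(documents, submit_batch, delay=1.5):
    """Submit every batch with a short pause to avoid queue backpressure."""
    responses = []
    for batch in chunk_documents(documents):
        responses.append(submit_batch(batch))
        time.sleep(delay)
    return responses
```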
Organize by customer for scope correctness
When loading data for multiple customers, ensure that each document includes the correct customer_id. Incorrect scoping during bootstrap is difficult to fix later — you would need to re-ingest the affected documents with the correct scope.
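One way to enforce this before submitting anything is to bucket documents by customer_id and fail loudly on any unscoped document, as in this sketch:

```python
from collections import defaultdict

def group_by_customer(documents):
    """Bucket documents by customer_id, rejecting any unscoped document.

    Raises rather than silently ingesting into the wrong scope, since
    fixing scope after bootstrap means re-ingesting the affected documents.
    """
    grouped = defaultdict(list)
    for doc in documents:
        customer_id = doc.get("customer_id")
        if not customer_id:
            raise ValueError(f"document {doc.get('document_id')!r} is missing customer_id")
        grouped[customer_id].append(doc)
    return dict(grouped)
```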
Validate data before ingestion
Clean your historical data before ingestion. Remove empty conversations, strip personally identifiable information that should not be stored, and ensure that timestamps are in ISO 8601 format. Prevention is far easier than remediation after ingestion.
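A minimal pre-ingestion validator covering two of those checks, dropping empty documents and rejecting non-ISO-8601 timestamps (PII stripping is source-specific and omitted here):

```python
from datetime import datetime

def is_iso8601(value):
    """True if value parses as an ISO 8601 timestamp (Z suffix accepted)."""
    try:
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except (ValueError, AttributeError):
        return False

def validate_documents(documents):
    """Split documents into (valid, rejected) before ingestion."""
    valid, rejected = [], []
    for doc in documents:
        if not (doc.get("document") or "").strip():
            rejected.append((doc, "empty document"))
        elif not is_iso8601(doc.get("document_created_at", "")):
            rejected.append((doc, "bad timestamp"))
        else:
            valid.append(doc)
    return valid, rejected
```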
Use document_id consistently
Derive document_id from your source system’s primary key (e.g., migration_{source_id}). This ensures idempotency during retries and makes it easy to trace ingested memories back to their source records.
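That naming pattern is small enough to capture in a helper (the function name is illustrative):

```python
def migration_document_id(source_id):
    """Derive a stable, idempotent document_id from the source system's primary key."""
    return f"migration_{source_id}"
```

Because the ID is derived deterministically from the source key, re-running the migration produces the same IDs and duplicates are skipped.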
Start with a small test batch
Before loading thousands of documents, ingest a small batch (10-20 documents) and verify the results. Check that scoping, timestamps, and entity resolution are working as expected. Then proceed with the full load.