This checklist covers every aspect of a production-ready Synap integration, from credential security and SDK configuration to monitoring and operational procedures. Run through it before every production deployment, not just the first one: configuration changes, SDK upgrades, and new features each warrant a fresh review.
## Security & Credentials

Credential management is the foundation of a secure Synap integration. A compromised bootstrap key or API key gives an attacker full access to your instance's memory store.
### 1. Bootstrap key stored in a secrets manager
Never hardcode bootstrap keys in source code, environment files committed to version control, or Docker images. Use a proper secrets manager:
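As a minimal sketch, assuming your secrets manager injects the key into the runtime environment (the variable name `SYNAP_BOOTSTRAP_KEY` is illustrative, not a documented convention), load it at startup instead of embedding it:

```python
import os


def load_bootstrap_key() -> str:
    """Load the Synap bootstrap key injected by the secrets manager.

    SYNAP_BOOTSTRAP_KEY is an illustrative name; use whatever variable
    your secrets manager (Vault, AWS Secrets Manager, etc.) provides.
    """
    key = os.environ.get("SYNAP_BOOTSTRAP_KEY")
    if not key:
        raise RuntimeError(
            "SYNAP_BOOTSTRAP_KEY is not set; "
            "check your secrets manager configuration"
        )
    return key
```

Failing fast at startup when the key is missing is deliberate: a misconfigured deployment should crash loudly rather than limp along without credentials.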
- [ ] Bootstrap key is stored in a secrets manager, never hardcoded
- [ ] Webhook signature verification is implemented and tested
### 4. API key rotation schedule established
API keys should be rotated periodically. Synap supports graceful rotation with a 48-hour overlap window where both old and new keys are valid.
- Recommended rotation cadence: every 90 days for standard deployments, every 30 days for high-security environments
- Document the rotation procedure in your team's runbook
- Automate rotation if possible (e.g., via a cron job or CI/CD step)
- [ ] API key rotation schedule is established and documented
### 5. Bootstrap key revoked after initial setup
The bootstrap key is consumed on first use, but revoking it explicitly in the Dashboard confirms it cannot be reused (even if the consumption state is somehow lost).
- [ ] Bootstrap key has been explicitly revoked in the Dashboard after successful initialization
## SDK Configuration

Proper SDK configuration ensures your integration performs well under production load and does not generate excessive logging or resource usage.
### 1. Log level set appropriately
In production, set log_level to "WARNING" or "ERROR". The "DEBUG" and "INFO" levels generate high-volume output that degrades performance and can expose sensitive information in log aggregators.
```python
config = SDKConfig(
    log_level="WARNING"  # Not "DEBUG" or "INFO" in production
)
```
- [ ] `log_level` is set to "WARNING" or "ERROR" (not "DEBUG" or "INFO")
### 2. Timeouts configured for your SLA
Default timeouts are suitable for most applications, but review them against your latency requirements:
| Timeout | Default | Guidance |
|---|---|---|
| `connect` | 5s | Increase to 10s if your infrastructure has high network latency |
| `read` | 30s | Decrease for latency-sensitive paths; increase for large batch operations |
| `write` | 10s | Usually sufficient; increase for large document ingestion |
| `stream_idle` | 60s | gRPC streaming idle timeout; increase for low-traffic streams |
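If the SDK accepts per-timeout overrides on `SDKConfig` (the parameter names below are assumptions derived from the table, not confirmed API), a tuned configuration might look like:

```python
config = SDKConfig(
    connect_timeout=5.0,       # seconds; raise to 10.0 on high-latency networks
    read_timeout=10.0,         # lowered from the 30s default for a latency-sensitive path
    write_timeout=10.0,        # default is usually sufficient
    stream_idle_timeout=60.0,  # gRPC streaming idle timeout
)
```

Check your SDK reference for the exact parameter names and units before copying this sketch.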
- [ ] Timeouts are reviewed and aligned with your application's SLA requirements
### 3. Retry policy tuned
The default retry policy (3 attempts, exponential backoff with jitter) is appropriate for most use cases. Adjust if needed:
- **High-throughput systems:** Reduce `max_attempts` to 2 to avoid retry storms
- **Critical operations:** Increase `max_attempts` to 5 for reliability
- **Low-latency paths:** Reduce `backoff_max` to limit total retry time
```python
config = SDKConfig(
    retry_policy=RetryPolicy(
        max_attempts=3,
        backoff_base=1.0,
        backoff_max=10.0,
        backoff_jitter=True  # Always enable jitter in production
    )
)
```
- [ ] Retry policy is reviewed and tuned for your workload profile
### 4. Cache backend enabled
The SQLite cache backend significantly improves retrieval performance for repeated queries. Ensure it is enabled:
```python
config = SDKConfig(
    cache_backend="sqlite"  # Not None
)
```
- [ ] `cache_backend` is set to "sqlite" for production performance
### 5. Session timeout configured
The `session_timeout_minutes` setting controls how long an authenticated session lasts before requiring re-authentication. The default is appropriate for most cases, but adjust it based on your security requirements.
## MACA Configuration

Your MACA configuration directly impacts memory quality, retrieval accuracy, and storage costs. Review it thoroughly before production.
### 1. MACA config reviewed and approved
Do not deploy with the default configuration. Create a config tailored to your use case and have it reviewed by your team. Refer to the Configuring Memory Guide for detailed guidance.
- [ ] MACA config has been customized for your use case (not using defaults)
### 2. Dry-run tested before applying
Always validate configuration changes with a dry run before applying them.
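The exact call depends on your SDK version; as a hedged sketch, assuming a hypothetical `sdk.config.apply()` method that accepts a `dry_run` flag (verify the real method name against your SDK reference):

```python
# Hypothetical API: validate the new MACA config without persisting it
result = await sdk.config.apply(new_config, dry_run=True)

if result.warnings:
    for warning in result.warnings:
        logger.warning("Dry-run warning: %s", warning)

# Apply for real only once the dry run is clean
await sdk.config.apply(new_config, dry_run=False)
```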
- [ ] Config changes have been validated with `dry_run=True` before applying
### 3. Rollback plan documented
Know which config version you will roll back to if issues arise. Document:
- The current stable version number
- The rollback command or procedure
- Expected impact of rolling back
- Who is authorized to execute the rollback
- [ ] Rollback plan is documented and the team knows how to execute it
### 4. Retention policy set
Unbounded memory growth increases storage costs and can degrade retrieval quality over time. Set a reasonable max_memory_age_days based on your use case:
```yaml
storage:
  retention:
    max_memory_age_days: 365  # Not 0 (unlimited) unless justified
```
- [ ] Retention policy is set to prevent unbounded memory growth
### 5. Context budget aligned with LLM context window
Ensure retrieval.context_budget.max_tokens does not exceed the space available in your LLM’s context window after accounting for the system prompt, user message, and expected response length.
| LLM Window | System Prompt | User + Response | Available for Memories |
|---|---|---|---|
| 8K | ~500 tokens | ~3K tokens | ~4K tokens |
| 32K | ~500 tokens | ~8K tokens | ~8-16K tokens |
| 128K | ~500 tokens | ~16K tokens | ~16-32K tokens |
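The arithmetic behind the table can be sketched as a small helper that derives a safe budget (for the 8K row: 8192 minus a ~500-token system prompt and ~3K tokens for user message plus response leaves roughly 4K for memories):

```python
def available_memory_tokens(window: int, system_prompt: int,
                            user_and_response: int) -> int:
    """Tokens left for injected memories after fixed prompt costs."""
    available = window - system_prompt - user_and_response
    if available <= 0:
        raise ValueError("Prompt budget already exceeds the context window")
    return available


# 8K window, ~500-token system prompt, ~3K tokens for user message + response
budget = available_memory_tokens(8192, 500, 3072)  # -> 4620
```

Use a value like this (minus some safety margin) when setting `retrieval.context_budget.max_tokens`.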
- [ ] `context_budget.max_tokens` fits within your LLM's available context space
## Error Handling

Robust error handling ensures your application degrades gracefully when Synap encounters issues, rather than crashing or returning empty responses.
### 1. All SynapError subtypes caught appropriately
Handle transient and permanent errors differently:
```python
from maximem_synap.errors import (
    SynapError,
    NetworkTimeoutError,
    RateLimitError,
    ServiceUnavailableError,
    InvalidInputError,
    AuthenticationError,
)

try:
    context = await sdk.conversation.context.fetch(
        conversation_id=conv_id,
        search_query=[query],
        mode="fast"
    )
except (NetworkTimeoutError, ServiceUnavailableError) as e:
    # Transient: retry or fall back to no-memory mode
    logger.warning(
        "Synap unavailable (transient), proceeding without memory: %s "
        "(correlation_id=%s)",
        e, e.correlation_id
    )
    context = None
except RateLimitError as e:
    # Transient: respect retry_after
    logger.warning(
        "Rate limited, retry after %s seconds (correlation_id=%s)",
        e.retry_after_seconds, e.correlation_id
    )
    context = None
except InvalidInputError as e:
    # Permanent: fix the request
    logger.error("Invalid request to Synap: %s", e)
    raise
except AuthenticationError as e:
    # Permanent: credentials issue
    logger.critical("Synap auth failed: %s", e)
    raise
```
- [ ] Error handling distinguishes between transient and permanent errors
### 2. Transient errors logged with correlation_id
Every SynapError includes a correlation_id field. Always log it — this is the primary identifier Synap support uses to trace issues.
```python
except SynapError as e:
    logger.error(
        "Synap error: %s (correlation_id=%s)",
        e, e.correlation_id
    )
```
- [ ] All error logs include the `correlation_id` from the Synap error
### 3. Graceful degradation implemented
Your application should continue functioning when Synap is unavailable — just without memory context. This is the single most important resilience pattern.
```python
async def get_memory_context(sdk, conversation_id, query):
    """Retrieve memory context, returning None if unavailable."""
    try:
        return await sdk.conversation.context.fetch(
            conversation_id=conversation_id,
            search_query=[query],
            max_results=5,
            mode="fast"
        )
    except SynapError as e:
        logger.warning(
            "Memory retrieval failed, proceeding without context: %s", e
        )
        return None

# In your chat handler:
context = await get_memory_context(sdk, conv_id, user_message)
if context and context.facts:
    # Build enriched prompt with memories
    system_prompt = build_prompt_with_memories(context)
else:
    # Fall back to generic prompt -- your app still works
    system_prompt = build_generic_prompt()
```
- [ ] Application continues working (without memory) when Synap is unavailable
### 4. Rate limit handling with retry_after
When you receive a RateLimitError, respect the retry_after_seconds field before retrying:
```python
except RateLimitError as e:
    await asyncio.sleep(e.retry_after_seconds)
    # Retry the operation
```
- [ ] Rate limit errors are handled with proper backoff using `retry_after_seconds`
## Performance Optimization

Optimization ensures your integration meets latency requirements and minimizes unnecessary resource usage.
### 1. Using fast mode for latency-sensitive paths
Use mode="fast" for any operation in the critical path of user-facing requests. Reserve mode="accurate" for background tasks, research queries, or paths where the user is willing to wait.
```python
# Real-time chat: use fast mode
context = await sdk.conversation.context.fetch(
    conversation_id=conv_id,
    search_query=[query],
    mode="fast"  # ~50-100ms
)

# Background analysis: use accurate mode
context = await sdk.conversation.context.fetch(
    conversation_id=conv_id,
    search_query=[query],
    mode="accurate"  # ~200-500ms, better precision
)
```
- [ ] Fast mode is used for all latency-sensitive code paths
### 2. Batch ingestion for bulk operations
When ingesting multiple documents, use `batch_create()` instead of multiple `create()` calls:
```python
# Good: single batch call
await sdk.memories.batch_create(
    documents=[
        {"document": doc1, "document_type": "document", "user_id": "user_1"},
        {"document": doc2, "document_type": "email", "user_id": "user_1"},
        {"document": doc3, "document_type": "pdf", "user_id": "user_2"},
    ],
    fail_fast=False  # Continue processing even if one document fails
)

# Avoid: N sequential calls
for doc in documents:
    await sdk.memories.create(document=doc, ...)  # Slower, more API calls
```
- [ ] Batch ingestion is used for all bulk operations
### 3. Context compaction enabled for long conversations
For conversations that span many turns, use context compaction to keep the context within your LLM’s token budget:
```python
result = await sdk.conversation.context.compact(
    conversation_id=conv_id,
    strategy="adaptive",  # Automatically adjusts compression level
    target_tokens=2000
)

compacted = await sdk.conversation.context.get_compacted(
    conversation_id=conv_id,
    format="injection-ready"  # Ready to insert into your LLM prompt
)
```
| Strategy | Compression | Best For |
|---|---|---|
| `conservative` | ~70% retention | Important conversations, legal/compliance |
| `balanced` | ~40% retention | General use |
| `aggressive` | ~15% retention | Very long conversations, cost optimization |
| `adaptive` | Variable | Recommended default; adjusts based on content |
- [ ] Context compaction is configured for conversations that may exceed token budgets
### 4. Cache is enabled
Verify the cache backend is active and functioning.
A healthy cache should show a hit rate above 20% for typical applications. If the hit rate is near 0%, your query patterns may be too diverse for caching to help.
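How the statistics are exposed depends on the SDK, but the health check itself is simple arithmetic; a sketch (the hits/misses counters are an assumed stats shape, not confirmed API):

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache; 0.0 when the cache is unused."""
    total = hits + misses
    return hits / total if total else 0.0


# e.g. counters reported as hits=340, misses=860
rate = cache_hit_rate(340, 860)  # ~0.283, above the 20% health threshold
```

If the computed rate is persistently near zero, revisit whether your query patterns are cacheable before blaming the backend.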
- [ ] Cache backend is enabled and showing a healthy hit rate