This checklist covers every aspect of a production-ready Synap integration, from credential security and SDK configuration to monitoring and operational procedures. Run through it before every production deployment, not just the first one: configuration changes, SDK upgrades, and new features each warrant a fresh review.
## Security & Credentials

Credential management is the foundation of a secure Synap integration. A compromised bootstrap key or API key gives an attacker full access to your instance's memory store.
### 1. Bootstrap key stored in a secrets manager
Never hardcode bootstrap keys in source code, environment files committed to version control, or Docker images. Use a proper secrets manager:
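As a minimal sketch, assuming your secrets manager injects the key into the runtime environment (the variable name `SYNAP_BOOTSTRAP_KEY` is illustrative, not a documented convention), load it at startup instead of embedding it:

```python
import os


def load_bootstrap_key() -> str:
    """Load the Synap bootstrap key injected by the secrets manager.

    SYNAP_BOOTSTRAP_KEY is an illustrative name; use whatever variable
    your secrets manager (Vault, AWS Secrets Manager, etc.) provides.
    """
    key = os.environ.get("SYNAP_BOOTSTRAP_KEY")
    if not key:
        raise RuntimeError(
            "SYNAP_BOOTSTRAP_KEY is not set; "
            "check your secrets manager configuration"
        )
    return key
```

Failing fast at startup when the key is missing is deliberate: a misconfigured deployment should crash loudly rather than limp along without credentials.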
- [ ] Bootstrap key is stored in a secrets manager, never hardcoded
- [ ] Webhook signature verification is implemented and tested
### 4. API key rotation schedule established
API keys should be rotated periodically. Synap supports graceful rotation with a 48-hour overlap window where both old and new keys are valid.
- Recommended rotation cadence: every 90 days for standard deployments, every 30 days for high-security environments
- Document the rotation procedure in your team's runbook
- Automate rotation if possible (e.g., via a cron job or CI/CD step)
- [ ] API key rotation schedule is established and documented
### 5. Bootstrap key revoked after initial setup
The bootstrap key is consumed on first use, but revoking it explicitly in the Dashboard confirms it cannot be reused (even if the consumption state is somehow lost).
- [ ] Bootstrap key has been explicitly revoked in the Dashboard after successful initialization
## SDK Configuration

Proper SDK configuration ensures your integration performs well under production load and does not generate excessive logging or resource usage.
### 1. Log level set appropriately
In production, set log_level to "WARNING" or "ERROR". The "DEBUG" and "INFO" levels generate high-volume output that degrades performance and can expose sensitive information in log aggregators.
```python
config = SDKConfig(
    log_level="WARNING"  # Not "DEBUG" or "INFO" in production
)
```
- [ ] `log_level` is set to "WARNING" or "ERROR" (not "DEBUG" or "INFO")
### 2. Timeouts configured for your SLA
Default timeouts are suitable for most applications, but review them against your latency requirements:
| Timeout | Default | Guidance |
|---|---|---|
| `connect` | 5s | Increase to 10s if your infrastructure has high network latency |
| `read` | 30s | Decrease for latency-sensitive paths; increase for large batch operations |
| `write` | 10s | Usually sufficient; increase for large document ingestion |
| `stream_idle` | 60s | gRPC streaming idle timeout; increase for low-traffic streams |
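If the SDK accepts per-timeout overrides on `SDKConfig` (the parameter names below are assumptions derived from the table, not confirmed API), a tuned configuration might look like:

```python
config = SDKConfig(
    connect_timeout=5.0,       # seconds; raise to 10.0 on high-latency networks
    read_timeout=10.0,         # lowered from the 30s default for a latency-sensitive path
    write_timeout=10.0,        # default is usually sufficient
    stream_idle_timeout=60.0,  # gRPC streaming idle timeout
)
```

Check your SDK reference for the exact parameter names and units before copying this sketch.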
- [ ] Timeouts are reviewed and aligned with your application's SLA requirements
### 3. Retry policy tuned
The default retry policy (3 attempts, exponential backoff with jitter) is appropriate for most use cases. Adjust if needed:
- **High-throughput systems:** Reduce `max_attempts` to 2 to avoid retry storms
- **Critical operations:** Increase `max_attempts` to 5 for reliability
- **Low-latency paths:** Reduce `backoff_max` to limit total retry time
```python
config = SDKConfig(
    retry_policy=RetryPolicy(
        max_attempts=3,
        backoff_base=1.0,
        backoff_max=10.0,
        backoff_jitter=True  # Always enable jitter in production
    )
)
```
- [ ] Retry policy is reviewed and tuned for your workload profile
### 4. Cache backend enabled
The SQLite cache backend significantly improves retrieval performance for repeated queries. Ensure it is enabled:
```python
config = SDKConfig(
    cache_backend="sqlite"  # Not None
)
```
- [ ] `cache_backend` is set to "sqlite" for production performance
### 5. Session timeout configured
The `session_timeout_minutes` setting controls how long an authenticated session lasts before requiring re-authentication. The default is appropriate for most cases, but adjust it based on your security requirements.
## MACA Configuration

Your MACA configuration directly impacts memory quality, retrieval accuracy, and storage costs. Review it thoroughly before production.
### 1. MACA config reviewed and approved
Do not deploy with the default configuration. Create a config tailored to your use case and have it reviewed by your team. Refer to the Configuring Memory Guide for detailed guidance.
- [ ] MACA config has been customized for your use case (not using defaults)
### 2. Dry-run tested before applying
Always validate configuration changes with a dry run before applying them.
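The exact call depends on your SDK version; as a hedged sketch, assuming a hypothetical `sdk.config.apply()` method that accepts a `dry_run` flag (verify the real method name against your SDK reference):

```python
# Hypothetical API: validate the new MACA config without persisting it
result = await sdk.config.apply(new_config, dry_run=True)

if result.warnings:
    for warning in result.warnings:
        logger.warning("Dry-run warning: %s", warning)

# Apply for real only once the dry run is clean
await sdk.config.apply(new_config, dry_run=False)
```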
- [ ] Config changes have been validated with `dry_run=True` before applying
### 3. Rollback plan documented
Know which config version you will roll back to if issues arise. Document:
- The current stable version number
- The rollback command or procedure
- Expected impact of rolling back
- Who is authorized to execute the rollback
- [ ] Rollback plan is documented and the team knows how to execute it
### 4. Retention policy set
Unbounded memory growth increases storage costs and can degrade retrieval quality over time. Set a reasonable max_memory_age_days based on your use case:
```yaml
storage:
  retention:
    max_memory_age_days: 365  # Not 0 (unlimited) unless justified
```
- [ ] Retention policy is set to prevent unbounded memory growth
### 5. Context budget aligned with LLM context window
Ensure retrieval.context_budget.max_tokens does not exceed the space available in your LLM’s context window after accounting for the system prompt, user message, and expected response length.
| LLM Window | System Prompt | User + Response | Available for Memories |
|---|---|---|---|
| 8K | ~500 tokens | ~3K tokens | ~4K tokens |
| 32K | ~500 tokens | ~8K tokens | ~8-16K tokens |
| 128K | ~500 tokens | ~16K tokens | ~16-32K tokens |
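The arithmetic behind the table can be sketched as a small helper that derives a safe budget (for the 8K row: 8192 minus a ~500-token system prompt and ~3K tokens for user message plus response leaves roughly 4K for memories):

```python
def available_memory_tokens(window: int, system_prompt: int,
                            user_and_response: int) -> int:
    """Tokens left for injected memories after fixed prompt costs."""
    available = window - system_prompt - user_and_response
    if available <= 0:
        raise ValueError("Prompt budget already exceeds the context window")
    return available


# 8K window, ~500-token system prompt, ~3K tokens for user message + response
budget = available_memory_tokens(8192, 500, 3072)  # -> 4620
```

Use a value like this (minus some safety margin) when setting `retrieval.context_budget.max_tokens`.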
- [ ] `context_budget.max_tokens` fits within your LLM's available context space
## Error Handling

Robust error handling ensures your application degrades gracefully when Synap encounters issues, rather than crashing or returning empty responses.
### 1. All SynapError subtypes caught appropriately
Handle transient and permanent errors differently:
```python
from maximem_synap.errors import (
    SynapError,
    NetworkTimeoutError,
    RateLimitError,
    ServiceUnavailableError,
    InvalidInputError,
    AuthenticationError,
)

try:
    context = await sdk.conversation.context.fetch(
        conversation_id=conv_id,
        search_query=[query],
        mode="fast"
    )
except (NetworkTimeoutError, ServiceUnavailableError) as e:
    # Transient: retry or fall back to no-memory mode
    logger.warning(
        "Synap unavailable (transient), proceeding without memory: %s "
        "(correlation_id=%s)",
        e, e.correlation_id
    )
    context = None
except RateLimitError as e:
    # Transient: respect retry_after
    logger.warning(
        "Rate limited, retry after %s seconds (correlation_id=%s)",
        e.retry_after_seconds, e.correlation_id
    )
    context = None
except InvalidInputError as e:
    # Permanent: fix the request
    logger.error("Invalid request to Synap: %s", e)
    raise
except AuthenticationError as e:
    # Permanent: credentials issue
    logger.critical("Synap auth failed: %s", e)
    raise
```
- [ ] Error handling distinguishes between transient and permanent errors
### 2. Transient errors logged with correlation_id
Every SynapError includes a correlation_id field. Always log it — this is the primary identifier Synap support uses to trace issues.
```python
except SynapError as e:
    logger.error(
        "Synap error: %s (correlation_id=%s)",
        e, e.correlation_id
    )
```
- [ ] All error logs include the `correlation_id` from the Synap error
### 3. Graceful degradation implemented
Your application should continue functioning when Synap is unavailable — just without memory context. This is the single most important resilience pattern.
```python
async def get_memory_context(sdk, conversation_id, query):
    """Retrieve memory context, returning None if unavailable."""
    try:
        return await sdk.conversation.context.fetch(
            conversation_id=conversation_id,
            search_query=[query],
            max_results=5,
            mode="fast"
        )
    except SynapError as e:
        logger.warning(
            "Memory retrieval failed, proceeding without context: %s", e
        )
        return None

# In your chat handler:
context = await get_memory_context(sdk, conv_id, user_message)
if context and context.facts:
    # Build enriched prompt with memories
    system_prompt = build_prompt_with_memories(context)
else:
    # Fall back to generic prompt -- your app still works
    system_prompt = build_generic_prompt()
```
- [ ] Application continues working (without memory) when Synap is unavailable
### 4. Rate limit handling with retry_after
When you receive a RateLimitError, respect the retry_after_seconds field before retrying:
```python
except RateLimitError as e:
    await asyncio.sleep(e.retry_after_seconds)
    # Retry the operation
```
- [ ] Rate limit errors are handled with proper backoff using `retry_after_seconds`
## Performance Optimization

Optimization ensures your integration meets latency requirements and minimizes unnecessary resource usage.
### 1. Using fast mode for latency-sensitive paths
Use mode="fast" for any operation in the critical path of user-facing requests. Reserve mode="accurate" for background tasks, research queries, or paths where the user is willing to wait.
```python
# Real-time chat: use fast mode
context = await sdk.conversation.context.fetch(
    conversation_id=conv_id,
    search_query=[query],
    mode="fast"  # ~50-100ms
)

# Background analysis: use accurate mode
context = await sdk.conversation.context.fetch(
    conversation_id=conv_id,
    search_query=[query],
    mode="accurate"  # ~200-500ms, better precision
)
```
- [ ] Fast mode is used for all latency-sensitive code paths
### 2. Batch ingestion for bulk operations
When ingesting multiple documents, use `batch_create()` instead of multiple `create()` calls:
```python
# Good: single batch call
await sdk.memories.batch_create(
    documents=[
        {"document": doc1, "document_type": "document", "user_id": "user_1"},
        {"document": doc2, "document_type": "email", "user_id": "user_1"},
        {"document": doc3, "document_type": "pdf", "user_id": "user_2"},
    ],
    fail_fast=False  # Continue processing even if one document fails
)

# Avoid: N sequential calls
for doc in documents:
    await sdk.memories.create(document=doc, ...)  # Slower, more API calls
```
- [ ] Batch ingestion is used for all bulk operations
### 3. Context compaction enabled for long conversations
For conversations that span many turns, use context compaction to keep the context within your LLM’s token budget:
```python
result = await sdk.conversation.context.compact(
    conversation_id=conv_id,
    strategy="adaptive",  # Automatically adjusts compression level
    target_tokens=2000
)

compacted = await sdk.conversation.context.get_compacted(
    conversation_id=conv_id,
    format="injection-ready"  # Ready to insert into your LLM prompt
)
```
| Strategy | Compression | Best For |
|---|---|---|
| `conservative` | ~70% retention | Important conversations, legal/compliance |
| `balanced` | ~40% retention | General use |
| `aggressive` | ~15% retention | Very long conversations, cost optimization |
| `adaptive` | Variable | Recommended default; adjusts based on content |
- [ ] Context compaction is configured for conversations that may exceed token budgets
### 4. Cache is enabled
Verify the cache backend is active and functioning.
A healthy cache should show a hit rate above 20% for typical applications. If the hit rate is near 0%, your query patterns may be too diverse for caching to help.
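How the statistics are exposed depends on the SDK, but the health check itself is simple arithmetic; a sketch (the hits/misses counters are an assumed stats shape, not confirmed API):

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache; 0.0 when the cache is unused."""
    total = hits + misses
    return hits / total if total else 0.0


# e.g. counters reported as hits=340, misses=860
rate = cache_hit_rate(340, 860)  # ~0.283, above the 20% health threshold
```

If the computed rate is persistently near zero, revisit whether your query patterns are cacheable before blaming the backend.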
- [ ] Cache backend is enabled and showing a healthy hit rate