Graceful Degradation

Synap should make your agent better when it’s available, and not break your agent when it isn’t. Treat retrieval and ingestion as best-effort in the hot path — never let them stop the LLM from generating a response.

The shape of “good enough” degradation

import asyncio
import logging
from maximem_synap import MaximemSynapSDK, SynapError, SynapTransientError

sdk = MaximemSynapSDK()
log = logging.getLogger(__name__)

async def safe_fetch_context(conversation_id: str, query: str):
    """Never raises on Synap failures: returns the context, or None if the fetch fails or times out."""
    try:
        return await asyncio.wait_for(
            sdk.conversation.context.fetch(
                conversation_id=conversation_id,
                search_query=[query],
                mode="fast",
                max_results=8,
            ),
            timeout=0.4,   # hard latency cap, lower than your conversational budget
        )
    except asyncio.TimeoutError:
        log.warning("synap_context_timeout conv=%s", conversation_id)
        return None
    except SynapTransientError as e:
        log.warning("synap_transient err=%s correlation_id=%s", e, e.correlation_id)
        return None
    except SynapError as e:
        log.error("synap_unexpected err=%s correlation_id=%s", e, e.correlation_id)
        return None


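# Hold strong references to in-flight background tasks. The event loop keeps
# only weak references, so an unreferenced task can be garbage-collected
# before it finishes.
_background_tasks: set = set()


async def handle_turn(user_id: str, customer_id: str, conversation_id: str, msg: str) -> str:
    ctx = await safe_fetch_context(conversation_id, msg)

    if ctx is None:
        # Degraded mode: call the LLM without memory rather than returning a 500
        memory_block = ""
        log.info("turn_degraded user=%s", user_id)
    else:
        memory_block = "\n".join(f"- {f.content}" for f in ctx.facts[:5])

    # call_llm is your application's own LLM wrapper (not part of the SDK)
    reply = await call_llm(memory_block, msg)

    # Ingest in the background; never await it in the hot path.
    # safe_ingest handles its own retries and replay queueing.
    task = asyncio.create_task(safe_ingest(user_id, customer_id, conversation_id, msg, reply))
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return reply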


async def safe_ingest(user_id, customer_id, conversation_id, msg, reply, _retries=0):
    try:
        await sdk.memories.create(
            document=f"User: {msg}\nAssistant: {reply}",
            document_type="ai-chat-conversation",
            user_id=user_id,
            customer_id=customer_id,
            metadata={"conversation_id": conversation_id},
        )
    except SynapTransientError as e:
        if _retries < 3:
            await asyncio.sleep(2 ** _retries)
            return await safe_ingest(user_id, customer_id, conversation_id, msg, reply, _retries + 1)
        log.error("synap_ingest_dropped after retries user=%s msg_excerpt=%r correlation_id=%s",
                  user_id, msg[:80], e.correlation_id)
        # Optional: hand off to an out-of-band replayer
        # (enqueue_for_replay is your own helper, not an SDK call)
        await enqueue_for_replay(user_id, customer_id, conversation_id, msg, reply)
    except SynapError as e:
        log.error("synap_ingest_permanent err=%s correlation_id=%s", e, e.correlation_id)
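
The `enqueue_for_replay` helper called in `safe_ingest` is left to the application. Below is a minimal sketch of one way to write it, assuming an in-memory buffer is acceptable for your deployment. `PendingIngest`, `ReplayBuffer`, and `drain` are hypothetical names introduced here for illustration; none of them come from the Synap SDK.

```python
import asyncio
from collections import deque
from dataclasses import dataclass


@dataclass
class PendingIngest:
    """One turn whose ingestion failed and should be replayed later."""
    user_id: str
    customer_id: str
    conversation_id: str
    msg: str
    reply: str


class ReplayBuffer:
    """Bounded in-memory buffer (hypothetical helper, not part of the SDK)."""

    def __init__(self, maxsize: int = 10_000):
        self._items = deque()
        self._maxsize = maxsize
        self.dropped = 0  # items rejected because the buffer was full

    def __len__(self) -> int:
        return len(self._items)

    def enqueue(self, item: PendingIngest) -> bool:
        """Add an item; return False (and count a drop) if the buffer is full."""
        if len(self._items) >= self._maxsize:
            self.dropped += 1
            return False
        self._items.append(item)
        return True

    async def drain(self, ingest) -> int:
        """Try each pending item once via `ingest`; re-queue the ones that fail."""
        replayed = 0
        for _ in range(len(self._items)):
            item = self._items.popleft()
            try:
                await ingest(item)
                replayed += 1
            except Exception:
                # Still failing: keep it for the next drain pass.
                self._items.append(item)
        return replayed


replay_buffer = ReplayBuffer()


async def enqueue_for_replay(user_id, customer_id, conversation_id, msg, reply) -> None:
    """Signature matches the call made from safe_ingest."""
    replay_buffer.enqueue(PendingIngest(user_id, customer_id, conversation_id, msg, reply))
```

A periodic background job could call `replay_buffer.drain(...)` with a coroutine that wraps `sdk.memories.create`. An in-memory buffer is lost on process restart, so a durable queue may be a better fit if dropped memories are costly.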

What to watch in production

Three metrics that should be on your dashboard from day one:

| Metric | What it tells you | Page if |
| --- | --- | --- |
| `synap_context_timeout_rate` | Are users seeing degraded responses? | `> 1%` over 5 min |
| `synap_ingest_dropped_rate` | Are you losing memory? | `> 0.1%` over 1 hour |
| `synap_correlation_ids_in_errors` | Sample of `correlation_id` values for support | Always log; sample 5% to your error tracker |

Every SynapError exposes e.correlation_id. Log it: support will ask for that ID when you report a problem.
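
To make the rate thresholds in the table above concrete, here is a small sketch of a trailing-window rate counter. `WindowedRate` is a hypothetical helper, not something the SDK provides, and most teams would emit counters to their existing metrics system instead; the point is only to show what "1% over 5 min" computes.

```python
import time
from collections import deque
from typing import Optional


class WindowedRate:
    """Fraction of recorded events that failed within a trailing time window."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, failed) pairs, oldest first

    def record(self, failed: bool, now: Optional[float] = None) -> None:
        """Record one event; `now` is injectable so the class is easy to test."""
        now = time.monotonic() if now is None else now
        self._events.append((now, failed))
        self._evict(now)

    def rate(self, now: Optional[float] = None) -> float:
        """Return failures / total inside the window, or 0.0 if it is empty."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        if not self._events:
            return 0.0
        failures = sum(1 for _, failed in self._events if failed)
        return failures / len(self._events)

    def _evict(self, now: float) -> None:
        # Drop events older than the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()


# 5-minute window, matching the paging rule for synap_context_timeout_rate
context_timeout_rate = WindowedRate(window_seconds=300)
```

In `safe_fetch_context`, you might call `context_timeout_rate.record(True)` in the `asyncio.TimeoutError` branch and `record(False)` on success, then page when `context_timeout_rate.rate() > 0.01`.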

Don’t do these things

Don’t fail the request on a Synap timeout. The LLM can answer without memory. The user gets a slightly worse response. Failing the request gets you an outage.
Don’t retry permanent errors. InvalidInputError, ContextNotFoundError, and AuthenticationError won’t get better with retries. Fix the input or the credentials.
Don’t block on ingestion in the hot path. Always background it. The user shouldn’t wait for memory persistence to see the next assistant message.
Don’t catch and swallow without logging. Every catch should at minimum log the correlation_id. Silent swallows make production debugging hopeless.
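
The "retry transient, fail fast on permanent" rule above can be sketched as a small generic wrapper. The exception classes below are stand-ins so the example runs without the SDK installed; in real code you would import them from `maximem_synap`, and the exact class hierarchy shown here is an assumption. `call_with_retry` is a hypothetical helper name.

```python
import asyncio


# Stand-ins for the SDK's exceptions (assumed hierarchy, for illustration only).
class SynapError(Exception): ...
class SynapTransientError(SynapError): ...
class InvalidInputError(SynapError): ...


async def call_with_retry(op, max_retries: int = 3, base_delay: float = 1.0):
    """Await op(); retry only transient errors, with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return await op()
        except SynapTransientError:
            if attempt == max_retries:
                raise  # retries exhausted: let the caller log and degrade
            await asyncio.sleep(base_delay * 2 ** attempt)
        # Any other SynapError (e.g. InvalidInputError) is not caught here,
        # so it propagates on the first attempt without a retry.
```

The caller is still responsible for catching what this wrapper re-raises and logging the `correlation_id`, as in `safe_ingest`.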

Where the SDK already retries for you

SynapTransientError subclasses (NetworkTimeoutError, RateLimitError, ServiceUnavailableError, AgentUnavailableError) are retried automatically inside the SDK using the configured RetryPolicy. By the time one of these reaches your except block, the SDK has already tried 2–3 times, so your wrapper retries are belt-and-suspenders for scenarios where the service is genuinely down for a while.
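
Because the two retry layers multiply, it is worth working out the worst case for the `safe_ingest` wrapper shown earlier. The arithmetic below assumes the SDK makes 3 attempts per call (the upper end of the 2–3 stated above) and ignores whatever delay the SDK's own RetryPolicy adds, since that depends on your configuration.

```python
WRAPPER_MAX_RETRIES = 3      # safe_ingest retries while _retries < 3
SDK_ATTEMPTS_PER_CALL = 3    # assumption: upper end of "2-3 times"

# One initial call plus three retries
wrapper_calls = WRAPPER_MAX_RETRIES + 1

# Wrapper sleeps 2**0 + 2**1 + 2**2 seconds between its calls
wrapper_backoff_seconds = sum(2 ** n for n in range(WRAPPER_MAX_RETRIES))

# Each wrapper call may trigger several attempts inside the SDK
total_attempts = wrapper_calls * SDK_ATTEMPTS_PER_CALL

print(wrapper_calls, wrapper_backoff_seconds, total_attempts)  # 4 7 12
```

Under these assumptions, a single dropped ingest represents up to 12 failed attempts spread over at least 7 seconds, which is why it belongs in the background rather than the hot path.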
