SDK timeouts
The SDK ships with two default timeouts, one per retrieval mode:

| Mode | Default timeout |
|---|---|
| fast | 8000 ms |
| accurate | 30000 ms |
Both defaults can be overridden through SDKConfig.timeouts.
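As a rough illustration of how a per-mode timeout might be resolved, the sketch below uses a plain dictionary as a stand-in for whatever SDKConfig.timeouts holds. The key names, the millisecond unit for overrides, and the `resolve_timeout_ms` helper are assumptions made for this example, not the SDK's confirmed schema.

```python
# Defaults listed above, in milliseconds.
DEFAULT_TIMEOUTS_MS = {"fast": 8000, "accurate": 30000}


def resolve_timeout_ms(mode, overrides=None):
    """Return the timeout for a mode, preferring a caller-supplied override.

    `overrides` is a hypothetical stand-in for the contents of
    SDKConfig.timeouts; the real field layout may differ.
    """
    if mode not in DEFAULT_TIMEOUTS_MS:
        raise ValueError(f"unknown retrieval mode: {mode!r}")
    if overrides and mode in overrides:
        return overrides[mode]
    return DEFAULT_TIMEOUTS_MS[mode]


# Example: tighten the fast path, keep the accurate default.
overrides = {"fast": 5000}
print(resolve_timeout_ms("fast", overrides))      # 5000
print(resolve_timeout_ms("accurate", overrides))  # 30000
```

Check the SDK reference for the actual shape of SDKConfig.timeouts before relying on these names.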
Retrieval modes
sdk.conversation.context.fetch() accepts mode="fast" (default) or mode="accurate".
- fast: lower-latency path. Queries the vector store and the graph store.
- accurate: higher-quality path. Queries the vector store and the graph store, and additionally runs LLM subquery decomposition and reranking on the candidate set.
accurate simply spends more compute to widen and re-order the result set.
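The difference between the two modes can be pictured with a small model. `fetch_context` below is a hypothetical stand-in that only mimics the mode argument of sdk.conversation.context.fetch(); it returns the stages each mode is described as running, not real retrieval results, and the stage names are taken from the descriptions above.

```python
VALID_MODES = ("fast", "accurate")


def fetch_context(mode="fast"):
    """Hypothetical stand-in: list the stages a given retrieval mode runs."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # Both modes query the same two stores.
    stages = ["vector store", "graph store"]
    if mode == "accurate":
        # The accurate path adds extra work on the candidate set.
        stages += ["LLM subquery decomposition", "reranking"]
    return stages


print(fetch_context())                 # default mode is "fast"
print(fetch_context(mode="accurate"))  # same stores plus the two extra stages
```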
Where to see live numbers
Per-Instance latency, request volume, and rate-limit headroom are visible in Dashboard → Usage. Use that view for capacity planning rather than any number quoted in docs; your workload is the source of truth. Document/message size and rate ceilings are likewise per-Instance rather than fixed platform constants, and are read off the same Dashboard → Usage view. If you need a ceiling raised, contact [email protected].
Behavior under load
- The SDK auto-retries transient errors (rate limits, service-unavailable) using RetryPolicy with exponential backoff. You generally don’t need to wrap calls in your own retry loop.
- When retries are exhausted on a rate-limited call, the SDK raises RateLimitError. Catch it to fall back gracefully or surface a user-visible error.
- If you consistently hit RateLimitError for an Instance, your per-Instance ceiling is tuneable; contact [email protected].
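A minimal sketch of the catch-and-fall-back pattern described above. The `RateLimitError` class here is a local stand-in so the example runs on its own; in real code you would import the SDK's exception instead (its import path is not given in this section). `fetch_with_fallback` and `always_rate_limited` are hypothetical helpers for illustration.

```python
class RateLimitError(Exception):
    """Local stand-in for the SDK's RateLimitError."""


def fetch_with_fallback(fetch, fallback_value=None):
    """Call `fetch`; if it raises RateLimitError, return a fallback instead.

    The SDK is described as retrying internally before raising, so this
    wrapper deliberately does not add another retry loop.
    """
    try:
        return fetch()
    except RateLimitError:
        return fallback_value


def always_rate_limited():
    # Simulates a call whose SDK-level retries have been exhausted.
    raise RateLimitError("rate limit exceeded")


# Degrade to an empty context rather than failing the whole request.
print(fetch_with_fallback(always_rate_limited, fallback_value=[]))  # []
```

Returning a fallback value suits read paths where an empty context is tolerable; where it is not, re-raise or translate the exception into a user-visible error instead.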
Status and monitoring
- Status page: synap.maximem.ai/status
- Per-Instance metrics: Dashboard → Usage