This page lists the operational characteristics of Synap Cloud so you can plan capacity, set SLOs, and design fallback behavior. Numbers below are targets, not contractual SLAs — for contractual commitments contact [email protected].
Latency targets (P95)
| Operation | Mode | Typical P95 | Alert threshold |
|---|---|---|---|
| memories.create() (single) | fast | < 100 ms | > 300 ms |
| memories.create() (single) | long-range | < 200 ms (returns immediately, processing async) | > 500 ms |
| memories.batch_create() | — | < 500 ms (queues batch) | > 2000 ms |
| conversation.context.fetch() | fast | < 150 ms | > 500 ms |
| conversation.context.fetch() | accurate | < 600 ms | > 1500 ms |
| conversation.context.compact() | — | 2–15 s depending on conversation length | > 30 s |
| conversation.context.get_context_for_prompt() (cached) | — | < 20 ms (local SDK cache hit) | > 100 ms |
| conversation.context.get_compacted() (cached) | — | < 20 ms | > 100 ms |
These are end-to-end measurements from the SDK call site, including network time. The SDK measures latency at its transport layer and emits it as telemetry, so you can check these targets against your own numbers in the Dashboard's Latency view.
The numbers assume the SDK is co-located with the target Synap region. Cross-region calls (e.g., an SDK in us-west-2 calling Synap in us-east-1) add roughly 70 ms of round-trip network latency.
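If you want an independent check, time SDK calls at the call site and compare against the alert thresholds above. A minimal sketch, assuming a Python client named SynapClient exposing the methods from the table; the import path, constructor, and mode keyword are illustrative assumptions, not documented API:

```python
import time

from synap import SynapClient  # import path is an assumption

client = SynapClient(api_key="...")

# Alert threshold from the table above: conversation.context.fetch(), fast mode.
FETCH_ALERT_S = 0.5

def timed_fetch(conversation_id: str):
    """Time a fetch end-to-end and flag calls that cross the alert threshold."""
    start = time.perf_counter()
    result = client.conversation.context.fetch(conversation_id, mode="fast")  # mode kwarg assumed
    elapsed = time.perf_counter() - start
    if elapsed > FETCH_ALERT_S:
        print(f"fetch alert-threshold breach candidate: {elapsed * 1000:.0f} ms > 500 ms")
    return result
```

A single slow call is not a P95 breach; aggregate a window of samples before alerting.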
Rate limits
Default limits, per Instance, per minute:
| Endpoint group | Limit | Headers returned on 429 |
|---|---|---|
| Ingestion (memories.*) | 600 req/min | Retry-After, X-RateLimit-Reset |
| Retrieval (*.context.fetch, get_context_for_prompt, get_compacted) | 1,200 req/min | Retry-After, X-RateLimit-Reset |
| Compaction (compact) | 60 req/min per conversation | Retry-After |
| Conversation messages (record_message, record_messages_batch) | 1,800 req/min | Retry-After |
The SDK retries RateLimitError automatically with exponential backoff and honors the server's Retry-After header. If you consistently hit rate limits, contact [email protected] to request a higher limit; limits are tunable per Instance.
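Automatic retries cover transient bursts, but sustained overload will eventually surface as a RateLimitError in your code. A minimal sketch of a fallback path; only the exception name appears on this page, so the module path and the retry_after attribute are assumptions:

```python
import time

from synap import SynapClient
from synap.errors import RateLimitError  # module path is an assumption

client = SynapClient(api_key="...")

def create_with_fallback(payload: dict, max_attempts: int = 3):
    """Ingest a memory, backing off manually once the SDK's own retries give up."""
    for attempt in range(max_attempts):
        try:
            return client.memories.create(payload)
        except RateLimitError as exc:
            # retry_after attribute is hypothetical; fall back to exponential backoff.
            time.sleep(getattr(exc, "retry_after", 2 ** attempt))
    raise RuntimeError("still rate limited; shed load or request a higher limit")
```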
Payload limits
| Field | Limit |
|---|---|
| Document body (single ingest) | 256 KB |
| Batch ingest payload | 4 MB total, max 100 documents per request |
| metadata dict | 8 KB serialized |
| search_query list | Max 16 queries, each ≤ 512 chars |
| record_message content | 64 KB per message |
| Total messages per conversation (before compaction) | No hard cap, but plan to call compact() past 10K tokens |
Requests exceeding the single-ingest limit return 400 InvalidInputError. For large source documents, use the file upload endpoint (memories.create_from_file) or chunk client-side.
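A client-side chunker only needs to respect the limits in the table above: 256 KB per document, and 4 MB / 100 documents per batch. A minimal sketch; the batch_create payload shape is an assumption, and the byte-boundary split is deliberately naive (a real chunker should split on sentence or paragraph boundaries):

```python
from synap import SynapClient

client = SynapClient(api_key="...")

DOC_LIMIT_BYTES = 256 * 1024          # single-ingest document body limit
BATCH_LIMIT_BYTES = 4 * 1024 * 1024   # batch ingest payload limit
BATCH_LIMIT_DOCS = 100                # max documents per batch request

def ingest_large_text(text: str) -> None:
    """Split an oversized document into <=256 KB chunks and batch-ingest them."""
    raw = text.encode("utf-8")
    # Naive byte split; errors="ignore" drops characters cut at chunk edges.
    chunks = [raw[i:i + DOC_LIMIT_BYTES].decode("utf-8", errors="ignore")
              for i in range(0, len(raw), DOC_LIMIT_BYTES)]

    batch: list[dict] = []
    batch_bytes = 0
    for chunk in chunks:
        size = len(chunk.encode("utf-8"))
        if len(batch) == BATCH_LIMIT_DOCS or batch_bytes + size > BATCH_LIMIT_BYTES:
            client.memories.batch_create(batch)  # payload shape assumed
            batch, batch_bytes = [], 0
        batch.append({"body": chunk})
        batch_bytes += size
    if batch:
        client.memories.batch_create(batch)
```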
Concurrency
| Resource | Limit |
|---|---|
| Concurrent compact() jobs per Instance | 8 (subsequent requests queue) |
| Concurrent SDK gRPC anticipation streams per API key | 1 |
| HTTP connections per SDK instance | 32 (configurable via SDKConfig.timeouts) |
The SDK pools HTTP connections automatically. You generally don’t need to tune this unless you’re running highly parallel ingestion.
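If you do run highly parallel ingestion, size your worker pool to the connection pool so requests don't queue inside the SDK. A minimal sketch with a thread pool; whether the client is thread-safe is an assumption worth verifying with support:

```python
from concurrent.futures import ThreadPoolExecutor

from synap import SynapClient

client = SynapClient(api_key="...")

def ingest_all(payloads: list[dict]) -> list:
    """Fan single ingests across a pool matched to the default 32-connection pool."""
    # memories.create is assumed to accept one payload positionally.
    with ThreadPoolExecutor(max_workers=32) as pool:
        return list(pool.map(client.memories.create, payloads))
```

Note that 32 workers completing ~100 ms calls can exhaust the 600 req/min ingestion limit within seconds; prefer memories.batch_create() for bulk loads.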
Capacity planning
Rule-of-thumb numbers for sizing (a worked example follows the list):
- One agent turn (retrieve + ingest) ≈ 2 Synap API calls, ~200 ms added latency, ~3 KB of payload.
- One conversation lifecycle (10 turns + 1 compaction) ≈ 21 API calls.
- Storage ≈ 2 KB per memory (vector + graph + metadata). 10K memories per user ≈ 20 MB.
- Throughput per Instance: comfortable up to ~10 turns/sec sustained. Above that, talk to us.
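To make the arithmetic concrete, here is the back-of-envelope math for a hypothetical deployment handling 1,000 ten-turn conversations per day, derived entirely from the rules of thumb above:

```python
# Back-of-envelope sizing for 1,000 ten-turn conversations/day.
conversations_per_day = 1_000
turns_per_conversation = 10

api_calls_per_day = conversations_per_day * 21  # 10 turns + 1 compaction ≈ 21 calls
avg_turns_per_sec = conversations_per_day * turns_per_conversation / 86_400

memories_per_user = 10_000
storage_per_user_mb = memories_per_user * 2 / 1024  # ~2 KB per memory

print(f"{api_calls_per_day} API calls/day")      # 21000 API calls/day
print(f"{avg_turns_per_sec:.2f} avg turns/sec")  # 0.12 -- well under ~10/sec sustained
print(f"{storage_per_user_mb:.1f} MB per user")  # 19.5 MB per user
```

Averages hide bursts: if all traffic lands in a two-hour window, multiply the turns/sec figure by 12 before comparing against the ~10 turns/sec ceiling.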
Status and monitoring
What’s NOT yet rate-limited or capped
These dimensions are listed explicitly so you know what to monitor yourself until we add server-side enforcement:
- Number of distinct conversations per Instance.
- Number of distinct users / customers per Instance.
- Memory retention age — no automatic expiry; configure your retention policy in MACA.
- gRPC stream payload size — practical limits are network-bound, not enforced.
If you anticipate hitting any of these at scale, surface it during sales / Solutions Engineering so we can pre-provision.
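Until server-side enforcement exists, a rough counter on your side is enough to catch growth in the first two dimensions. A minimal sketch; the in-memory sets are illustrative, and a real deployment would count distinct IDs in its database or metrics system:

```python
# Track the uncapped dimensions yourself; nothing here is a Synap API.
distinct_conversations: set[str] = set()
distinct_users: set[str] = set()

def on_turn(conversation_id: str, user_id: str) -> None:
    """Call from your agent loop and export the counts to your metrics system."""
    distinct_conversations.add(conversation_id)
    distinct_users.add(user_id)
    # e.g., emit len(distinct_conversations) and len(distinct_users) as gauges
```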