This page lists the operational characteristics of Synap Cloud so you can plan capacity, set SLOs, and design fallback behavior. Numbers below are targets, not contractual SLAs — for contractual commitments contact [email protected].

Latency targets (P95)

| Operation | Mode | Typical P95 | Alert threshold |
|---|---|---|---|
| memories.create() (single) | fast | < 100 ms | > 300 ms |
| memories.create() (single) | long-range | < 200 ms (returns immediately, processing async) | > 500 ms |
| memories.batch_create() | n/a | < 500 ms (queues batch) | > 2000 ms |
| conversation.context.fetch() | fast | < 150 ms | > 500 ms |
| conversation.context.fetch() | accurate | < 600 ms | > 1500 ms |
| conversation.context.compact() | n/a | 2–15 s depending on conversation length | > 30 s |
| conversation.context.get_context_for_prompt() (cached) | n/a | < 20 ms (local SDK cache hit) | > 100 ms |
| conversation.context.get_compacted() (cached) | n/a | < 20 ms | > 100 ms |
These are end-to-end measurements from the SDK call site, including network time. Latency is measured at the SDK transport layer and emitted as telemetry, so you can compare against your own measurements in the Dashboard’s Latency view. The numbers assume the SDK is co-located with the target Synap region; cross-region calls (e.g., SDK in us-west-2 calling Synap in us-east-1) add roughly 70 ms of round-trip network latency.

Rate limits

Default limits, per Instance, per minute:
| Endpoint group | Limit | Headers returned on 429 |
|---|---|---|
| Ingestion (memories.*) | 600 req/min | Retry-After, X-RateLimit-Reset |
| Retrieval (*.context.fetch, get_context_for_prompt, get_compacted) | 1,200 req/min | Retry-After, X-RateLimit-Reset |
| Compaction (compact) | 60 req/min per conversation | Retry-After |
| Conversation messages (record_message, record_messages_batch) | 1,800 req/min | Retry-After |
The SDK retries RateLimitError automatically with exponential backoff and honors the server’s Retry-After header. If you consistently hit rate limits, contact [email protected] to request a higher limit; limits are tuneable per Instance.

Payload limits

| Field | Limit |
|---|---|
| document body (single ingest) | 256 KB |
| Batch ingest payload | 4 MB total, max 100 documents per request |
| metadata dict | 8 KB serialized |
| search_query list | Max 16 queries, each ≤ 512 chars |
| record_message content | 64 KB per message |
| Total messages per conversation (before compaction) | No hard cap, but plan to call compact() past 10K tokens |
Requests exceeding the single-ingest limit return 400 InvalidInputError. Use the file upload endpoint (memories.create_from_file) or chunk client-side for large source documents.
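A naive client-side chunker might look like the sketch below. The 256 KB figure comes from the table above; the splitter breaks on paragraph boundaries, which is a simplification (a single paragraph larger than the budget would still produce an oversized chunk, and production code would likely also split on sentences):

```python
def chunk_document(body: str, max_bytes: int = 256 * 1024) -> list[str]:
    """Split a document into chunks, each at most max_bytes of UTF-8,
    breaking on paragraph (blank-line) boundaries."""
    chunks: list[str] = []
    current: list[str] = []
    current_size = 0
    for para in body.split("\n\n"):
        size = len(para.encode("utf-8")) + 2  # +2 for the separator
        if current and current_size + size > max_bytes:
            chunks.append("\n\n".join(current))
            current, current_size = [], 0
        current.append(para)
        current_size += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk can then be ingested with a separate memories.create() call, or grouped into batch_create() requests while staying under the 4 MB / 100-document batch limits.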

Concurrency

| Resource | Limit |
|---|---|
| Concurrent compact() jobs per Instance | 8 (subsequent requests queue) |
| Concurrent SDK gRPC anticipation streams per API key | 1 |
| HTTP connections per SDK instance | 32 (configurable via SDKConfig.timeouts) |
The SDK pools HTTP connections automatically. You generally don’t need to tune this unless you’re running highly parallel ingestion.

Capacity planning

Rule-of-thumb numbers for sizing:
  • One agent turn (retrieve + ingest) ≈ 2 Synap API calls, ~200 ms added latency, ~3 KB of payload.
  • One conversation lifecycle (10 turns + 1 compaction) ≈ 21 API calls.
  • Storage ≈ 2 KB per memory (vector + graph + metadata). 10K memories per user ≈ 20 MB.
  • Throughput per Instance: comfortable up to ~10 turns/sec sustained. Above that, talk to us.
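The rules of thumb above translate directly into back-of-envelope sizing. A small sketch using only the numbers stated in the list (the helper names are ours, not SDK API):

```python
API_CALLS_PER_TURN = 2       # retrieve + ingest
STORAGE_PER_MEMORY_KB = 2    # vector + graph + metadata

def lifecycle_calls(turns: int, compactions: int = 1) -> int:
    """API calls for one conversation: 2 per turn plus 1 per compaction."""
    return turns * API_CALLS_PER_TURN + compactions

def storage_mb(memories_per_user: int, users: int) -> float:
    """Estimated storage in MB from the ~2 KB/memory rule of thumb."""
    return memories_per_user * users * STORAGE_PER_MEMORY_KB / 1024
```

For example, the 10-turn lifecycle in the list works out to 10 × 2 + 1 = 21 calls, and 10K memories to roughly 20 MB per user.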

Status and monitoring

What’s NOT yet rate-limited or capped

Stated explicitly so you know what to monitor yourself until we add server-side enforcement:
  • Number of distinct conversations per Instance.
  • Number of distinct users / customers per Instance.
  • Memory retention age — no automatic expiry; configure your retention policy in MACA.
  • gRPC stream payload size — practical limits are network-bound, not enforced.
If you anticipate hitting any of these at scale, surface it during sales / Solutions Engineering so we can pre-provision.