This page lists the operational characteristics of Synap Cloud so you can plan capacity, set SLOs, and design fallback behavior. Numbers below are targets, not contractual SLAs — for contractual commitments contact [email protected].

Latency targets (P95)

| Operation | Mode | Typical P95 | Alert threshold |
|---|---|---|---|
| memories.create() (single) | fast | < 100 ms | > 300 ms |
| memories.create() (single) | long-range | < 200 ms (returns immediately, processing async) | > 500 ms |
| memories.batch_create() | n/a | < 500 ms (queues batch) | > 2000 ms |
| conversation.context.fetch() | fast | < 150 ms | > 500 ms |
| conversation.context.fetch() | accurate | < 600 ms | > 1500 ms |
| conversation.context.compact() | n/a | 2–15 s depending on conversation length | > 30 s |
| conversation.context.get_context_for_prompt() (cached) | n/a | < 20 ms (local SDK cache hit) | > 100 ms |
| conversation.context.get_compacted() (cached) | n/a | < 20 ms | > 100 ms |
These are end-to-end measurements from the SDK call site, including network time. Latency is measured at the SDK transport layer and emitted as telemetry, so you can compare against your own measurements in the Dashboard’s Latency view. The numbers assume the SDK is co-located with the target Synap region; cross-region calls (e.g., SDK in us-west-2 calling Synap in us-east-1) add roughly 70 ms of round-trip network latency.

Rate limits

Default limits, per Instance, per minute:
| Endpoint group | Limit | Headers returned on 429 |
|---|---|---|
| Ingestion (memories.*) | 600 req/min | Retry-After, X-RateLimit-Reset |
| Retrieval (*.context.fetch, get_context_for_prompt, get_compacted) | 1,200 req/min | Retry-After, X-RateLimit-Reset |
| Compaction (compact) | 60 req/min per conversation | Retry-After |
| Conversation messages (record_message, record_messages_batch) | 1,800 req/min | Retry-After |
The SDK retries RateLimitError automatically with exponential backoff and honors the server’s Retry-After header. If you consistently hit rate limits, contact [email protected] to request a higher limit; limits are tuneable per Instance.

Payload limits

| Field | Limit |
|---|---|
| document body (single ingest) | 256 KB |
| Batch ingest payload | 4 MB total, max 100 documents per request |
| metadata dict | 8 KB serialized |
| search_query list | Max 16 queries, each ≤ 512 chars |
| record_message content | 64 KB per message |
| Total messages per conversation (before compaction) | No hard cap, but plan to call compact() past 10K tokens |
Requests exceeding the single-ingest limit return 400 InvalidInputError. Use the file upload endpoint (memories.create_from_file) or chunk client-side for large source documents.
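A naive client-side chunker might look like the sketch below. The 256 KB figure comes from the table above; the splitter breaks on paragraph boundaries, which is a simplification (a single paragraph larger than the budget would still produce an oversized chunk, and production code would likely also split on sentences):

```python
def chunk_document(body: str, max_bytes: int = 256 * 1024) -> list[str]:
    """Split a document into chunks, each at most max_bytes of UTF-8,
    breaking on paragraph (blank-line) boundaries."""
    chunks: list[str] = []
    current: list[str] = []
    current_size = 0
    for para in body.split("\n\n"):
        size = len(para.encode("utf-8")) + 2  # +2 for the separator
        if current and current_size + size > max_bytes:
            chunks.append("\n\n".join(current))
            current, current_size = [], 0
        current.append(para)
        current_size += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk can then be ingested with a separate memories.create() call, or grouped into batch_create() requests while staying under the 4 MB / 100-document batch limits.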

Concurrency

| Resource | Limit |
|---|---|
| Concurrent compact() jobs per Instance | 8 (subsequent requests queue) |
| Concurrent SDK gRPC anticipation streams per API key | 1 |
| HTTP connections per SDK instance | 32 (configurable via SDKConfig.timeouts) |
The SDK pools HTTP connections automatically. You generally don’t need to tune this unless you’re running highly parallel ingestion.

Capacity planning

Rule-of-thumb numbers for sizing:
  • One agent turn (retrieve + ingest) ≈ 2 Synap API calls, ~200 ms added latency, ~3 KB of payload.
  • One conversation lifecycle (10 turns + 1 compaction) ≈ 21 API calls.
  • Storage ≈ 2 KB per memory (vector + graph + metadata). 10K memories per user ≈ 20 MB.
  • Throughput per Instance: comfortable up to ~10 turns/sec sustained. Above that, talk to us.
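The rules of thumb above translate directly into back-of-envelope sizing. A small sketch using only the numbers stated in the list (the helper names are ours, not SDK API):

```python
API_CALLS_PER_TURN = 2       # retrieve + ingest
STORAGE_PER_MEMORY_KB = 2    # vector + graph + metadata

def lifecycle_calls(turns: int, compactions: int = 1) -> int:
    """API calls for one conversation: 2 per turn plus 1 per compaction."""
    return turns * API_CALLS_PER_TURN + compactions

def storage_mb(memories_per_user: int, users: int) -> float:
    """Estimated storage in MB from the ~2 KB/memory rule of thumb."""
    return memories_per_user * users * STORAGE_PER_MEMORY_KB / 1024
```

For example, the 10-turn lifecycle in the list works out to 10 × 2 + 1 = 21 calls, and 10K memories to roughly 20 MB per user.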

Status and monitoring

What’s NOT yet rate-limited or capped

Stated explicitly so you know what to monitor yourself until we add server-side enforcement:
  • Number of distinct conversations per Instance.
  • Number of distinct users / customers per Instance.
  • Memory retention age — no automatic expiry; configure your retention policy in MACA.
  • gRPC stream payload size — practical limits are network-bound, not enforced.
If you anticipate hitting any of these at scale, surface it during sales / Solutions Engineering so we can pre-provision.