mode= controls a speed-vs-thoroughness tradeoff. It appears on two different axes — label which one you mean:
| Axis | Parameter | Values | Picks between |
|---|---|---|---|
| Ingestion | memories.create(mode=...) | fast · long-range | How deeply a write is processed and indexed |
| Retrieval | ...context.fetch(mode=...) | fast · accurate | How much work a read does to assemble context |
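As a minimal sketch of the two axes in the table, the helper below checks a mode value against the axis it is used on. The value sets come from the table; the `check_mode` function and the `axis` names are hypothetical and not part of any SDK.

```python
# Hypothetical helper: the value sets mirror the table above,
# but this validation function is an illustration, not an SDK call.
INGESTION_MODES = {"fast", "long-range"}
RETRIEVAL_MODES = {"fast", "accurate"}


def check_mode(axis: str, mode: str) -> str:
    """Return mode unchanged if it is valid for the given axis, else raise."""
    allowed = {"ingestion": INGESTION_MODES, "retrieval": RETRIEVAL_MODES}
    if axis not in allowed:
        raise ValueError(f"unknown axis: {axis!r}")
    if mode not in allowed[axis]:
        raise ValueError(
            f"{mode!r} is not a valid {axis} mode; "
            f"expected one of {sorted(allowed[axis])}"
        )
    return mode


# A deep write can be paired with a light read; each axis is checked on its own.
write_mode = check_mode("ingestion", "long-range")
read_mode = check_mode("retrieval", "fast")
```

Passing "accurate" on the ingestion axis raises a `ValueError`, which is the kind of mix-up that labeling the axis is meant to prevent.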
“Fast” means the same thing on both axes (lightweight, low-latency). The thorough setting is called long-range for ingestion and accurate for retrieval: same principle, deeper processing for higher quality. The two are independent: you can ingest long-range and read back fast, or any combination.
Quick comparison
| Aspect | Fast | Accurate (long-range for ingestion) |
|---|---|---|
| Retrieval latency | Lower | Higher |
| Ingestion processing | Faster | Slower (deeper) |
| Search method | Vector + graph (no LLM query decomposition) | Vector + graph + LLM subquery decomposition + reranking |
| Ranking signals | Cosine similarity | Similarity + recency + graph centrality + confidence |
| Entity resolution | Lightweight (basic NER) | Full pipeline (semantic matching, cross-reference) |
| Relationship awareness | Graph relationships, no LLM-driven multi-hop decomposition | Explicit graph edges with LLM-driven multi-hop decomposition |
| Compute cost | Lower | Higher |
| Best for | Real-time chat, simple queries, high throughput | Complex queries, summaries, relationship-aware context |
The two modes are not mutually exclusive. Use fast for the hot path of a live conversation and switch to accurate for specific high-value queries, all within the same application and the same Synap Instance.
Fast mode
The recommended default for real-time, conversational agents where low latency matters more than exhaustive extraction.
Fast ingestion
Fast ingestion runs a lightweight extraction pipeline, optimized to make memories available quickly.

| Stage | Behavior |
|---|---|
| Chunking | Basic semantic chunking by paragraph and sentence boundaries |
| Entity extraction | Lightweight named entity recognition (people, organizations, products) |
| Embedding | Vector embeddings generated for each chunk |
| Preference detection | Basic keyword-based preference identification |
| Storage | Chunks stored in the vector store; entities indexed for lookup |
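Two of the stages in the table, chunking and keyword-based preference detection, can be illustrated with a small sketch. This is a toy version under stated assumptions, not the actual pipeline: the cue-word list and both function names are hypothetical, and embedding and storage are left out.

```python
import re

# Assumed cue words for the toy detector; the real keyword list is not documented here.
PREFERENCE_CUES = ("prefer", "like", "love", "hate", "favorite")


def chunk_text(text: str) -> list[str]:
    """Split on blank lines (paragraphs), then on sentence-ending punctuation."""
    chunks = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if sentence:
                chunks.append(sentence)
    return chunks


def detect_preferences(chunks: list[str]) -> list[str]:
    """Return chunks containing any cue word (case-insensitive substring match)."""
    return [c for c in chunks if any(cue in c.lower() for cue in PREFERENCE_CUES)]


text = "Alice joined Acme in May. She prefers email over calls.\n\nThe renewal is due in Q3."
chunks = chunk_text(text)
preferences = detect_preferences(chunks)
```

Substring matching is deliberately crude (it would also flag "unlikely"), which is the kind of tradeoff a lightweight pass accepts in exchange for speed.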
Fast retrieval
Fast retrieval queries both the vector store and the knowledge graph, but skips the LLM-driven subquery decomposition and reranking that accurate mode adds. That keeps latency low for the hot path of real-time conversations.
Query embedding
The query is converted into a vector embedding, consistent with the embeddings created during ingestion.
Vector similarity search
The embedding is compared against stored memory embeddings using cosine similarity, scoped to the applicable scope levels (user, customer, client, world) based on the provided user_id and customer_id.
Ranking
Results are ranked by cosine similarity. No LLM-driven decomposition or reranking pass is applied.
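The scoping and ranking steps above can be sketched in a few lines. This is an illustration only: the memory records, the three-dimensional vectors, and the `fast_search` function are hypothetical, and the scope filter is simplified to a set of allowed scope names rather than being derived from user_id and customer_id.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fast_search(query_vec, memories, allowed_scopes, top_k=2):
    """Filter by scope, then rank by cosine similarity alone (no reranking pass)."""
    candidates = [m for m in memories if m["scope"] in allowed_scopes]
    ranked = sorted(candidates, key=lambda m: cosine(query_vec, m["vec"]), reverse=True)
    return [m["text"] for m in ranked[:top_k]]


# Toy data: real embeddings have far more dimensions.
memories = [
    {"text": "Refund window is 30 days", "scope": "customer", "vec": [0.9, 0.1, 0.0]},
    {"text": "Alice's birthday is in June", "scope": "user", "vec": [0.0, 1.0, 0.0]},
    {"text": "Internal pricing notes", "scope": "client", "vec": [1.0, 0.0, 0.0]},
]
results = fast_search([1.0, 0.0, 0.0], memories, allowed_scopes={"user", "customer"})
```

The client-scoped record is the closest match to the query vector but never reaches the ranking step, because scoping is applied before similarity is computed.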
conversation_id must be a valid UUID. Generate one with str(uuid.uuid4()) and reuse it for every turn in the same conversation.
Accurate mode
Prioritizes thoroughness and quality over speed. It runs the full extraction pipeline on ingestion (long-range) and adds LLM-driven refinement on retrieval (accurate), producing richer, more connected context.
Long-range ingestion
Long-range ingestion runs the complete extraction pipeline, producing structured, relationship-aware memories that power graph-based retrieval.
Semantic chunking
Content is split into semantically coherent chunks, respecting topic boundaries and conversational turns.
Deep entity extraction
Captures all people, organizations, products, locations, concepts, and events — including implied entities and role-based references (“my manager”, “the person who handles billing”).
Entity resolution
Each entity is matched against the full registry using exact, alias, semantic, and contextual strategies; new entities are auto-registered. See Entity Resolution.
Relationship mapping
Explicit and implicit relationships become graph edges — e.g. “Sarah is leading Project Atlas” → Sarah —[leads]—> Project Atlas.
Preference detection
Stated, implied, and contextual preferences are extracted with high confidence.
Emotional and sentiment analysis
Emotional tone is analyzed and stored as metadata that can influence retrieval ranking.
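The entity resolution and relationship mapping steps above can be sketched as a tiny registry plus a set of graph edges. This is a simplified illustration under stated assumptions: only the exact and alias strategies are shown (semantic and contextual matching are omitted), and the `EntityRegistry` class, the alias lists, and the `approves_budget_for` relation name are hypothetical.

```python
class EntityRegistry:
    """Toy registry: resolves mentions by exact name or alias, case-insensitively."""

    def __init__(self) -> None:
        self.names: dict[str, str] = {}  # lowercased name or alias -> canonical name

    def register(self, canonical: str, *aliases: str) -> None:
        for name in (canonical, *aliases):
            self.names[name.lower()] = canonical

    def resolve(self, mention: str) -> str:
        key = mention.lower()
        if key not in self.names:
            self.register(mention)  # unknown entities are auto-registered
        return self.names[key]


registry = EntityRegistry()
registry.register("Sarah Chen", "Sarah")
registry.register("Project Atlas", "Atlas")

edges: set[tuple[str, str, str]] = set()


def add_edge(subject: str, relation: str, obj: str) -> None:
    """Store a (subject, relation, object) edge between resolved entities."""
    edges.add((registry.resolve(subject), relation, registry.resolve(obj)))


add_edge("Sarah", "leads", "Project Atlas")
add_edge("sarah chen", "leads", "atlas")  # resolves to the same edge as the line above
add_edge("Maria", "approves_budget_for", "Atlas")  # "Maria" is auto-registered
```

Because mentions are resolved before the edge is stored, two differently worded statements of the same fact collapse into one edge instead of fragmenting the graph.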
Accurate retrieval
Accurate retrieval queries both stores (the same dual-store retrieval fast mode uses) and adds two distinguishing steps: LLM-driven subquery decomposition and reranking, together with multi-signal ranking.
Query embedding & vector search
Same as fast mode — embed the query, find candidates by cosine similarity.
LLM subquery decomposition and graph traversal
The query is decomposed into focused sub-queries that expand the entities and angles explored. Entities from the query and top vector results seed graph traversal, following relationship edges to connected entities, related facts, and context. Querying “Project Atlas” reaches Sarah Chen, James, the database migration, the Q3 deadline, and Maria’s budget approval.
Cross-engine merging
Vector and graph results are merged into one candidate set; duplicates removed, scores normalized to a common scale.
Multi-signal ranking
Candidates are ranked on semantic similarity, recency, graph centrality, and extraction confidence — weighted into a final relevance score.
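The merging and ranking steps above can be sketched as follows. This is a toy version under stated assumptions: min-max scaling is one possible way to normalize scores, keeping the higher score for a duplicate is one possible merge rule, and the weights and signal values are invented for illustration. None of them are documented values.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores to [0, 1]; a constant set maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def merge(vector_scores: dict[str, float], graph_scores: dict[str, float]) -> dict[str, float]:
    """Union of candidate ids; duplicates keep the higher normalized score."""
    v, g = normalize(vector_scores), normalize(graph_scores)
    return {k: max(v.get(k, 0.0), g.get(k, 0.0)) for k in v.keys() | g.keys()}


# Assumed weights; the real weighting is not given in this document.
WEIGHTS = {"similarity": 0.5, "recency": 0.2, "centrality": 0.2, "confidence": 0.1}


def rank(candidates: dict[str, dict[str, float]]) -> list[str]:
    """Order candidate ids by the weighted sum of their signals, best first."""
    def score(signals: dict[str, float]) -> float:
        return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
    return sorted(candidates, key=lambda k: score(candidates[k]), reverse=True)


merged = merge({"m1": 0.82, "m2": 0.64}, {"m2": 3.0, "m3": 1.0})
candidates = {
    "m1": {"similarity": merged["m1"], "recency": 0.2, "centrality": 0.1, "confidence": 0.9},
    "m2": {"similarity": merged["m2"], "recency": 0.9, "centrality": 0.8, "confidence": 0.8},
    "m3": {"similarity": merged["m3"], "recency": 0.5, "centrality": 0.9, "confidence": 0.7},
}
ranking = rank(candidates)
```

In this toy data, m1 and m2 tie on the merged score, and the recency and centrality signals are what lift m2 to the top.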
What graph traversal adds
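The same query, fast vs accurate:
- Fast mode results: returns memories that directly mention “Project Atlas”. Useful, but limited to direct mentions.
- Accurate mode results: returns the direct mentions plus context reached by following graph edges, such as the connected people, tasks, and deadlines described under graph traversal above.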
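A breadth-first traversal over a toy graph shows the difference in reach. The node names come from the Project Atlas example earlier on this page, but the edge layout, the hop limits, and the `traverse` function are assumptions made for illustration.

```python
from collections import deque

# Hypothetical edge layout; only the node names come from the example on this page.
GRAPH = {
    "Project Atlas": ["Sarah Chen", "database migration", "Q3 deadline"],
    "Sarah Chen": ["Project Atlas"],
    "database migration": ["Project Atlas", "James"],
    "James": ["database migration"],
    "Q3 deadline": ["Project Atlas", "Maria's budget approval"],
    "Maria's budget approval": ["Q3 deadline"],
}


def traverse(seed: str, max_hops: int) -> set[str]:
    """Return every node reachable from seed within max_hops edges, excluding seed."""
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in GRAPH.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return seen - {seed}


direct_only = traverse("Project Atlas", max_hops=0)  # no graph expansion
one_hop = traverse("Project Atlas", max_hops=1)
two_hops = traverse("Project Atlas", max_hops=2)
```

With no expansion the seed entity stands alone; one hop adds its immediate neighbors, and two hops reach James and Maria's budget approval, which no direct-mention search would surface.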
Choosing a mode
Use fast for…
- Real-time conversations where the user is waiting — fast retrieval is rarely the bottleneck; LLM generation dominates response time.
- Single-topic queries answerable from one memory chunk (“What is our refund policy?”, “When is Alice’s birthday?”).
- High-frequency retrieval on every message, at scale — the lower compute cost matters.
- Latency-sensitive apps: voice agents, real-time collaboration.
Upgrade to accurate for…
- Complex, multi-entity queries: “Summarize everything about Project Atlas, who’s involved, and what’s been decided.”
- Relationship queries: “How is Sarah connected to the infrastructure migration?”
- Comprehensive summaries / briefings that must not miss context (latency is acceptable since they aren’t time-sensitive).
- Onboarding / profile-building, where deep extraction builds a richer profile that even later fast-mode reads benefit from.
- High-value interactions: escalations, renewals, executive conversations.
Mixing modes in practice
Most production apps combine both: fast by default, accurate for the queries and writes that justify it.
Use mode="fast" on memories.batch_create(...) items when speed beats extraction depth for high-volume, lower-priority data.
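One way to picture the "fast by default, accurate when justified" pattern is a small routing function that picks a retrieval mode per turn. This is a sketch under stated assumptions: the cue phrases, the `pick_retrieval_mode` function, and the shape of the request dictionaries are hypothetical and do not reflect the SDK's actual signatures. Only the mode values and the UUID requirement for conversation_id come from this page.

```python
import uuid

# Assumed cue phrases for queries that justify the slower mode.
ACCURATE_CUES = ("summarize", "everything about", "connected to", "who's involved")


def pick_retrieval_mode(query: str, high_value: bool = False) -> str:
    """Toy heuristic: accurate for high-value or summary/relationship-style queries."""
    if high_value or any(cue in query.lower() for cue in ACCURATE_CUES):
        return "accurate"
    return "fast"


# One UUID per conversation, reused on every turn.
conversation_id = str(uuid.uuid4())

turns = ["When is Alice's birthday?", "Summarize everything about Project Atlas."]

# Hypothetical request payloads; the key names are illustrative only.
requests = [
    {"query": q, "mode": pick_retrieval_mode(q), "conversation_id": conversation_id}
    for q in turns
]
```

The first turn stays on the fast path and the second is upgraded to accurate, while both share the same conversation_id.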
Next steps
Context Fetch SDK
Full SDK reference for retrieval methods and mode selection.
Runtime Ingestion
How runtime ingestion integrates fast mode into the agent loop.
Memory Architecture
Configure ingestion and retrieval defaults in your memory architecture.
Entity Resolution
How long-range’s deep entity resolution builds the knowledge graph.