mode= controls a speed-vs-thoroughness tradeoff. It appears on two different axes, so be explicit about which one you mean:
| Axis | Parameter | Values | Picks between |
| --- | --- | --- | --- |
| Ingestion | memories.create(mode=...) | fast · long-range | How deeply a write is processed and indexed |
| Retrieval | ...context.fetch(mode=...) | fast · accurate | How much work a read does to assemble context |

“Fast” means the same thing on both axes (lightweight, low-latency). The thorough setting is called long-range for ingestion and accurate for retrieval — same principle, deeper processing for higher quality. The two are independent: you can ingest long-range and read back fast, or any combination.
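
Because the two axes accept different values, it can help to validate a mode against its axis before calling the SDK. The sketch below is illustrative only: `VALID_MODES` and `check_mode` are hypothetical helper names, not part of the SDK, and the allowed values are simply the ones listed in the table above.

```python
from itertools import product

# Allowed values per axis, copied from the table above.
VALID_MODES = {
    "ingestion": ("fast", "long-range"),
    "retrieval": ("fast", "accurate"),
}


def check_mode(axis: str, mode: str) -> str:
    """Return mode unchanged if it is valid for the axis, else raise ValueError."""
    allowed = VALID_MODES[axis]
    if mode not in allowed:
        raise ValueError(f"{mode!r} is not a valid {axis} mode; expected one of {allowed}")
    return mode


# The axes are independent, so every ingestion/retrieval pairing is allowed.
combinations = list(product(VALID_MODES["ingestion"], VALID_MODES["retrieval"]))
for ingest_mode, fetch_mode in combinations:
    print(f"ingest={ingest_mode:<10} fetch={fetch_mode}")
```

A check like this catches the most likely mix-up, passing "long-range" to a retrieval call or "accurate" to an ingestion call, before the request leaves your application.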

Quick comparison

| Aspect | Fast | Accurate |
| --- | --- | --- |
| Retrieval latency | Lower | Higher |
| Ingestion processing | Faster | Slower (deeper) |
| Search method | Vector + graph (no LLM query decomposition) | Vector + graph + LLM subquery decomposition + reranking |
| Ranking signals | Cosine similarity | Similarity + recency + graph centrality + confidence |
| Entity resolution | Lightweight (basic NER) | Full pipeline (semantic matching, cross-reference) |
| Relationship awareness | Graph relationships, no LLM-driven multi-hop decomposition | Explicit graph edges with LLM-driven multi-hop decomposition |
| Compute cost | Lower | Higher |
| Best for | Real-time chat, simple queries, high throughput | Complex queries, summaries, relationship-aware context |

The two modes are not mutually exclusive. Use fast for the hot path of a live conversation and switch to accurate for specific high-value queries — all within the same application and the same Synap Instance.
Building a real-time chatbot or voice agent? Start with fast for both ingestion and retrieval, then selectively upgrade specific interactions to long-range / accurate as needed — no architecture change required.

Fast mode

Fast mode is the recommended default for real-time, conversational agents where low latency matters more than exhaustive extraction.

Fast ingestion

Fast ingestion runs a lightweight extraction pipeline, optimized to make memories available quickly.
| Stage | Behavior |
| --- | --- |
| Chunking | Basic semantic chunking by paragraph and sentence boundaries |
| Entity extraction | Lightweight named entity recognition (people, organizations, products) |
| Embedding | Vector embeddings generated for each chunk |
| Preference detection | Basic keyword-based preference identification |
| Storage | Chunks stored in the vector store; entities indexed for lookup |

It skips deep entity resolution against the full registry, explicit relationship/graph-edge mapping, advanced topic categorization, and emotional/sentiment analysis. Memories become available for vector-based retrieval shortly after processing. Use it for real-time chat logging, high-throughput pipelines, routine Q&A, and ephemeral content that doesn’t need deep relationship modeling.
# Fast ingestion for a routine conversation turn
await sdk.memories.create(
    document="User: What's the status of my order?\n"
             "Assistant: Your order #4521 shipped yesterday and should arrive by Thursday.",
    document_type="ai-chat-conversation",
    user_id="user_123",
    mode="fast",
)
# Returns immediately. Memory available for retrieval shortly after.
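
To make two of the fast-ingestion stages more concrete, here is a toy sketch of chunking on paragraph and sentence boundaries followed by keyword-based preference detection. It is not the service's actual pipeline: the function names, the regular expressions, and the `PREFERENCE_KEYWORDS` list are all assumptions chosen for illustration.

```python
import re

# Hypothetical keyword list; a real system would use its own vocabulary.
PREFERENCE_KEYWORDS = ("prefer", "favorite", "always", "never", "like", "dislike")


def chunk_text(text: str) -> list[str]:
    """Split on blank lines (paragraphs), then on sentence-ending punctuation."""
    chunks = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if sentence:
                chunks.append(sentence)
    return chunks


def detect_preferences(chunks: list[str]) -> list[str]:
    """Return the chunks that contain any preference keyword as a whole word."""
    pattern = re.compile(r"\b(" + "|".join(PREFERENCE_KEYWORDS) + r")\b", re.IGNORECASE)
    return [chunk for chunk in chunks if pattern.search(chunk)]


text = (
    "I prefer email over phone calls. My order arrived late.\n\n"
    "Please never ship on weekends."
)
chunks = chunk_text(text)
preferences = detect_preferences(chunks)
print(chunks)
print(preferences)
```

Keyword matching is cheap, which fits the fast path, but it only catches preferences that are stated with one of the listed words. Implied preferences are the kind of signal that deeper extraction is meant to pick up.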

Fast retrieval

Fast retrieval queries both the vector store and the knowledge graph, but skips the LLM-driven subquery decomposition and reranking that accurate mode adds. That keeps latency low for the hot path of real-time conversations.
1. Query embedding
   The query is converted into a vector embedding, consistent with the embeddings created during ingestion.

2. Vector similarity search
   The embedding is compared against stored memory embeddings using cosine similarity, scoped to the applicable scope levels (user, customer, client, world) based on the provided user_id and customer_id.

3. Ranking
   Results are ranked by cosine similarity. No LLM-driven decomposition or reranking pass is applied.

4. Return
   The top-k results (per the configured budget) are returned as structured context.

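
The ranking and return steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the service's implementation: the three-dimensional vectors stand in for real embeddings, `cosine_similarity` and `top_k` are hypothetical helpers, and scope filtering by user_id and customer_id is left out.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, memories, k=2):
    """Score every memory against the query and keep the k most similar."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
memories = [
    ("Project Atlas kickoff meeting", [0.9, 0.1, 0.0]),
    ("Refund policy is 30 days", [0.0, 0.2, 0.9]),
    ("Atlas timeline discussion", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]
results = top_k(query_vec, memories, k=2)
for score, text in results:
    print(f"[{score:.2f}] {text}")
```

A production vector store would use an approximate nearest-neighbor index rather than scoring every memory, but the ordering principle is the same: similarity alone decides the rank.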
conversation_id must be a valid UUID. Generate one with str(uuid.uuid4()) and reuse it for every turn in the same conversation.
import uuid

conversation_id = str(uuid.uuid4())  # one UUID per conversation, reused across turns
context = await sdk.conversation.context.fetch(
    conversation_id=conversation_id,
    user_id="user_123",
    search_query=["What do we know about Project Atlas?"],
    mode="fast",
)

for fact in context.facts:
    print(f"[{fact.confidence:.2f}] {fact.content}")
What fast retrieval skips, relative to accurate: LLM subquery decomposition (breaking a complex query into focused sub-queries to widen coverage) and reranking (an extra pass that reorders candidates for relevance). Broad, multi-part questions may therefore retrieve less complete context — those are the cases to send to accurate mode.

Accurate mode

Accurate mode prioritizes thoroughness and quality over speed. It runs the full extraction pipeline on ingestion (long-range) and adds LLM-driven refinement on retrieval (accurate), producing richer, more connected context.

Long-range ingestion

Long-range ingestion runs the complete extraction pipeline, producing structured, relationship-aware memories that power graph-based retrieval.
1. Semantic chunking
   Content is split into semantically coherent chunks, respecting topic boundaries and conversational turns.

2. Deep entity extraction
   Captures all people, organizations, products, locations, concepts, and events, including implied entities and role-based references (“my manager”, “the person who handles billing”).

3. Entity resolution
   Each entity is matched against the full registry using exact, alias, semantic, and contextual strategies; new entities are auto-registered. See Entity Resolution.

4. Relationship mapping
   Explicit and implicit relationships become graph edges, e.g. “Sarah is leading Project Atlas” → Sarah —[leads]—> Project Atlas.

5. Preference detection
   Stated, implied, and contextual preferences are extracted with high confidence.

6. Emotional and sentiment analysis
   Emotional tone is analyzed and stored as metadata that can influence retrieval ranking.

7. Advanced categorization
   Content is classified into a topic hierarchy with domain-specific tags.

8. Vector embedding and graph storage
   Chunks are embedded into the vector store; entity relationships are stored in the graph store. Both engines are populated, enabling accurate retrieval’s combined search.

Long-range takes longer than fast, scaling with content length and the number of entities and relationships. It is the default for bootstrap ingestion. Use it for important conversations (strategic discussions, key decisions, escalations), complex documents, profile-building onboarding, and meeting transcripts.
# Long-range ingestion for an important strategic conversation
await sdk.memories.create(
    document=(
        "User: Let's revisit the Project Atlas timeline. I spoke with Sarah Chen "
        "from engineering yesterday, and she's concerned about the Q3 deadline. The "
        "infrastructure team hasn't finished the database migration yet, and James "
        "from DevOps says they need at least three more weeks.\n"
        "User: Note that we might bring in two engineers from the platform team to "
        "help accelerate. Maria approved the budget for that yesterday."
    ),
    document_type="ai-chat-conversation",
    user_id="user_123",
    mode="long-range",
)
From this sample, long-range ingestion would extract entities (Sarah Chen, James, Maria, Project Atlas, the platform and infrastructure teams), relationships (Sarah —[concerned_about]—> Atlas timeline; Maria —[approved]—> budget), decisions (possibly bringing in two platform engineers, with budget approved), and facts (database migration incomplete, three-week estimate from DevOps, Q3 deadline at risk).
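
One way to picture the extracted relationships is as a list of (source, relation, target) edges indexed by source entity. The sketch below is an assumption about shape only: the `Edge` dataclass and the `outgoing` index are hypothetical, and the actual graph store schema is not described on this page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str


# Edges mirroring relationships named on this page; the labels are illustrative.
edges = [
    Edge("Sarah", "leads", "Project Atlas"),
    Edge("Sarah", "concerned_about", "Atlas timeline"),
    Edge("Maria", "approved", "budget"),
]

# Index edges by source entity so lookups do not scan the whole list.
outgoing = defaultdict(list)
for edge in edges:
    outgoing[edge.source].append(edge)

for edge in outgoing["Sarah"]:
    print(f"{edge.source} -[{edge.relation}]-> {edge.target}")
```

Edges like these are what graph traversal follows at retrieval time, which is why content ingested without relationship mapping yields less connected context later.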

Accurate retrieval

Accurate retrieval queries both stores — the same dual-store retrieval fast mode uses — and adds two distinguishing steps: LLM-driven subquery decomposition and reranking, together with multi-signal ranking.
1. Query embedding & vector search
   Same as fast mode: embed the query, find candidates by cosine similarity.

2. LLM subquery decomposition and graph traversal
   The query is decomposed into focused sub-queries that expand the entities and angles explored. Entities from the query and top vector results seed graph traversal, following relationship edges to connected entities, related facts, and context. Querying “Project Atlas” reaches Sarah Chen, James, the database migration, the Q3 deadline, and Maria’s budget approval.

3. Cross-engine merging
   Vector and graph results are merged into one candidate set; duplicates removed, scores normalized to a common scale.

4. Multi-signal ranking
   Candidates are ranked on semantic similarity, recency, graph centrality, and extraction confidence, weighted into a final relevance score.

5. Return
   The top-k results are returned as structured context, enriched with entity and relationship metadata.

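
The merging and ranking steps above can be sketched as follows. Everything specific here is an assumption made for illustration: min-max normalization, keeping the higher score for duplicates, and the `WEIGHTS` values are not documented behavior, only one plausible way to combine the four signals named above.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores to [0, 1]; a constant set maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def merge(vector_hits: dict[str, float], graph_hits: dict[str, float]) -> dict[str, float]:
    """Union both result sets, keeping the higher normalized score for duplicates."""
    merged = normalize(vector_hits)
    for key, score in normalize(graph_hits).items():
        merged[key] = max(merged.get(key, 0.0), score)
    return merged


# Assumed weights; they sum to 1 so the final score stays in [0, 1].
WEIGHTS = {"similarity": 0.5, "recency": 0.2, "centrality": 0.2, "confidence": 0.1}


def final_score(signals: dict[str, float]) -> float:
    """Weighted sum of the ranking signals; missing signals count as 0."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)


# Vector scores are cosine similarities; graph scores are on a different scale.
merged = merge({"m1": 0.92, "m2": 0.81}, {"m2": 3.0, "m3": 1.0})

signals = {
    "m1": {"similarity": merged["m1"], "recency": 0.2, "centrality": 0.1, "confidence": 0.9},
    "m2": {"similarity": merged["m2"], "recency": 0.9, "centrality": 0.8, "confidence": 0.8},
    "m3": {"similarity": merged["m3"], "recency": 0.5, "centrality": 0.9, "confidence": 0.7},
}
ranked = sorted(signals, key=lambda key: final_score(signals[key]), reverse=True)
print(ranked)
```

In this toy data, m2 appears in both result sets and is also recent and well connected, so it outranks m1 even though m1 had the highest raw vector similarity. That is the practical difference from fast mode, where similarity alone decides the order.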
import uuid

conversation_id = str(uuid.uuid4())  # one UUID per conversation, reused across turns
context = await sdk.conversation.context.fetch(
    conversation_id=conversation_id,
    user_id="user_123",
    search_query=["What do we know about Project Atlas, including who is involved and what decisions have been made?"],
    mode="accurate",
)

for fact in context.facts:
    print(f"[{fact.confidence:.2f}] {fact.content}")
    if fact.entities:
        print(f"  Entities: {', '.join(e.canonical_name for e in fact.entities)}")
    if fact.relationships:
        print(f"  Relationships: {', '.join(str(r) for r in fact.relationships)}")

What graph traversal adds

The same query behaves differently in the two modes. Fast retrieval returns memories that directly mention “Project Atlas”:
[0.92] Project Atlas timeline may need to shift to Q4. Q3 deadline flagged as infeasible.
[0.87] Project Atlas kickoff meeting scheduled for January 15th.
[0.81] User asked about the current status of Project Atlas.
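Useful, but limited to direct mentions. Accurate retrieval additionally follows relationship edges from Project Atlas to connected entities and facts, such as Sarah Chen, James, the database migration, the Q3 deadline, and Maria’s budget approval.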
Accurate retrieval is most effective when the content was ingested with long-range. The relationship edges available to traverse come from long-range ingestion, not from the retrieval mode. Accurate retrieval still queries both stores regardless, but fast-ingested content has fewer edges to traverse, so you get less of the graph-enhanced context that makes accurate mode valuable.
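
The dependence on ingestion depth is easy to see with a toy breadth-first traversal. The graph contents, the `traverse` helper, and the two-hop limit below are all assumptions for illustration; the page does not specify the real traversal algorithm or its depth.

```python
from collections import deque

# Toy graph built from the Project Atlas example; the edge set is illustrative.
GRAPH = {
    "Project Atlas": ["Sarah Chen", "Q3 deadline", "database migration"],
    "Sarah Chen": ["Project Atlas", "Q3 deadline"],
    "Q3 deadline": ["Project Atlas", "Sarah Chen"],
    "database migration": ["Project Atlas", "James"],
    "James": ["database migration"],
}


def traverse(graph, seeds, max_hops=2):
    """Breadth-first search from seed entities, up to max_hops edges away.

    Returns a dict mapping each reached entity to its hop distance.
    """
    visited = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if visited[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited[neighbor] = visited[node] + 1
                queue.append(neighbor)
    return visited


# Richly connected graph, as long-range ingestion would produce.
rich = traverse(GRAPH, ["Project Atlas"])
# Same seed with no edges, standing in for content ingested without relationship mapping.
sparse = traverse({"Project Atlas": []}, ["Project Atlas"])
print(rich)
print(sparse)
```

With edges present, the traversal reaches James two hops away through the database migration. With no edges, it returns only the seed entity, which is the situation accurate retrieval faces when the underlying content was ingested in fast mode.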

Choosing a mode

Use fast for:
  • Real-time conversations where the user is waiting. Fast retrieval is rarely the bottleneck; LLM generation dominates response time.
  • Single-topic queries answerable from one memory chunk (“What is our refund policy?”, “When is Alice’s birthday?”).
  • High-frequency retrieval on every message, at scale, where the lower compute cost matters.
  • Latency-sensitive apps: voice agents, real-time collaboration.

Use accurate (and long-range ingestion) for:
  • Complex, multi-entity queries: “Summarize everything about Project Atlas, who’s involved, and what’s been decided.”
  • Relationship queries: “How is Sarah connected to the infrastructure migration?”
  • Comprehensive summaries / briefings that must not miss context (latency is acceptable since they aren’t time-sensitive).
  • Onboarding / profile-building, where deep extraction builds a richer profile that even later fast-mode reads benefit from.
  • High-value interactions: escalations, renewals, executive conversations.

Mixing modes in practice

Most production apps combine both — fast by default, accurate for the queries and writes that justify it:
async def get_context(conversation_id, user_id, query, is_complex=False):
    return await sdk.conversation.context.fetch(
        conversation_id=conversation_id,
        user_id=user_id,
        search_query=[query],
        mode="accurate" if is_complex else "fast",
    )


async def ingest_conversation(content, user_id, is_important=False):
    await sdk.memories.create(
        document=content,
        document_type="ai-chat-conversation",
        user_id=user_id,
        mode="long-range" if is_important else "fast",
    )
A simple keyword heuristic is a reasonable starting point for automatic selection (tune it to your query patterns, or use a lightweight classifier):
def should_use_accurate_mode(query: str) -> bool:
    complex_indicators = [
        "summarize", "everything about", "full briefing",
        "who is involved", "related to", "connected to",
        "all the details", "comprehensive", "overview of",
        "history of", "timeline for",
    ]
    q = query.lower()
    return any(indicator in q for indicator in complex_indicators)
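
One caveat with plain substring matching is that a phrase such as "related to" also matches inside "unrelated to". The variant below is a hedged sketch of one possible refinement, not a recommended replacement: `COMPLEX_INDICATORS` is a subset of the list above, and `pick_retrieval_mode` is a hypothetical helper that matches phrases on word boundaries and returns the mode string directly.

```python
import re

# Subset of the indicator phrases from the heuristic above.
COMPLEX_INDICATORS = [
    "summarize", "everything about", "related to", "connected to", "history of",
]

# \b anchors each phrase on word boundaries, so "unrelated to" does not
# trigger the "related to" indicator.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(phrase) for phrase in COMPLEX_INDICATORS) + r")\b",
    re.IGNORECASE,
)


def pick_retrieval_mode(query: str) -> str:
    """Return "accurate" for queries containing a complexity indicator, else "fast"."""
    return "accurate" if _PATTERN.search(query) else "fast"


for query in (
    "How is Sarah connected to the migration?",
    "This is unrelated to billing",
    "When is Alice's birthday?",
):
    print(f"{pick_retrieval_mode(query):<8} {query}")
```

The trade-off is that word boundaries also stop inflected forms such as "summarized" from matching, so whether this variant is an improvement depends on your query patterns.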
You can also override the per-write default in batch ingestion — e.g. mode="fast" on memories.batch_create(...) items when speed beats extraction depth for high-volume, lower-priority data.

Next steps

  • Context Fetch SDK: Full SDK reference for retrieval methods and mode selection.
  • Runtime Ingestion: How runtime ingestion integrates fast mode into the agent loop.
  • Memory Architecture: Configure ingestion and retrieval defaults in your memory architecture.
  • Entity Resolution: How long-range’s deep entity resolution builds the knowledge graph.