Fast & Accurate Modes - Maximem Synap

The mode parameter controls a speed-vs-thoroughness tradeoff. It appears on two different axes, so be explicit about which one you mean:

| Axis | Parameter | Values | Controls |
|---|---|---|---|
| Ingestion | `memories.create(mode=...)` | `fast` · `long-range` | How deeply a write is processed and indexed |
| Retrieval | `...context.fetch(mode=...)` | `fast` · `accurate` | How much work a read does to assemble context |

“Fast” means the same thing on both axes (lightweight, low-latency). The thorough setting is called long-range for ingestion and accurate for retrieval: same principle, deeper processing for higher quality. The two are independent: you can ingest long-range and read back fast, or any combination.
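Because the two axes accept different values, passing accurate to an ingestion call (or long-range to a fetch) is an easy mistake. A minimal client-side guard might look like this, assuming you want to catch the mix-up before the request leaves your code. `VALID_MODES` and `check_mode` are hypothetical helpers, not part of the Synap SDK; the value sets are copied from the axis table above.

```python
# Hypothetical client-side guard; not part of the Synap SDK.
# The allowed values per axis are taken from the axis table in this page.
VALID_MODES = {
    "ingestion": {"fast", "long-range"},  # memories.create(mode=...)
    "retrieval": {"fast", "accurate"},    # conversation.context.fetch(mode=...)
}


def check_mode(axis: str, mode: str) -> str:
    """Return `mode` unchanged if it is valid for `axis`, else raise ValueError."""
    if axis not in VALID_MODES:
        raise ValueError(f"unknown axis {axis!r}; expected one of {sorted(VALID_MODES)}")
    allowed = VALID_MODES[axis]
    if mode not in allowed:
        raise ValueError(f"mode={mode!r} is not valid for {axis}; expected one of {sorted(allowed)}")
    return mode
```

Wrapping the literal at the call site, as in mode=check_mode("ingestion", "long-range"), keeps the check next to the value it protects.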

Quick comparison

| Aspect | Fast | Accurate / long-range |
|---|---|---|
| Retrieval latency | Lower | Higher |
| Ingestion processing | Faster | Slower (deeper) |
| Search method | Vector + graph (no LLM query decomposition) | Vector + graph + LLM subquery decomposition + reranking |
| Ranking signals | Cosine similarity | Similarity + recency + graph centrality + confidence |
| Entity resolution | Lightweight (basic NER) | Full pipeline (semantic matching, cross-reference) |
| Relationship awareness | Graph relationships, no LLM-driven multi-hop decomposition | Explicit graph edges with LLM-driven multi-hop decomposition |
| Compute cost | Lower | Higher |
| Best for | Real-time chat, simple queries, high throughput | Complex queries, summaries, relationship-aware context |

The two modes are not mutually exclusive. Use fast for the hot path of a live conversation and switch to accurate for specific high-value queries, all within the same application and the same Synap Instance.

Building a real-time chatbot or voice agent? Start with fast for both ingestion and retrieval, then selectively upgrade specific interactions to long-range or accurate as needed. No architecture change is required.

Fast mode

The recommended default for real-time, conversational agents where low latency matters more than exhaustive extraction.

Fast ingestion

Fast ingestion runs a lightweight extraction pipeline, optimized to make memories available quickly.

| Stage | Behavior |
|---|---|
| Chunking | Basic semantic chunking by paragraph and sentence boundaries |
| Entity extraction | Lightweight named entity recognition (people, organizations, products) |
| Embedding | Vector embeddings generated for each chunk |
| Preference detection | Basic keyword-based preference identification |
| Storage | Chunks stored in the vector store; entities indexed for lookup |
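As an illustration of the chunking stage, here is a minimal paragraph-then-sentence splitter. It is a sketch under stated assumptions, not Synap's chunker: it treats blank lines as paragraph breaks, splits sentences on `.`, `!` or `?` followed by whitespace, and uses a hypothetical `max_chars` budget.

```python
import re


def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into chunks on paragraph boundaries, then sentence boundaries."""
    chunks: list[str] = []
    for paragraph in re.split(r"\n\s*\n", text):  # blank line = paragraph break
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if len(paragraph) <= max_chars:
            chunks.append(paragraph)
            continue
        # Paragraph too long: pack whole sentences into chunks of at most max_chars.
        current = ""
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

A single sentence longer than `max_chars` becomes its own oversized chunk here; a production chunker would typically fall back to clause or token boundaries.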

It skips deep entity resolution against the full registry, explicit relationship/graph-edge mapping, advanced topic categorization, and emotional/sentiment analysis. Memories become available for vector-based retrieval shortly after processing. Use it for real-time chat logging, high-throughput pipelines, routine Q&A, and ephemeral content that doesn’t need deep relationship modeling.

```python
# Fast ingestion for a routine conversation turn
await sdk.memories.create(
    document="User: What's the status of my order?\n"
             "Assistant: Your order #4521 shipped yesterday and should arrive by Thursday.",
    document_type="ai-chat-conversation",
    user_id="user_123",
    mode="fast",
)
# Returns immediately. Memory available for retrieval shortly after.
```

Fast retrieval

Fast retrieval queries both the vector store and the knowledge graph, but skips the LLM-driven subquery decomposition and reranking that accurate mode adds. That keeps latency low for the hot path of real-time conversations.

Query embedding

The query is converted into a vector embedding, consistent with the embeddings created during ingestion.

Vector similarity search

The embedding is compared against stored memory embeddings using cosine similarity, scoped to the applicable scope levels (user, customer, client, world) based on the provided user_id and customer_id.

Ranking

Results are ranked by cosine similarity. No LLM-driven decomposition or reranking pass is applied.

Return

The top-k results (per the configured budget) are returned as structured context.
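The three steps above can be sketched in plain Python. This assumes toy 2-D embeddings and a simplified visibility rule: user- and customer-scoped memories match on an owner ID, while client- and world-scoped memories are visible to every request. Synap's actual scope resolution may differ, and `Memory`, `in_scope`, and `fast_search` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Memory:
    content: str
    embedding: list[float]
    scope: str                      # "user", "customer", "client" or "world"
    owner_id: Optional[str] = None  # user_id or customer_id for user/customer scopes


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def in_scope(m: Memory, user_id: Optional[str], customer_id: Optional[str]) -> bool:
    # Assumed rule: owner-scoped memories need a matching ID; broader scopes always apply.
    if m.scope == "user":
        return user_id is not None and m.owner_id == user_id
    if m.scope == "customer":
        return customer_id is not None and m.owner_id == customer_id
    return m.scope in ("client", "world")


def fast_search(query_vec, memories, user_id=None, customer_id=None, top_k=3):
    """Filter by scope, rank by cosine similarity alone, return the top-k pairs."""
    scored = [
        (cosine(query_vec, m.embedding), m)
        for m in memories
        if in_scope(m, user_id, customer_id)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Ranking uses the similarity score alone, matching the fast-mode row of the comparison table; there is no decomposition or reranking pass.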

conversation_id must be a valid UUID. Generate one with str(uuid.uuid4()) and reuse it for every turn in the same conversation.
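If your application already has its own session identifier, one option, assuming you would rather not store a separate UUID, is to derive a deterministic UUID from it with `uuid.uuid5`, so every turn of the same session maps to the same conversation_id. The namespace value and helper names below are hypothetical; Synap only requires that the value be a valid UUID. `is_valid_uuid` also insists on the canonical lowercase hyphenated form, which may be stricter than Synap needs.

```python
import uuid

# Hypothetical fixed namespace for your application; any constant UUID works.
APP_NAMESPACE = uuid.UUID("6f1c2d3e-0000-4000-8000-000000000000")


def conversation_uuid(session_key: str) -> str:
    """Derive a stable UUID (version 5) from one of your own session keys."""
    return str(uuid.uuid5(APP_NAMESPACE, session_key))


def is_valid_uuid(value: str) -> bool:
    """True only for a parseable UUID in canonical hyphenated lowercase form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, TypeError, AttributeError):
        return False
    return str(parsed) == value.lower()
```

The tradeoff: `uuid4` per conversation must be stored and reused, while `uuid5` can be recomputed anywhere from the session key.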

```python
import uuid

conversation_id = str(uuid.uuid4())  # one UUID per conversation, reused across turns
context = await sdk.conversation.context.fetch(
    conversation_id=conversation_id,
    user_id="user_123",
    search_query=["What do we know about Project Atlas?"],
    mode="fast",
)

for fact in context.facts:
    print(f"[{fact.confidence:.2f}] {fact.content}")
```

What fast retrieval skips, relative to accurate: LLM subquery decomposition (breaking a complex query into focused sub-queries to widen coverage) and reranking (an extra pass that reorders candidates by relevance). Broad, multi-part questions may therefore retrieve less complete context; those are the cases to send to accurate mode.

Accurate mode

Prioritizes thoroughness and quality over speed. It runs the full extraction pipeline on ingestion (long-range) and adds LLM-driven refinement on retrieval (accurate), producing richer, more connected context.

Long-range ingestion

Long-range ingestion runs the complete extraction pipeline, producing structured, relationship-aware memories that power graph-based retrieval.

Semantic chunking

Content is split into semantically coherent chunks, respecting topic boundaries and conversational turns.

Deep entity extraction

Captures all people, organizations, products, locations, concepts, and events, including implied entities and role-based references (“my manager”, “the person who handles billing”).

Entity resolution

Each entity is matched against the full registry using exact, alias, semantic, and contextual strategies; new entities are auto-registered. See Entity Resolution.

Relationship mapping

Explicit and implicit relationships become graph edges: e.g. “Sarah is leading Project Atlas” → Sarah —[leads]—> Project Atlas.

Preference detection

Stated, implied, and contextual preferences are extracted with high confidence.

Emotional and sentiment analysis

Emotional tone is analyzed and stored as metadata that can influence retrieval ranking.

Advanced categorization

Content is classified into a topic hierarchy with domain-specific tags.

Vector embedding and graph storage

Chunks are embedded into the vector store; entity relationships are stored in the graph store. Both engines are populated, enabling accurate retrieval’s combined search.
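The entity resolution step can be pictured as a cascade that tries cheap strategies first. This sketch assumes an in-memory registry of canonical names and aliases, and substitutes `difflib` string similarity for semantic matching (a real semantic matcher would compare embeddings); the contextual strategy is omitted. `EntityRegistry` is a hypothetical class, not Synap's registry.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class EntityRegistry:
    # canonical name -> set of known lowercase aliases
    aliases: dict[str, set[str]] = field(default_factory=dict)

    def register(self, name: str) -> str:
        self.aliases.setdefault(name, {name.lower()})
        return name

    def resolve(self, mention: str, cutoff: float = 0.8) -> tuple[str, str]:
        """Return (canonical_name, strategy_used) for a mention."""
        if mention in self.aliases:                      # 1. exact match
            return mention, "exact"
        lowered = mention.lower()
        for canonical, known in self.aliases.items():   # 2. alias match
            if lowered in known:
                return canonical, "alias"
        # 3. Fuzzy string match: a crude stand-in for semantic matching.
        close = difflib.get_close_matches(mention, list(self.aliases), n=1, cutoff=cutoff)
        if close:
            return close[0], "fuzzy"
        return self.register(mention), "new"            # 4. auto-register
```

Ordering the strategies from cheapest to most expensive means most mentions resolve before the fuzzy step runs.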

Long-range takes longer than fast, scaling with content length and the number of entities and relationships. It is the default for bootstrap ingestion. Use it for important conversations (strategic discussions, key decisions, escalations), complex documents, profile-building onboarding, and meeting transcripts.

```python
# Long-range ingestion for an important strategic conversation
await sdk.memories.create(
    document=(
        "User: Let's revisit the Project Atlas timeline. I spoke with Sarah Chen "
        "from engineering yesterday, and she's concerned about the Q3 deadline. The "
        "infrastructure team hasn't finished the database migration yet, and James "
        "from DevOps says they need at least three more weeks.\n"
        "User: Note that we might bring in two engineers from the platform team to "
        "help accelerate. Maria approved the budget for that yesterday."
    ),
    document_type="ai-chat-conversation",
    user_id="user_123",
    mode="long-range",
)
```

This extracts entities (Sarah Chen, James, Maria, Project Atlas, platform/infrastructure teams), relationships (Sarah —[concerned_about]—> Atlas timeline; Maria —[approved]—> budget), decisions (Maria's budget approval for two additional platform engineers), and facts (database migration incomplete, at least three more weeks needed, Q3 deadline at risk).

Accurate retrieval

Accurate retrieval queries the same two stores as fast mode, then adds two distinguishing steps, LLM-driven subquery decomposition and reranking, and ranks candidates on multiple signals instead of cosine similarity alone.

Query embedding & vector search

Same as fast mode: embed the query, find candidates by cosine similarity.

LLM subquery decomposition and graph traversal

The query is decomposed into focused sub-queries that expand the entities and angles explored. Entities from the query and top vector results seed graph traversal, following relationship edges to connected entities, related facts, and context. Querying “Project Atlas” reaches Sarah Chen, James, the database migration, the Q3 deadline, and Maria’s budget approval.

Cross-engine merging

Vector and graph results are merged into one candidate set; duplicates removed, scores normalized to a common scale.

Multi-signal ranking

Candidates are ranked on semantic similarity, recency, graph centrality, and extraction confidence, weighted into a final relevance score.

Return

The top-k results are returned as structured context, enriched with entity and relationship metadata.
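The merging and multi-signal ranking steps above can be sketched as follows. Everything here is an assumption for illustration: per-engine min-max normalization, keeping the best normalized score when a memory appears in both engines, a 30-day recency half-life, and the weights `(0.5, 0.2, 0.15, 0.15)`. Synap's actual normalization and weighting are not documented on this page.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    memory_id: str
    score: float        # raw engine score; vector and graph scores use different scales
    created_at: float   # Unix timestamp of the memory
    centrality: float   # 0..1, e.g. normalized node degree in the graph
    confidence: float   # 0..1 extraction confidence


def min_max(hits: list[Candidate]) -> dict[str, float]:
    """Rescale one engine's raw scores to 0..1 so the engines are comparable."""
    if not hits:
        return {}
    lo = min(h.score for h in hits)
    hi = max(h.score for h in hits)
    span = hi - lo
    return {h.memory_id: (h.score - lo) / span if span else 1.0 for h in hits}


def merge_and_rank(
    vector_hits: list[Candidate],
    graph_hits: list[Candidate],
    weights: tuple[float, float, float, float] = (0.5, 0.2, 0.15, 0.15),
    half_life_days: float = 30.0,
    now: Optional[float] = None,
    top_k: int = 5,
) -> list[tuple[float, str]]:
    now = time.time() if now is None else now
    similarity: dict[str, float] = {}
    by_id: dict[str, Candidate] = {}
    for hits in (vector_hits, graph_hits):
        for memory_id, norm in min_max(hits).items():
            # Deduplicate: a memory found by both engines keeps its best normalized score.
            similarity[memory_id] = max(similarity.get(memory_id, 0.0), norm)
        for h in hits:
            by_id.setdefault(h.memory_id, h)

    w_sim, w_rec, w_cen, w_conf = weights
    ranked = []
    for memory_id, sim in similarity.items():
        c = by_id[memory_id]
        age_days = max(0.0, (now - c.created_at) / 86_400)
        recency = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
        final = w_sim * sim + w_rec * recency + w_cen * c.centrality + w_conf * c.confidence
        ranked.append((final, memory_id))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked[:top_k]
```

Min-max scaling maps each engine's weakest hit to 0, which is a simple but blunt choice; rank-based fusion is a common alternative when raw scores are unreliable.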

```python
import uuid

conversation_id = str(uuid.uuid4())  # one UUID per conversation, reused across turns
context = await sdk.conversation.context.fetch(
    conversation_id=conversation_id,
    user_id="user_123",
    search_query=["What do we know about Project Atlas, including who is involved and what decisions have been made?"],
    mode="accurate",
)

for fact in context.facts:
    print(f"[{fact.confidence:.2f}] {fact.content}")
    if fact.entities:
        print(f"  Entities: {', '.join(e.canonical_name for e in fact.entities)}")
    if fact.relationships:
        print(f"  Relationships: {', '.join(str(r) for r in fact.relationships)}")
```

What graph traversal adds

The same query, fast vs accurate:

Fast mode results

Returns memories that directly mention “Project Atlas”:

```text
[0.92] Project Atlas timeline may need to shift to Q4. Q3 deadline flagged as infeasible.
[0.87] Project Atlas kickoff meeting scheduled for January 15th.
[0.81] User asked about the current status of Project Atlas.
```

Useful, but limited to direct mentions.

Accurate mode results

Returns direct mentions plus connected context discovered through graph traversal:

```text
[0.92] Project Atlas timeline may need to shift to Q4. Q3 deadline flagged as infeasible.
[0.89] Sarah Chen from engineering is concerned about the Q3 deadline for Project Atlas.
       Entities: Sarah Chen (person, engineering)
       Relationship: Sarah Chen --[concerned_about]--> Project Atlas timeline
[0.85] James from DevOps estimates three more weeks for the infrastructure database migration
       that is blocking Project Atlas.
       Relationship: database migration --[blocks]--> Project Atlas
[0.82] Maria approved budget for two additional engineers from the platform team to
       accelerate Project Atlas delivery.
       Relationship: Maria --[approved]--> additional engineering budget
[0.78] The platform team currently has six engineers and is working on the API gateway redesign.
```

Traversal followed edges from Project Atlas to Sarah Chen, James, Maria, and the platform team. The last result (the platform team’s workload) was discovered by traversing to the platform team entity, even though it never mentions Project Atlas.
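The traversal described above is essentially a breadth-first search outward from the seed entities. The sketch below assumes an in-memory edge list loosely modeled on the Project Atlas example (the relation names are illustrative, not Synap's schema) and follows edges in both directions up to a hop limit.

```python
from collections import deque

# Illustrative (source, relation, target) edges loosely based on the example above.
EDGES = [
    ("Sarah Chen", "concerned_about", "Project Atlas"),
    ("database migration", "blocks", "Project Atlas"),
    ("James", "estimates", "database migration"),
    ("Maria", "approved", "additional engineering budget"),
    ("additional engineering budget", "funds", "Project Atlas"),
    ("platform team", "supports", "Project Atlas"),
    ("platform team", "works_on", "API gateway redesign"),
]


def neighbors(edges):
    """Build an undirected adjacency map so edges can be walked in both directions."""
    adj = {}
    for src, rel, dst in edges:
        adj.setdefault(src, []).append((rel, dst))
        adj.setdefault(dst, []).append((rel, src))
    return adj


def traverse(seeds, edges, max_hops=2):
    """Breadth-first search from seed entities; returns {entity: hop distance}."""
    adj = neighbors(edges)
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] >= max_hops:
            continue  # do not expand past the hop limit
        for _rel, nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist
```

With max_hops=2 the search reaches API gateway redesign through the platform team, mirroring how the platform team's workload surfaced even though it never mentions Project Atlas. The hop limit is what keeps this kind of expansion from pulling in the whole graph.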

Accurate retrieval is most effective when the content was ingested with long-range. The relationship edges available to traverse come from long-range ingestion, not from the retrieval mode. Accurate retrieval still queries both stores regardless, but fast-ingested content has fewer edges to traverse, so you get less of the graph-enhanced context that makes accurate mode valuable.

Precision level

mode isn’t the only retrieval knob. Context fetch also accepts an optional precision_level parameter, a second, independent axis that controls how tightly results are filtered before they’re returned.

| `precision_level` | Behavior |
|---|---|
| `high` (default) | Results go through an additional relevance-refinement pass before being returned. |
| `medium` | Skips the refinement pass for faster responses. Recall isn't impacted (the same candidate memories are searched), but outputs are less precisely filtered. |

The refinement pass filters candidates rather than finding them, so dropping to medium never shrinks what’s searched. What changes is how precisely the output is filtered: expect an occasional loosely related item in exchange for a faster response. precision_level is orthogonal to mode, so you can combine it with either fast or accurate. For real latency on your instance, see Dashboard → Usage.
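To make the filter-versus-find distinction concrete, here is a sketch in which both precision levels receive the same candidate list and only the final filter differs. The fixed relevance threshold is a hypothetical stand-in: Synap's refinement pass is not described on this page and is likely more involved than a cutoff, and `refine` is not an SDK function.

```python
def refine(candidates, precision_level="high", threshold=0.75):
    """Hypothetical stand-in for the refinement pass.

    candidates: (relevance, text) pairs from the search stage. The search stage
    is identical for both levels; only this post-search filter differs.
    """
    if precision_level == "medium":
        return list(candidates)  # skip refinement: everything found is returned
    if precision_level == "high":
        return [c for c in candidates if c[0] >= threshold]
    raise ValueError(f"unknown precision_level {precision_level!r}")
```

Because `high` only removes items from the list `medium` would return, the `high` output is always a subset of the `medium` output for the same search.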

Choosing a mode

Use fast for…

- Real-time conversations where the user is waiting. Fast retrieval is rarely the bottleneck here; LLM generation dominates response time.
- Single-topic queries answerable from one memory chunk (“What is our refund policy?”, “When is Alice’s birthday?”).
- High-frequency retrieval on every message, at scale, where the lower compute cost matters.
- Latency-sensitive apps: voice agents, real-time collaboration.

Upgrade to accurate for…

- Complex, multi-entity queries: “Summarize everything about Project Atlas, who’s involved, and what’s been decided.”
- Relationship queries: “How is Sarah connected to the infrastructure migration?”
- Comprehensive summaries / briefings that must not miss context (latency is acceptable since they aren’t time-sensitive).
- Onboarding / profile-building, where deep extraction builds a richer profile that even later fast-mode reads benefit from.
- High-value interactions: escalations, renewals, executive conversations.

Mixing modes in practice

Most production apps combine both modes, using fast by default and accurate (or long-range) for the queries and writes that justify it:

```python
async def get_context(conversation_id, user_id, query, is_complex=False):
    return await sdk.conversation.context.fetch(
        conversation_id=conversation_id,
        user_id=user_id,
        search_query=[query],
        mode="accurate" if is_complex else "fast",
    )


async def ingest_conversation(content, user_id, is_important=False):
    await sdk.memories.create(
        document=content,
        document_type="ai-chat-conversation",
        user_id=user_id,
        mode="long-range" if is_important else "fast",
    )
```

A simple keyword heuristic is a reasonable starting point for automatic selection (tune it to your query patterns, or use a lightweight classifier):

```python
def should_use_accurate_mode(query: str) -> bool:
    complex_indicators = [
        "summarize", "everything about", "full briefing",
        "who is involved", "related to", "connected to",
        "all the details", "comprehensive", "overview of",
        "history of", "timeline for",
    ]
    q = query.lower()
    return any(indicator in q for indicator in complex_indicators)
```
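Because that heuristic keys on phrasing, it misses multi-entity questions such as "How is Sarah linked to Project Atlas?". One hedged extension, assuming you maintain your own list of entity names, also routes a query to accurate mode when it mentions two or more of them. `looks_complex` and `known_entities` are hypothetical, and the entity threshold is a tunable guess.

```python
import re

COMPLEX_INDICATORS = (
    "summarize", "everything about", "full briefing",
    "who is involved", "related to", "connected to",
    "all the details", "comprehensive", "overview of",
    "history of", "timeline for",
)


def looks_complex(query: str, known_entities: list[str], min_entities: int = 2) -> bool:
    """Phrase heuristic, plus a multi-entity check against your own entity list."""
    q = query.lower()
    if any(indicator in q for indicator in COMPLEX_INDICATORS):
        return True
    # Whole-word, case-insensitive matches so "Sarah" does not match "Sarahs".
    mentioned = {
        e for e in known_entities
        if re.search(rf"\b{re.escape(e.lower())}\b", q)
    }
    return len(mentioned) >= min_entities
```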

You can also override the default mode per item in batch ingestion, e.g. mode="fast" on memories.batch_create(...) items when speed matters more than extraction depth for high-volume, lower-priority data.
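One way to apply that override is to tag records with your own priority label and drop only the low-priority ones to fast, keeping long-range as the default. This assumes each batch item takes the same keyword arguments as memories.create(...); check the SDK reference for the exact item shape before passing these dicts to memories.batch_create(...). `Record` and `build_batch_items` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    content: str
    user_id: str
    priority: str  # "high" or "low": your own label, not a Synap field


def build_batch_items(records: list[Record]) -> list[dict]:
    """Per-item kwargs mirroring memories.create(...); the item shape is an assumption."""
    return [
        {
            "document": r.content,
            "document_type": "ai-chat-conversation",
            "user_id": r.user_id,
            # Keep long-range as the default; only low-priority data is downgraded.
            "mode": "fast" if r.priority == "low" else "long-range",
        }
        for r in records
    ]
```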

Next steps

Context Fetch SDK

Full SDK reference for retrieval methods and mode selection.

Runtime Ingestion

How runtime ingestion integrates fast mode into the agent loop.

Memory Architecture

Configure ingestion and retrieval defaults in your memory architecture.

Entity Resolution

How long-range’s deep entity resolution builds the knowledge graph.
