In the SDK, accurate ingestion is called long-range mode and accurate retrieval is called accurate mode. Both apply the same principle: deeper processing for higher-quality results.

Accurate ingestion (long-range mode)

Long-range ingestion runs the complete extraction pipeline, performing deep analysis that fast mode skips. This produces structured, relationship-aware memories that power the graph-based retrieval capabilities of accurate mode.

The full extraction pipeline

1. Semantic chunking

Content is split into semantically coherent chunks, respecting topic boundaries, paragraph structure, and conversational turns. Chunks maintain enough surrounding context for meaningful standalone interpretation.
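The chunking step can be sketched as follows. The split heuristics here (one conversational turn per line, a character budget, one turn of carried-over context) are illustrative assumptions, not the SDK's actual algorithm:

```python
def chunk_conversation(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack conversational turns into chunks under a size budget,
    carrying the previous turn forward so each chunk keeps enough
    surrounding context to stand alone. (Illustrative heuristic only.)"""
    turns = [t.strip() for t in text.split("\n") if t.strip()]
    chunks, current = [], []
    for turn in turns:
        if current and sum(len(t) for t in current) + len(turn) > max_chars:
            chunks.append("\n".join(current))
            current = current[-1:]  # overlap: carry the previous turn forward
        current.append(turn)
    if current:
        chunks.append("\n".join(current))
    return chunks

convo = (
    "User: Let's revisit the Project Atlas timeline.\n"
    "Assistant: Sure. What changed?\n"
    "User: Sarah Chen is concerned about the Q3 deadline."
)
for c in chunk_conversation(convo, max_chars=80):
    print("---\n" + c)
```

A production chunker would also respect topic boundaries and paragraph structure, not just turn boundaries and length.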
2. Deep entity extraction

Named entity recognition identifies all people, organizations, products, locations, concepts, and events in the content. Unlike fast mode’s lightweight NER, long-range extraction captures implied entities, role-based references (“my manager”), and contextual descriptions (“the person who handles billing”).
3. Entity resolution

Each extracted entity is matched against the full entity registry using exact, alias, semantic, and contextual matching strategies. Resolved entities receive canonical names, and new entities are auto-registered. See Entity Resolution for details.
4. Relationship mapping

Explicit and implicit relationships between entities are identified and stored as graph edges. For example, “Sarah from the engineering team is leading Project Atlas” creates edges: Sarah —[member_of]—> Engineering Team, Sarah —[leads]—> Project Atlas.
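Conceptually, each edge is a (subject, predicate, object) triple. This sketch hand-codes the edges from the example sentence rather than performing real extraction:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    object: str

# Edges from: "Sarah from the engineering team is leading Project Atlas"
edges = [
    Edge("Sarah", "member_of", "Engineering Team"),
    Edge("Sarah", "leads", "Project Atlas"),
]

for e in edges:
    print(f"{e.subject} -[{e.predicate}]-> {e.object}")
```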
5. Preference detection

User preferences, opinions, and stated requirements are extracted with high confidence. The pipeline distinguishes between stated preferences (“I prefer dark mode”), implied preferences (consistently requesting concise responses), and contextual preferences (format preferences for specific types of queries).
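The stated-preference case is the easiest to sketch: explicit phrasings can be caught with surface patterns, while implied preferences require behavioral signals across conversations. The pattern list below is an illustrative assumption, not the SDK's model:

```python
import re

# Phrasings that signal an explicitly stated preference (illustrative list).
STATED_PATTERNS = [
    r"\bI prefer\b", r"\bI like\b", r"\bI always want\b",
    r"\bplease always\b", r"\bI'd rather\b",
]

def detect_stated_preferences(utterances: list[str]) -> list[str]:
    """Return utterances containing an explicitly stated preference."""
    return [
        u for u in utterances
        if any(re.search(p, u, re.IGNORECASE) for p in STATED_PATTERNS)
    ]

msgs = [
    "I prefer dark mode for all dashboards.",
    "Can you shorten that?",            # implied preference: needs behavioral signals
    "I'd rather see the summary first.",
]
print(detect_stated_preferences(msgs))
```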
6. Emotional and sentiment analysis

The emotional tone and sentiment of the conversation are analyzed. This is stored as metadata and can influence retrieval ranking — recent frustrations or positive experiences can be surfaced when relevant.
7. Advanced categorization

Content is classified into a topic hierarchy with domain-specific tags. A conversation about database migration might be tagged with: engineering, infrastructure, migration, PostgreSQL, timeline-discussion, decision.
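To make the tagging behavior concrete, here is a toy keyword-to-tag mapping; a real pipeline classifies against a topic hierarchy rather than doing keyword lookup, and the rule table is invented for illustration:

```python
# Illustrative keyword -> tag rules (not the SDK's classifier).
TAG_RULES = {
    "migration": ["engineering", "infrastructure", "migration"],
    "postgresql": ["engineering", "PostgreSQL"],
    "deadline": ["timeline-discussion"],
    "approved": ["decision"],
}

def categorize(text: str) -> list[str]:
    """Collect tags for every rule keyword found, deduplicated in order."""
    tags: list[str] = []
    lower = text.lower()
    for keyword, keyword_tags in TAG_RULES.items():
        if keyword in lower:
            tags.extend(t for t in keyword_tags if t not in tags)
    return tags

print(categorize("The PostgreSQL migration deadline was approved."))
```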
8. Vector embedding and graph storage

Chunks are embedded and stored in the vector store. Entity relationships and graph edges are stored in the graph store. Both storage engines are populated, enabling the combined search strategies of accurate retrieval.

Processing time

Long-range ingestion typically takes 10 seconds to several minutes, depending on the length and complexity of the content. Longer documents with many entities and relationships take more time to process thoroughly.
| Content Type | Typical Processing Time |
| --- | --- |
| Short conversation (5-10 turns) | 10-30 seconds |
| Long conversation (50+ turns) | 1-3 minutes |
| Technical document (5-10 pages) | 30 seconds - 2 minutes |
| Meeting transcript (1 hour) | 2-5 minutes |

When to use long-range ingestion

  • Important conversations: Strategic discussions, key decisions, onboarding sessions, escalations.
  • Complex documents: Technical documentation, policy documents, contracts, detailed specifications.
  • Building detailed user profiles: Onboarding conversations where you want to capture a comprehensive understanding of the user’s needs, preferences, and context.
  • Historical data (bootstrap): Long-range is the default for bootstrap ingestion because the extra processing time is acceptable for batch loads.
  • Meeting transcripts: Multi-party conversations with many entities, action items, and decisions benefit significantly from deep extraction.

Code example

from synap import Synap

sdk = Synap(api_key="your_api_key")

# Long-range ingestion for an important strategic conversation
await sdk.memories.create(
    document=(
        "User: Let's revisit the Project Atlas timeline. I spoke with Sarah Chen "
        "from engineering yesterday, and she's concerned about the Q3 deadline. The "
        "infrastructure team hasn't finished the database migration yet, and James "
        "from DevOps says they need at least three more weeks.\n"
        "Assistant: I'll note that. So the key concerns are: Sarah Chen flagged the "
        "Q3 deadline as infeasible, the database migration is blocking progress, and "
        "James estimates three additional weeks for the infrastructure work. Would you "
        "like me to also note the proposed Q4 fallback?\n"
        "User: Yes. And note that we might bring in two engineers from the platform "
        "team to help accelerate. Maria approved the budget for that yesterday."
    ),
    document_type="ai-chat-conversation",
    user_id="user_123",
    customer_id="acme_corp",
    mode="long-range"
)
What long-range extracts from this conversation:
  • Entities: Sarah Chen (person, engineering), James (person, DevOps), Maria (person, approver), Project Atlas (project), platform team (organization), infrastructure team (organization)
  • Relationships: Sarah Chen —[concerned_about]—> Project Atlas timeline, James —[member_of]—> DevOps, Maria —[approved]—> budget for additional engineers
  • Decisions: Q4 fallback proposed, two additional engineers from platform team, Maria approved budget
  • Facts: Database migration incomplete, three weeks estimated for infrastructure work, Q3 deadline flagged as infeasible

Accurate retrieval

Accurate retrieval combines vector similarity search with knowledge graph traversal and multi-signal ranking. This produces contextually richer results that surface not just directly matching content but also connected entities, related decisions, and relationship-aware context.

How it works

1. Query embedding

The query is converted into a vector embedding, identical to the fast mode process.
2. Vector similarity search

The embedding is compared against stored memory embeddings using cosine similarity, producing an initial set of candidate results. This step is the same as fast mode.
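Cosine similarity itself is straightforward; this self-contained sketch ranks toy three-dimensional "embeddings" (real embeddings have hundreds or thousands of dimensions, and the vectors and memory IDs here are invented):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

query_vec = [0.2, 0.8, 0.1]
memory_vecs = {
    "mem_timeline": [0.25, 0.75, 0.05],
    "mem_kickoff": [0.9, 0.1, 0.2],
}

# Rank candidate memories by similarity to the query embedding.
ranked = sorted(
    memory_vecs.items(),
    key=lambda kv: cosine_similarity(query_vec, kv[1]),
    reverse=True,
)
for mem_id, vec in ranked:
    print(mem_id, round(cosine_similarity(query_vec, vec), 3))
```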
3. Graph traversal

Entities mentioned in the query and in the top vector results are used as starting nodes for graph traversal. The traversal follows relationship edges to discover connected entities, related facts, and contextual information that would not appear in a pure vector search. For example, querying “Project Atlas” triggers traversal to connected nodes: Sarah Chen, James, the database migration, the Q3 deadline, and Maria’s budget approval.
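At its core this is a bounded breadth-first search over the entity graph. The toy adjacency list below mirrors a subset of the Project Atlas example; the hop limit and graph shape are assumptions for illustration:

```python
from collections import deque

# Toy entity graph as adjacency lists (subset of the Project Atlas example).
GRAPH = {
    "Project Atlas": ["Sarah Chen", "Q3 deadline", "database migration"],
    "Sarah Chen": ["Engineering Team"],
    "database migration": ["James"],
    "Q3 deadline": ["Q4 fallback"],
    "James": ["DevOps"],
}

def traverse(start: str, max_hops: int = 2) -> set[str]:
    """Breadth-first traversal up to max_hops relationship edges from start."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted: don't expand further
        for neighbor in GRAPH.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen - {start}

print(sorted(traverse("Project Atlas")))
```

Note that "DevOps" sits three hops from "Project Atlas", so with `max_hops=2` it is not reached; the hop budget is what keeps traversal cost bounded.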
4. Cross-engine merging

Results from vector search and graph traversal are merged into a unified candidate set. Duplicates are removed, and results from both engines are normalized to a common relevance scale.
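A sketch of that merge step, assuming min-max normalization per engine and keep-the-higher-score deduplication (the actual normalization and tie-breaking strategy is not specified here):

```python
def merge_results(vector_hits: dict[str, float],
                  graph_hits: dict[str, float]) -> dict[str, float]:
    """Merge two candidate sets, min-max normalizing each engine's scores
    to [0, 1] and keeping the higher score for duplicates."""
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}

    merged: dict[str, float] = {}
    for hits in (normalize(vector_hits), normalize(graph_hits)):
        for mem_id, score in hits.items():
            merged[mem_id] = max(merged.get(mem_id, 0.0), score)
    return merged

vector_hits = {"mem_a": 0.92, "mem_b": 0.81}
graph_hits = {"mem_b": 3.0, "mem_c": 1.5}   # graph scores on a different scale
print(merge_results(vector_hits, graph_hits))
```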
5. Multi-signal ranking

The merged candidates are ranked using multiple signals:
  • Semantic similarity: How closely the content matches the query (cosine similarity)
  • Recency: How recently the memory was created or updated
  • Graph centrality: How connected the memory is to other relevant entities
  • Confidence: The extraction pipeline’s confidence in the accuracy of the memory
These signals are weighted and combined into a final relevance score.
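The weighted combination can be sketched as a linear blend. The weights, the exponential recency decay, and the candidate fields below are all illustrative assumptions, not the SDK's actual scoring function:

```python
import time

# Illustrative signal weights (sum to 1.0); real weights would be tuned.
WEIGHTS = {"similarity": 0.5, "recency": 0.2, "centrality": 0.2, "confidence": 0.1}

def relevance(candidate: dict, now: float, half_life_days: float = 30.0) -> float:
    """Blend the four ranking signals into one score in [0, 1]."""
    age_days = (now - candidate["created_at"]) / 86400
    recency = 0.5 ** (age_days / half_life_days)  # halves every 30 days
    signals = {
        "similarity": candidate["similarity"],
        "recency": recency,
        "centrality": candidate["centrality"],
        "confidence": candidate["confidence"],
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())

now = time.time()
candidates = [
    {"id": "mem_a", "similarity": 0.9, "centrality": 0.2,
     "confidence": 0.8, "created_at": now - 60 * 86400},  # older, very similar
    {"id": "mem_b", "similarity": 0.7, "centrality": 0.9,
     "confidence": 0.9, "created_at": now - 1 * 86400},   # fresh, well connected
]
ranked = sorted(candidates, key=lambda c: relevance(c, now), reverse=True)
print([c["id"] for c in ranked])
```

Note how the fresh, well-connected memory outranks the older one despite lower raw similarity; that is the point of blending signals rather than sorting by cosine alone.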
6. Return

The top-k results (determined by the configured budget) are returned as structured context, enriched with entity and relationship metadata.

Latency

| Metric | Value |
| --- | --- |
| P50 latency | ~200ms |
| P95 latency | ~400ms |
| P99 latency | ~500ms |
Accurate retrieval is approximately 2-5x slower than fast mode due to the graph traversal and multi-signal ranking steps.

Code example

# Accurate retrieval for a complex, relationship-aware query
context = await sdk.conversation.context.fetch(
    user_id="user_123",
    customer_id="acme_corp",
    query="What do we know about Project Atlas, including who is involved and what decisions have been made?",
    mode="accurate"
)

for fact in context.facts:
    print(f"[{fact.confidence:.2f}] {fact.content}")
    if fact.entities:
        print(f"  Entities: {', '.join(e.canonical_name for e in fact.entities)}")
    if fact.relationships:
        print(f"  Relationships: {', '.join(str(r) for r in fact.relationships)}")

What graph traversal adds

The key advantage of accurate mode is graph traversal. Here is a concrete example showing the difference between fast and accurate retrieval for the same query.

Query: “Tell me about Project Atlas”

Fast mode returns memories that directly mention “Project Atlas” in their text:
[0.92] Project Atlas timeline may need to shift to Q4. Q3 deadline flagged as infeasible.
[0.87] Project Atlas kickoff meeting scheduled for January 15th.
[0.81] User asked about the current status of Project Atlas.
These are the memories where the words “Project Atlas” appear in the chunk. Useful, but limited to direct mentions.
Accurate mode returns the same direct matches, then follows graph edges from the Project Atlas node to surface connected context that never mentions the project by name, for example (illustrative): Sarah Chen’s concern about the Q3 deadline, James’s three-week estimate for the database migration, and Maria’s budget approval for additional engineers.

Tradeoffs

Accurate mode produces richer context but at a cost. Understanding these tradeoffs helps you decide when to use each mode.
| Aspect | Fast Mode | Accurate Mode |
| --- | --- | --- |
| Retrieval latency | ~50-100ms | ~200-500ms |
| Ingestion time | 1-5 seconds | 10 seconds - several minutes |
| Search scope | Vector store only | Vector store + graph store |
| Ranking | Cosine similarity | Similarity + recency + centrality + confidence |
| Relationship awareness | Co-occurrence only | Explicit graph edges and multi-hop traversal |
| Entity resolution depth | Basic NER | Full pipeline with semantic matching |
| Compute cost | Lower | 2-3x higher |
| Best for | Real-time chat, simple queries | Complex queries, summaries, relationship context |
Accurate retrieval is most effective when the content was ingested in long-range mode. If content was ingested in fast mode, the graph store has no relationship data to traverse; retrieval still works, falling back to vector-only results, but you will not get the graph-enhanced context that makes accurate mode valuable.

When to use accurate mode

  • Multi-entity queries: When a user asks about a topic that involves multiple people, projects, or concepts, accurate mode follows relationship edges to surface all connected context. Queries that benefit include “Summarize everything about Project Atlas”, “What has Sarah been working on?”, and “Give me a full briefing on this customer”.
  • Building user profiles: During a user’s first few interactions, use long-range ingestion to build a comprehensive profile. The deep extraction captures preferences, relationships, and context that fast mode would miss, and future interactions (even in fast mode) benefit from this initial investment.
  • Comprehensive summaries: When generating summaries that must cover all relevant context across multiple conversations, accurate retrieval ensures nothing is missed. The higher latency is acceptable because summaries are not time-sensitive.
  • Relationship queries: Any query about connections between entities benefits from graph traversal. “How is X related to Y?” and “Who is involved in Z?” are natural use cases.
  • High-value interactions: For premium customers or critical interactions (escalations, renewals, executive conversations), the additional latency of accurate mode is justified by the improved context quality.

Mixing modes in practice

Most production applications use a combination of fast and accurate modes. Here is a practical pattern:
from synap import Synap

sdk = Synap(api_key="your_api_key")

async def get_context(user_id: str, customer_id: str, query: str, is_complex: bool = False):
    """Retrieve context with automatic mode selection."""

    mode = "accurate" if is_complex else "fast"

    context = await sdk.conversation.context.fetch(
        user_id=user_id,
        customer_id=customer_id,
        query=query,
        mode=mode
    )

    return context


async def ingest_conversation(
    content: str,
    user_id: str,
    customer_id: str,
    is_important: bool = False
):
    """Ingest with mode selection based on importance."""

    mode = "long-range" if is_important else "fast"

    await sdk.memories.create(
        document=content,
        document_type="ai-chat-conversation",
        user_id=user_id,
        customer_id=customer_id,
        mode=mode
    )

Heuristics for automatic mode selection

You can build simple heuristics to automatically select the appropriate mode:
def should_use_accurate_mode(query: str) -> bool:
    """Simple heuristic for mode selection based on query characteristics."""

    # Keywords that suggest complex, multi-entity queries
    complex_indicators = [
        "summarize", "everything about", "full briefing",
        "who is involved", "related to", "connected to",
        "all the details", "comprehensive", "overview of",
        "history of", "timeline for"
    ]

    query_lower = query.lower()
    return any(indicator in query_lower for indicator in complex_indicators)
These heuristics are starting points. In practice, you will tune them based on your application’s specific query patterns and user expectations. Some teams use a lightweight classifier to make the mode decision more robust.

Next steps

Fast Mode

Understand the speed-optimized alternative for real-time interactions.

Context Fetch SDK

Full SDK reference for retrieval methods and mode configuration.

Memory Architecture

Configure ingestion and retrieval defaults in your memory architecture.

Entity Resolution

How accurate mode’s deep entity resolution builds the knowledge graph.