In the SDK, accurate ingestion is called long-range mode and accurate retrieval is called accurate mode. Both apply the same principle: deeper processing for higher-quality results.

Accurate ingestion (long-range mode)

Long-range ingestion runs the complete extraction pipeline, performing deep analysis that fast mode skips. This produces structured, relationship-aware memories that power the graph-based retrieval capabilities of accurate mode.

The full extraction pipeline

1. Semantic chunking

Content is split into semantically coherent chunks, respecting topic boundaries, paragraph structure, and conversational turns. Chunks maintain enough surrounding context for meaningful standalone interpretation.
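The chunking step can be sketched as follows. The split heuristics here (one conversational turn per line, a character budget, one turn of carried-over context) are illustrative assumptions, not the SDK's actual algorithm:

```python
def chunk_conversation(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack conversational turns into chunks under a size budget,
    carrying the previous turn forward so each chunk keeps enough
    surrounding context to stand alone. (Illustrative heuristic only.)"""
    turns = [t.strip() for t in text.split("\n") if t.strip()]
    chunks, current = [], []
    for turn in turns:
        if current and sum(len(t) for t in current) + len(turn) > max_chars:
            chunks.append("\n".join(current))
            current = current[-1:]  # overlap: carry the previous turn forward
        current.append(turn)
    if current:
        chunks.append("\n".join(current))
    return chunks

convo = (
    "User: Let's revisit the Project Atlas timeline.\n"
    "Assistant: Sure. What changed?\n"
    "User: Sarah Chen is concerned about the Q3 deadline."
)
for c in chunk_conversation(convo, max_chars=80):
    print("---\n" + c)
```

A production chunker would also respect topic boundaries and paragraph structure, not just turn boundaries and length.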
2. Deep entity extraction

Named entity recognition identifies all people, organizations, products, locations, concepts, and events in the content. Unlike fast mode’s lightweight NER, long-range extraction captures implied entities, role-based references (“my manager”), and contextual descriptions (“the person who handles billing”).
3. Entity resolution

Each extracted entity is matched against the full entity registry using exact, alias, semantic, and contextual matching strategies. Resolved entities receive canonical names, and new entities are auto-registered. See Entity Resolution for details.
4. Relationship mapping

Explicit and implicit relationships between entities are identified and stored as graph edges. For example, “Sarah from the engineering team is leading Project Atlas” creates edges: Sarah —[member_of]—> Engineering Team, Sarah —[leads]—> Project Atlas.
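Conceptually, each edge is a (subject, predicate, object) triple. This sketch hand-codes the edges from the example sentence rather than performing real extraction:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    object: str

# Edges from: "Sarah from the engineering team is leading Project Atlas"
edges = [
    Edge("Sarah", "member_of", "Engineering Team"),
    Edge("Sarah", "leads", "Project Atlas"),
]

for e in edges:
    print(f"{e.subject} -[{e.predicate}]-> {e.object}")
```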
5. Preference detection

User preferences, opinions, and stated requirements are extracted with high confidence. The pipeline distinguishes between stated preferences (“I prefer dark mode”), implied preferences (consistently requesting concise responses), and contextual preferences (format preferences for specific types of queries).
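The stated-preference case is the easiest to sketch: explicit phrasings can be caught with surface patterns, while implied preferences require behavioral signals across conversations. The pattern list below is an illustrative assumption, not the SDK's model:

```python
import re

# Phrasings that signal an explicitly stated preference (illustrative list).
STATED_PATTERNS = [
    r"\bI prefer\b", r"\bI like\b", r"\bI always want\b",
    r"\bplease always\b", r"\bI'd rather\b",
]

def detect_stated_preferences(utterances: list[str]) -> list[str]:
    """Return utterances containing an explicitly stated preference."""
    return [
        u for u in utterances
        if any(re.search(p, u, re.IGNORECASE) for p in STATED_PATTERNS)
    ]

msgs = [
    "I prefer dark mode for all dashboards.",
    "Can you shorten that?",            # implied preference: needs behavioral signals
    "I'd rather see the summary first.",
]
print(detect_stated_preferences(msgs))
```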
6. Emotional and sentiment analysis

The emotional tone and sentiment of the conversation are analyzed. This is stored as metadata and can influence retrieval ranking — recent frustrations or positive experiences can be surfaced when relevant.
7. Advanced categorization

Content is classified into a topic hierarchy with domain-specific tags. A conversation about database migration might be tagged with: engineering, infrastructure, migration, PostgreSQL, timeline-discussion, decision.
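To make the tagging behavior concrete, here is a toy keyword-to-tag mapping; a real pipeline classifies against a topic hierarchy rather than doing keyword lookup, and the rule table is invented for illustration:

```python
# Illustrative keyword -> tag rules (not the SDK's classifier).
TAG_RULES = {
    "migration": ["engineering", "infrastructure", "migration"],
    "postgresql": ["engineering", "PostgreSQL"],
    "deadline": ["timeline-discussion"],
    "approved": ["decision"],
}

def categorize(text: str) -> list[str]:
    """Collect tags for every rule keyword found, deduplicated in order."""
    tags: list[str] = []
    lower = text.lower()
    for keyword, keyword_tags in TAG_RULES.items():
        if keyword in lower:
            tags.extend(t for t in keyword_tags if t not in tags)
    return tags

print(categorize("The PostgreSQL migration deadline was approved."))
```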
8. Vector embedding and graph storage

Chunks are embedded and stored in the vector store. Entity relationships and graph edges are stored in the graph store. Both storage engines are populated, enabling the combined search strategies of accurate retrieval.

Processing time

Long-range ingestion typically takes 10 seconds to several minutes, depending on the length and complexity of the content. Longer documents with many entities and relationships take more time to process thoroughly.
| Content Type | Typical Processing Time |
| --- | --- |
| Short conversation (5-10 turns) | 10-30 seconds |
| Long conversation (50+ turns) | 1-3 minutes |
| Technical document (5-10 pages) | 30 seconds - 2 minutes |
| Meeting transcript (1 hour) | 2-5 minutes |

When to use long-range ingestion

  • Important conversations: Strategic discussions, key decisions, onboarding sessions, escalations.
  • Complex documents: Technical documentation, policy documents, contracts, detailed specifications.
  • Building detailed user profiles: Onboarding conversations where you want to capture a comprehensive understanding of the user’s needs, preferences, and context.
  • Historical data (bootstrap): Long-range is the default for bootstrap ingestion because the extra processing time is acceptable for batch loads.
  • Meeting transcripts: Multi-party conversations with many entities, action items, and decisions benefit significantly from deep extraction.

Code example

from synap import Synap

sdk = Synap(api_key="your_api_key")

# Long-range ingestion for an important strategic conversation
await sdk.memories.create(
    document=(
        "User: Let's revisit the Project Atlas timeline. I spoke with Sarah Chen "
        "from engineering yesterday, and she's concerned about the Q3 deadline. The "
        "infrastructure team hasn't finished the database migration yet, and James "
        "from DevOps says they need at least three more weeks.\n"
        "Assistant: I'll note that. So the key concerns are: Sarah Chen flagged the "
        "Q3 deadline as infeasible, the database migration is blocking progress, and "
        "James estimates three additional weeks for the infrastructure work. Would you "
        "like me to also note the proposed Q4 fallback?\n"
        "User: Yes. And note that we might bring in two engineers from the platform "
        "team to help accelerate. Maria approved the budget for that yesterday."
    ),
    document_type="ai-chat-conversation",
    user_id="user_123",
    customer_id="acme_corp",
    mode="long-range"
)
What long-range extracts from this conversation:
  • Entities: Sarah Chen (person, engineering), James (person, DevOps), Maria (person, approver), Project Atlas (project), platform team (organization), infrastructure team (organization)
  • Relationships: Sarah Chen —[concerned_about]—> Project Atlas timeline, James —[member_of]—> DevOps, Maria —[approved]—> budget for additional engineers
  • Decisions: Q4 fallback proposed, two additional engineers from platform team, Maria approved budget
  • Facts: Database migration incomplete, three weeks estimated for infrastructure work, Q3 deadline flagged as infeasible

Accurate retrieval

Accurate retrieval combines vector similarity search with knowledge graph traversal and multi-signal ranking. This produces contextually richer results that surface not just directly matching content but also connected entities, related decisions, and relationship-aware context.

How it works

1. Query embedding

The query is converted into a vector embedding, identical to the fast mode process.
2. Vector similarity search

The embedding is compared against stored memory embeddings using cosine similarity, producing an initial set of candidate results. This step is the same as fast mode.
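Cosine similarity itself is straightforward; this self-contained sketch ranks toy three-dimensional "embeddings" (real embeddings have hundreds or thousands of dimensions, and the vectors and memory IDs here are invented):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

query_vec = [0.2, 0.8, 0.1]
memory_vecs = {
    "mem_timeline": [0.25, 0.75, 0.05],
    "mem_kickoff": [0.9, 0.1, 0.2],
}

# Rank candidate memories by similarity to the query embedding.
ranked = sorted(
    memory_vecs.items(),
    key=lambda kv: cosine_similarity(query_vec, kv[1]),
    reverse=True,
)
for mem_id, vec in ranked:
    print(mem_id, round(cosine_similarity(query_vec, vec), 3))
```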
3. Graph traversal

Entities mentioned in the query and in the top vector results are used as starting nodes for graph traversal. The traversal follows relationship edges to discover connected entities, related facts, and contextual information that would not appear in a pure vector search. For example, querying “Project Atlas” triggers traversal to connected nodes: Sarah Chen, James, the database migration, the Q3 deadline, and Maria’s budget approval.
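At its core this is a bounded breadth-first search over the entity graph. The toy adjacency list below mirrors a subset of the Project Atlas example; the hop limit and graph shape are assumptions for illustration:

```python
from collections import deque

# Toy entity graph as adjacency lists (subset of the Project Atlas example).
GRAPH = {
    "Project Atlas": ["Sarah Chen", "Q3 deadline", "database migration"],
    "Sarah Chen": ["Engineering Team"],
    "database migration": ["James"],
    "Q3 deadline": ["Q4 fallback"],
    "James": ["DevOps"],
}

def traverse(start: str, max_hops: int = 2) -> set[str]:
    """Breadth-first traversal up to max_hops relationship edges from start."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted: don't expand further
        for neighbor in GRAPH.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen - {start}

print(sorted(traverse("Project Atlas")))
```

Note that "DevOps" sits three hops from "Project Atlas", so with `max_hops=2` it is not reached; the hop budget is what keeps traversal cost bounded.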
4. Cross-engine merging

Results from vector search and graph traversal are merged into a unified candidate set. Duplicates are removed, and results from both engines are normalized to a common relevance scale.
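A sketch of that merge step, assuming min-max normalization per engine and keep-the-higher-score deduplication (the actual normalization and tie-breaking strategy is not specified here):

```python
def merge_results(vector_hits: dict[str, float],
                  graph_hits: dict[str, float]) -> dict[str, float]:
    """Merge two candidate sets, min-max normalizing each engine's scores
    to [0, 1] and keeping the higher score for duplicates."""
    def normalize(scores: dict[str, float]) -> dict[str, float]:
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}

    merged: dict[str, float] = {}
    for hits in (normalize(vector_hits), normalize(graph_hits)):
        for mem_id, score in hits.items():
            merged[mem_id] = max(merged.get(mem_id, 0.0), score)
    return merged

vector_hits = {"mem_a": 0.92, "mem_b": 0.81}
graph_hits = {"mem_b": 3.0, "mem_c": 1.5}   # graph scores on a different scale
print(merge_results(vector_hits, graph_hits))
```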
5. Multi-signal ranking

The merged candidates are ranked using multiple signals:
  • Semantic similarity: How closely the content matches the query (cosine similarity)
  • Recency: How recently the memory was created or updated
  • Graph centrality: How connected the memory is to other relevant entities
  • Confidence: The extraction pipeline’s confidence in the accuracy of the memory
These signals are weighted and combined into a final relevance score.
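The weighted combination can be sketched as a linear blend. The weights, the exponential recency decay, and the candidate fields below are all illustrative assumptions, not the SDK's actual scoring function:

```python
import time

# Illustrative signal weights (sum to 1.0); real weights would be tuned.
WEIGHTS = {"similarity": 0.5, "recency": 0.2, "centrality": 0.2, "confidence": 0.1}

def relevance(candidate: dict, now: float, half_life_days: float = 30.0) -> float:
    """Blend the four ranking signals into one score in [0, 1]."""
    age_days = (now - candidate["created_at"]) / 86400
    recency = 0.5 ** (age_days / half_life_days)  # halves every 30 days
    signals = {
        "similarity": candidate["similarity"],
        "recency": recency,
        "centrality": candidate["centrality"],
        "confidence": candidate["confidence"],
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())

now = time.time()
candidates = [
    {"id": "mem_a", "similarity": 0.9, "centrality": 0.2,
     "confidence": 0.8, "created_at": now - 60 * 86400},  # older, very similar
    {"id": "mem_b", "similarity": 0.7, "centrality": 0.9,
     "confidence": 0.9, "created_at": now - 1 * 86400},   # fresh, well connected
]
ranked = sorted(candidates, key=lambda c: relevance(c, now), reverse=True)
print([c["id"] for c in ranked])
```

Note how the fresh, well-connected memory outranks the older one despite lower raw similarity; that is the point of blending signals rather than sorting by cosine alone.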
6. Return

The top-k results (determined by the configured budget) are returned as structured context, enriched with entity and relationship metadata.

Latency

| Metric | Value |
| --- | --- |
| P50 latency | ~200ms |
| P95 latency | ~400ms |
| P99 latency | ~500ms |
Accurate retrieval is approximately 2-5x slower than fast mode due to the graph traversal and multi-signal ranking steps.

Code example

# Accurate retrieval for a complex, relationship-aware query
context = await sdk.conversation.context.fetch(
    user_id="user_123",
    customer_id="acme_corp",
    query="What do we know about Project Atlas, including who is involved and what decisions have been made?",
    mode="accurate"
)

for fact in context.facts:
    print(f"[{fact.confidence:.2f}] {fact.content}")
    if fact.entities:
        print(f"  Entities: {', '.join(e.canonical_name for e in fact.entities)}")
    if fact.relationships:
        print(f"  Relationships: {', '.join(str(r) for r in fact.relationships)}")

What graph traversal adds

The key advantage of accurate mode is graph traversal. Here is a concrete example showing the difference between fast and accurate retrieval for the same query.

Query: “Tell me about Project Atlas”

Fast mode returns memories that directly mention “Project Atlas” in their text:
[0.92] Project Atlas timeline may need to shift to Q4. Q3 deadline flagged as infeasible.
[0.87] Project Atlas kickoff meeting scheduled for January 15th.
[0.81] User asked about the current status of Project Atlas.
These are the memories where the words “Project Atlas” appear in the chunk. Useful, but limited to direct mentions.
Accurate mode returns the same direct matches, then follows graph edges from the Project Atlas node to surface connected context that never mentions the project by name, for example (illustrative): Sarah Chen’s concern about the Q3 deadline, James’s three-week estimate for the database migration, and Maria’s budget approval for additional engineers.

Tradeoffs

Accurate mode produces richer context but at a cost. Understanding these tradeoffs helps you decide when to use each mode.
| Aspect | Fast Mode | Accurate Mode |
| --- | --- | --- |
| Retrieval latency | ~50-100ms | ~200-500ms |
| Ingestion time | 1-5 seconds | 10 seconds - several minutes |
| Search scope | Vector store only | Vector store + graph store |
| Ranking | Cosine similarity | Similarity + recency + centrality + confidence |
| Relationship awareness | Co-occurrence only | Explicit graph edges and multi-hop traversal |
| Entity resolution depth | Basic NER | Full pipeline with semantic matching |
| Compute cost | Lower | 2-3x higher |
| Best for | Real-time chat, simple queries | Complex queries, summaries, relationship context |
Accurate retrieval is most effective when the content was ingested in long-range mode. If content was ingested in fast mode, the graph store has no relationship data to traverse; retrieval still works, falling back to vector-only results, but you will not get the graph-enhanced context that makes accurate mode valuable.

When to use accurate mode

  • Multi-entity queries: When a user asks about a topic that involves multiple people, projects, or concepts, accurate mode follows relationship edges to surface all connected context. Queries that benefit include “Summarize everything about Project Atlas”, “What has Sarah been working on?”, and “Give me a full briefing on this customer”.
  • Building user profiles: During a user’s first few interactions, use long-range ingestion to build a comprehensive profile. The deep extraction captures preferences, relationships, and context that fast mode would miss, and future interactions (even in fast mode) benefit from this initial investment.
  • Comprehensive summaries: When generating summaries that must cover all relevant context across multiple conversations, accurate retrieval ensures nothing is missed. The higher latency is acceptable because summaries are not time-sensitive.
  • Relationship queries: Any query about connections between entities benefits from graph traversal. “How is X related to Y?” and “Who is involved in Z?” are natural use cases.
  • High-value interactions: For premium customers or critical interactions (escalations, renewals, executive conversations), the additional latency of accurate mode is justified by the improved context quality.

Mixing modes in practice

Most production applications use a combination of fast and accurate modes. Here is a practical pattern:
from synap import Synap

sdk = Synap(api_key="your_api_key")

async def get_context(user_id: str, customer_id: str, query: str, is_complex: bool = False):
    """Retrieve context with automatic mode selection."""

    mode = "accurate" if is_complex else "fast"

    context = await sdk.conversation.context.fetch(
        user_id=user_id,
        customer_id=customer_id,
        query=query,
        mode=mode
    )

    return context


async def ingest_conversation(
    content: str,
    user_id: str,
    customer_id: str,
    is_important: bool = False
):
    """Ingest with mode selection based on importance."""

    mode = "long-range" if is_important else "fast"

    await sdk.memories.create(
        document=content,
        document_type="ai-chat-conversation",
        user_id=user_id,
        customer_id=customer_id,
        mode=mode
    )

Heuristics for automatic mode selection

You can build simple heuristics to automatically select the appropriate mode:
def should_use_accurate_mode(query: str) -> bool:
    """Simple heuristic for mode selection based on query characteristics."""

    # Keywords that suggest complex, multi-entity queries
    complex_indicators = [
        "summarize", "everything about", "full briefing",
        "who is involved", "related to", "connected to",
        "all the details", "comprehensive", "overview of",
        "history of", "timeline for"
    ]

    query_lower = query.lower()
    return any(indicator in query_lower for indicator in complex_indicators)
These heuristics are starting points. In practice, you will tune them based on your application’s specific query patterns and user expectations. Some teams use a lightweight classifier to make the mode decision more robust.

Next steps

Fast Mode

Understand the speed-optimized alternative for real-time interactions.

Context Fetch SDK

Full SDK reference for retrieval methods and mode configuration.

Memory Architecture

Configure ingestion and retrieval defaults in your memory architecture.

Entity Resolution

How accurate mode’s deep entity resolution builds the knowledge graph.