
Overview

Entity Resolution (ER) is the process of identifying when different mentions in your data refer to the same real-world entity. When a user says “my manager Sarah”, “S. Chen”, and “Sarah Chen” across different conversations, Synap’s ER system recognizes these as the same person and links them to a single canonical entity. Entity resolution runs automatically during ingestion. There are no explicit SDK calls required to trigger or configure it. As you ingest more data through sdk.memories.create(), Synap continuously builds and refines its entity registry, improving resolution accuracy over time.

How It Works

When a document enters the ingestion pipeline, entity resolution occurs as part of the extraction stage. Here is what happens behind the scenes:
1. Entity Extraction

The ingestion pipeline identifies entity mentions in the document — people, organizations, projects, locations, products, and other named entities. Each mention is extracted with surrounding context to aid disambiguation.
2. Scope-Based Lookup

For each extracted entity, Synap searches the entity registry using a narrowest-first scope chain: USER, then CUSTOMER, then CLIENT, then WORLD. The search stops at the first scope where a match is found.
3. Matching and Resolution

The ER system uses a combination of exact string matching, normalized name matching, and semantic similarity (via 384-dimensional vector embeddings) to find potential matches in the registry. Matches above the confidence threshold are resolved.
4. Canonical Name Assignment

When a match is found, the extracted entity’s canonical_name is set to the registry entry’s canonical form. All future references resolve to this same canonical entity, enabling cross-conversation linking.
5. Auto-Registration

When no match is found in any scope, the entity is automatically registered in the entity registry at CUSTOMER scope. This means the system learns from every new entity it encounters.
Figure: Entity resolution pipeline (extract, search scopes, match, assign canonical name or auto-register).
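The matching step can be illustrated with a minimal sketch. It assumes the three techniques are tried in order (exact, then normalized, then semantic), which the description above does not specify. The function names, the registry layout, the 0.8 threshold, and the 3-dimensional toy vectors (standing in for the 384-dimensional embeddings) are all hypothetical and are not part of the Synap SDK.

```python
import math
import re


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_entity(mention, mention_vec, registry, threshold=0.8):
    """Return (canonical_name, score), or (None, best_score) if nothing qualifies.

    `registry` is a list of {"canonical_name": str, "vector": list[float]}.
    The threshold value is an assumption made for this sketch.
    """
    # Tier 1: exact string match.
    for entry in registry:
        if entry["canonical_name"] == mention:
            return entry["canonical_name"], 1.0

    # Tier 2: match after normalization.
    wanted = normalize(mention)
    for entry in registry:
        if normalize(entry["canonical_name"]) == wanted:
            return entry["canonical_name"], 1.0

    # Tier 3: semantic similarity between embeddings.
    best_name, best_score = None, 0.0
    for entry in registry:
        score = cosine(mention_vec, entry["vector"])
        if score > best_score:
            best_name, best_score = entry["canonical_name"], score
    if best_name is not None and best_score >= threshold:
        return best_name, best_score
    return None, best_score
```

In this sketch, a mention such as "S. Chen" fails the first two tiers and resolves only if its embedding is close enough to the stored vector for "Sarah Chen".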

The Scope Chain

Synap searches for entity matches across scopes in a strict order, from narrowest to broadest:
| Priority | Scope | Description | Example |
|---|---|---|---|
| 1 | USER | Entities visible only to a specific user | “my wife” resolves to “Emily” for user_alice but “Maria” for user_bob |
| 2 | CUSTOMER | Entities shared across all users in a customer/organization | “the CEO” resolves to “Jane Smith” for all users at Acme Corp |
| 3 | CLIENT | Entities shared across all customers within your Synap client | Product names, internal project codenames |
| 4 | WORLD | Global entities (reserved for future use) | Well-known public entities |
The narrowest-first search order means that a user-scoped entity always takes precedence over a customer-scoped entity with the same name. This is essential for handling ambiguous references like “my manager” that resolve differently per user.
The scope chain stops at the first match. If “Project Atlas” exists at both USER and CUSTOMER scope, the USER-scope entry is used. This ensures personal context is not overridden by organizational context.
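A narrowest-first lookup that stops at the first match might look like the sketch below. The in-memory registry layout, the `scope_key` and `resolve` helpers, and the "Atlas (Alice's notes)" entry are hypothetical. Plain dictionary lookup stands in for Synap's actual matching logic.

```python
SCOPE_CHAIN = ("USER", "CUSTOMER", "CLIENT", "WORLD")

# Hypothetical registry: scope key -> {mention: canonical_name}.
EXAMPLE_REGISTRY = {
    ("USER", "cust_acme_corp", "user_alice"): {
        "my wife": "Emily",
        "Project Atlas": "Atlas (Alice's notes)",
    },
    ("USER", "cust_acme_corp", "user_bob"): {"my wife": "Maria"},
    ("CUSTOMER", "cust_acme_corp"): {
        "the CEO": "Jane Smith",
        "Project Atlas": "Project Atlas",
    },
    ("CLIENT",): {"Synap SDK": "Synap SDK"},
}


def scope_key(scope, user_id, customer_id):
    """Build the registry key that a lookup at `scope` should consult."""
    if scope == "USER":
        return ("USER", customer_id, user_id)
    if scope == "CUSTOMER":
        return ("CUSTOMER", customer_id)
    return (scope,)  # CLIENT and WORLD are not partitioned in this sketch


def resolve(mention, registry, user_id, customer_id):
    """Walk the chain narrowest-first and return (scope, canonical) or None."""
    for scope in SCOPE_CHAIN:
        entries = registry.get(scope_key(scope, user_id, customer_id), {})
        if mention in entries:
            return scope, entries[mention]  # stop at the first match
    return None
```

With this data, `resolve("Project Atlas", EXAMPLE_REGISTRY, "user_alice", "cust_acme_corp")` returns the USER-scope entry, while the same call for `user_bob` falls through to the CUSTOMER-scope entry.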

Auto-Registration

When the ER system encounters an entity that does not match any existing registry entry, it automatically registers the entity at CUSTOMER scope. This is a deliberate design choice with important implications:
  • The system improves over time. The first mention of “Sarah Chen” creates a registry entry. Every subsequent mention of “S. Chen”, “Sarah”, or “Chen” can now resolve to the same canonical entity.
  • No manual entity setup required. You do not need to pre-populate an entity registry. The system bootstraps itself from your data.
  • CUSTOMER scope is the default. Auto-registered entities are visible to all users within the same customer, which is the right default for most organizational entities (team members, projects, products).
First ingestion:
  "Sarah Chen" mentioned → No match found → Auto-registered as "Sarah Chen" at CUSTOMER scope

Second ingestion:
  "S. Chen" mentioned → Semantic match found → Resolved to canonical "Sarah Chen"

Third ingestion (different user, same customer):
  "Sarah" mentioned → Partial match found → Resolved to canonical "Sarah Chen"
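The three ingestions above can be reproduced with a small resolve-or-register sketch. The token-prefix matcher is a deliberately crude stand-in for Synap's semantic and partial matching. `loosely_matches`, `resolve_or_register`, and the per-customer list registry are hypothetical names introduced only for illustration.

```python
import re


def _tokens(name: str):
    """Lowercased word tokens, with punctuation such as the dot in 'S.' removed."""
    return re.findall(r"\w+", name.lower())


def loosely_matches(mention: str, canonical: str) -> bool:
    """True if every mention token is a prefix of some canonical-name token."""
    canon = _tokens(canonical)
    toks = _tokens(mention)
    return bool(toks) and all(any(c.startswith(t) for c in canon) for t in toks)


def resolve_or_register(mention, customer_id, registry):
    """Resolve against the customer's entries, or auto-register the mention.

    `registry` maps customer_id -> list of canonical names (CUSTOMER scope).
    Returns (canonical_name, "resolved" | "registered").
    """
    entries = registry.setdefault(customer_id, [])
    for canonical in entries:
        if loosely_matches(mention, canonical):
            return canonical, "resolved"
    entries.append(mention)  # no match in scope: learn the new entity
    return mention, "registered"
```

Calling `resolve_or_register` with "Sarah Chen", then "S. Chen", then "Sarah" for the same customer registers the first mention and resolves the other two to it. The same mentions under a different customer ID start from an empty registry.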

What Happens to Resolved Entities

When an entity is resolved, the extracted entity’s canonical_name field is populated with the registry’s canonical form. This has downstream effects throughout Synap:
  • Graph storage: Resolved entities are stored as nodes in the graph with edges connecting them to related entities, facts, and memories. Because they share a canonical identity, information about “Sarah Chen” from different conversations is linked in the graph.
  • Retrieval quality: When you retrieve context with sdk.conversation.context.fetch(), the accurate mode traverses entity relationships in the graph. Because entities are resolved, a query about “S. Chen” returns facts from conversations that mentioned “Sarah Chen”.
  • Cross-conversation linking: Facts extracted from one conversation (“Sarah Chen is the engineering lead”) become accessible when retrieving context for a different conversation that mentions “S. Chen”.
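The effect of a shared canonical identity can be sketched with a toy in-memory index. `FactGraph`, its alias map, and the conversation IDs are hypothetical. Synap's actual graph storage is not described here, so this shows only the linking idea.

```python
from collections import defaultdict


class FactGraph:
    """Toy index that stores facts under an entity's canonical name."""

    def __init__(self, aliases):
        # aliases: mention -> canonical name (the result of resolution)
        self.aliases = dict(aliases)
        self.facts = defaultdict(list)  # canonical -> [(conversation_id, fact)]

    def canonical(self, mention: str) -> str:
        """Map a mention to its canonical name; unknown mentions map to themselves."""
        return self.aliases.get(mention, mention)

    def add_fact(self, mention: str, conversation_id: str, fact: str) -> None:
        self.facts[self.canonical(mention)].append((conversation_id, fact))

    def facts_about(self, mention: str):
        """All facts for the entity behind `mention`, across conversations."""
        return list(self.facts.get(self.canonical(mention), []))


graph = FactGraph({"S. Chen": "Sarah Chen", "Sarah": "Sarah Chen"})
graph.add_fact("Sarah Chen", "conv_1", "is the engineering lead")
graph.add_fact("S. Chen", "conv_2", "needs two more engineers")
```

Here `graph.facts_about("S. Chen")` returns both facts, because both were stored under the canonical name "Sarah Chen".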

Scope-Based Visibility

Entities registered at a given scope are only visible to lookups within that scope and narrower:
CLIENT scope entity "Synap SDK"
  └─ Visible to ALL customers and users

CUSTOMER scope entity "Sarah Chen"
  └─ Visible to all users in cust_acme_corp
  └─ NOT visible to users in cust_other_org

USER scope entity "my wife → Emily"
  └─ Visible ONLY to user_alice
  └─ NOT visible to user_bob (who has his own "my wife" mapping)
Scope-based visibility ensures that private entity mappings (like personal nicknames or relationship references) remain private to the user who created them, while organizational knowledge is shared across the team.
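The visibility rules in the diagram above can be expressed as a small predicate. The `Entity` dataclass and `is_visible` function are hypothetical names for this sketch. Treating WORLD like CLIENT (visible to everyone) is an assumption, since WORLD is reserved for future use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    name: str
    scope: str  # "USER", "CUSTOMER", "CLIENT", or "WORLD"
    customer_id: Optional[str] = None
    user_id: Optional[str] = None


def is_visible(entity: Entity, user_id: str, customer_id: str) -> bool:
    """True if a lookup made by (user_id, customer_id) can see the entity."""
    if entity.scope in ("CLIENT", "WORLD"):
        return True  # shared across all customers and users
    if entity.scope == "CUSTOMER":
        return entity.customer_id == customer_id
    if entity.scope == "USER":
        return entity.customer_id == customer_id and entity.user_id == user_id
    return False  # unknown scope: hide by default


sdk_entity = Entity("Synap SDK", "CLIENT")
sarah = Entity("Sarah Chen", "CUSTOMER", customer_id="cust_acme_corp")
wife = Entity("Emily", "USER", customer_id="cust_acme_corp", user_id="user_alice")
```

Hiding entities with an unrecognized scope is a conservative choice for the sketch: a private mapping is never exposed by accident.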

Review Queue

In some cases, the ER system encounters ambiguous matches where the confidence is above a minimum threshold but below the auto-resolution threshold. These cases are placed in a review queue for human review. Ambiguous matches typically occur when:
  • Two entities have similar names but may be different people (e.g., “John Smith” in two different departments)
  • A nickname could map to multiple canonical entities
  • An entity’s context is insufficient to determine the correct match
Management of the review queue from the Synap Dashboard is planned for a future release. Until then, ambiguous matches (those that fall below the auto-resolution threshold) are auto-registered as new entities. As the system gathers more context about these entities, it may automatically merge them during future ingestions.
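The two-threshold behavior can be sketched as a simple classifier. The numeric thresholds (0.5 and 0.85), the inclusive comparisons, and the function names are assumptions for illustration. Synap's actual values are not documented here.

```python
def classify_match(confidence, min_threshold=0.5, auto_threshold=0.85):
    """Bucket a match confidence into 'resolve', 'ambiguous', or 'no_match'.

    Both threshold values are placeholders chosen for this sketch.
    """
    if confidence >= auto_threshold:
        return "resolve"
    if confidence >= min_threshold:
        return "ambiguous"  # candidate for the review queue
    return "no_match"


def current_action(confidence, min_threshold=0.5, auto_threshold=0.85):
    """Action taken today: anything short of auto-resolution is registered as new."""
    bucket = classify_match(confidence, min_threshold, auto_threshold)
    return "resolve" if bucket == "resolve" else "auto_register"
```

Keeping the classification separate from the action means only `current_action` would need to change if ambiguous matches are later routed to a review queue.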

Practical Example

The following example demonstrates how entity resolution enriches retrieval results across multiple conversations.
# Conversation 1: User mentions "Sarah Chen" explicitly
await sdk.memories.create(
    document="""User: I had a great meeting with Sarah Chen about the Q3 roadmap.
Assistant: That sounds productive! What were the key takeaways?
User: She wants to prioritize the API redesign and defer the dashboard rewrite.""",
    document_type="ai-chat-conversation",
    user_id="user_alice",
    customer_id="cust_acme_corp",
    mode="long-range"
)

# Conversation 2: Same user mentions "S. Chen" in a different context
await sdk.memories.create(
    document="""User: Can you remind me what S. Chen said about the timeline?
Assistant: I'll look into that for you.
User: Also, she mentioned something about needing two more engineers.""",
    document_type="ai-chat-conversation",
    user_id="user_alice",
    customer_id="cust_acme_corp",
    mode="long-range"
)

# Conversation 3: Different user at same customer mentions "Sarah"
await sdk.memories.create(
    document="""User: Sarah approved the new budget for infrastructure.
Assistant: Great news! What's the approved amount?
User: $150k for Q3, up from $120k last quarter.""",
    document_type="ai-chat-conversation",
    user_id="user_bob",
    customer_id="cust_acme_corp",
    mode="long-range"
)

# Later retrieval: Querying about "Sarah Chen" returns facts from ALL three
# conversations because ER resolved "S. Chen" and "Sarah" to the same entity
context = await sdk.conversation.context.fetch(
    conversation_id="conv_new_123",
    search_query=["What do we know about Sarah Chen?"],
    mode="accurate"
)

for fact in context.facts:
    print(f"- {fact.content}")

# Example output:
# - Sarah Chen wants to prioritize the API redesign for Q3
# - Sarah Chen wants to defer the dashboard rewrite
# - Sarah Chen needs two more engineers
# - Sarah Chen approved $150k infrastructure budget for Q3
In this example, the ER system:
  1. Registered “Sarah Chen” during the first ingestion
  2. Resolved “S. Chen” to “Sarah Chen” during the second ingestion
  3. Resolved “Sarah” to “Sarah Chen” during the third ingestion (CUSTOMER scope match)
  4. Linked all extracted facts to the same canonical entity, enabling comprehensive retrieval
Entity resolution is fully automatic. As you ingest more data, Synap builds a richer entity registry that improves future resolutions. The more context the system has about an entity, the better it becomes at resolving ambiguous references.

Impact on Retrieval Modes

Entity resolution primarily benefits the accurate retrieval mode, which traverses entity relationships in the graph. The fast mode relies on vector similarity alone and benefits from ER indirectly (resolved entities share embedding neighborhoods).
| Retrieval Mode | ER Benefit |
|---|---|
| fast | Indirect: resolved entities have better vector alignment |
| accurate | Direct: graph traversal follows entity relationships across conversations |
For queries that span multiple conversations or involve entity relationships, accurate mode with a well-populated entity registry delivers significantly richer results.

Next Steps

Ingesting Memories

Feed more data to build a richer entity registry.

Retrieving Memories

Query context that leverages resolved entities.

Scopes

Understand how scopes affect entity visibility and memory access.

Entities and Resolution

Deep dive into entity resolution concepts and architecture.