Beyond simple deduplication, entity resolution serves as a master data management layer for your AI agents. The entity registry is the single source of truth for the people, organizations, products, and concepts your application encounters. As conversations accumulate, the registry grows into an organizational knowledge base that improves retrieval accuracy, enables entity-centric queries, and provides a foundation for building structured knowledge on top of unstructured conversations.
Entity resolution runs automatically during ingestion. No additional SDK calls are needed. Every document that passes through the ingestion pipeline has its entities extracted and resolved before storage.

Why master data management matters

Traditional AI applications treat each conversation as isolated text. Over hundreds or thousands of interactions, the same entities appear under different names, in different contexts, and from different users. Without entity resolution, your agent has no way to connect “the CEO” mentioned in one conversation with “Maria Garcia” mentioned in another. The entity registry solves this by:
  • Consolidating identity: All references to the same real-world entity converge on a single canonical record, regardless of how they were originally mentioned
  • Building organizational knowledge over time: Each conversation enriches the registry with new aliases, context, and relationships, making future resolution more accurate
  • Enabling entity-centric retrieval: Instead of searching by keywords, you can retrieve all memories associated with a specific entity across all conversations and users
  • Providing auditability: The registry tracks when each entity was first seen, last referenced, and how it has been resolved, giving you a clear provenance trail

How it works

Entity resolution is a multi-step process that runs as part of the ingestion pipeline:
Figure: Entity resolution flow. Extract entities from text, match against the registry, use the canonical name if matched, auto-register if unmatched.
1. Extract entities from text

The ingestion pipeline identifies entity mentions in the incoming content. Entities include people, organizations, products, locations, and other named references. Each entity mention is extracted with its surrounding context.
2. Search the entity registry

Each extracted entity is matched against the Instance’s entity registry. The search follows the scope chain (USER, CUSTOMER, CLIENT, WORLD), checking narrowest scopes first. Matching uses both exact text comparison and semantic similarity via vector embeddings.
3. Resolve or register

If a match is found, the entity mention is linked to the existing canonical entity. If no match is found, the entity is auto-registered at CUSTOMER scope for future lookups. Ambiguous matches (multiple possible candidates) can be queued for human review.
4. Apply canonical names

Resolved entities receive a canonical_name that is consistent across all references. This canonical name is stored alongside the extracted memory, enabling precise retrieval by entity.
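The four steps above can be sketched as a small in-memory loop. This is only an illustration of the control flow, not Synap's implementation: `ToyRegistry` and `resolve_mentions` are hypothetical names, matching is reduced to a case-insensitive dictionary lookup, and extraction (step 1) is assumed to have already produced the list of mentions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyRegistry:
    """Hypothetical in-memory stand-in for the entity registry."""

    # Maps a lowercased name or alias to its canonical name.
    names: dict[str, str] = field(default_factory=dict)

    def lookup(self, mention: str) -> Optional[str]:
        return self.names.get(mention.strip().lower())

    def register(self, mention: str) -> str:
        # Auto-register: the mention becomes its own canonical name.
        canonical = mention.strip()
        self.names[canonical.lower()] = canonical
        return canonical


def resolve_mentions(mentions: list[str], registry: ToyRegistry) -> list[dict]:
    """Steps 2-4: match each mention, register misses, attach canonical_name."""
    resolved = []
    for mention in mentions:
        canonical = registry.lookup(mention)
        status = "matched"
        if canonical is None:
            canonical = registry.register(mention)
            status = "auto_registered"
        resolved.append(
            {"mention": mention, "canonical_name": canonical, "status": status}
        )
    return resolved


registry = ToyRegistry({"john smith": "John Smith", "mr. smith": "John Smith"})
# Step 1 (extraction) is assumed to have produced these mentions already.
results = resolve_mentions(["Mr. Smith", "Acme Corp"], registry)
for row in results:
    print(row)
```

In this sketch "Mr. Smith" resolves to the existing "John Smith" record, while "Acme Corp" is unknown and gets registered, so a later mention of it would match.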

The entity registry

The entity registry is a per-Instance database of known entities. It functions as the master data store for all entities your application encounters. Each registry entry contains:
| Field | Description |
|---|---|
| canonical_name | The authoritative name for this entity (e.g., “John Smith”) |
| aliases | Known alternative names and references (e.g., “John”, “Mr. Smith”, “JS”) |
| entity_type | Category: person, organization, product, location, concept, etc. |
| scope | The scope level where this entity is registered (user, customer, client, world) |
| embedding | A 384-dimensional vector embedding for semantic matching |
| metadata | Arbitrary metadata (role, department, relationship to user, etc.) |
| created_at | When this entity was first registered |
| last_seen | When this entity was last referenced in an ingestion |
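To make the shape of a registry entry concrete, here is a minimal sketch of the fields above as a Python dataclass. The class name `RegistryEntry`, the field types, and the `touch` helper are assumptions for illustration; the actual objects returned by the SDK may look different.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

EMBEDDING_DIM = 384  # dimension stated in the field table


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class RegistryEntry:
    """Hypothetical model of one entity registry entry."""

    canonical_name: str
    entity_type: str  # e.g. "person", "organization", "product"
    scope: str  # "user", "customer", "client", or "world"
    aliases: list[str] = field(default_factory=list)
    embedding: list[float] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)
    created_at: datetime = field(default_factory=_now)
    last_seen: datetime = field(default_factory=_now)

    def __post_init__(self) -> None:
        # Assumption: an empty embedding means "not embedded yet".
        if self.embedding and len(self.embedding) != EMBEDDING_DIM:
            raise ValueError(f"embedding must have {EMBEDDING_DIM} dimensions")

    def touch(self) -> None:
        """Record that the entity was referenced in an ingestion."""
        self.last_seen = _now()


entry = RegistryEntry(
    canonical_name="John Smith",
    entity_type="person",
    scope="customer",
    aliases=["John", "Mr. Smith", "JS"],
    metadata={"department": "finance"},  # illustrative value only
)
entry.touch()
print(entry.canonical_name, entry.aliases, entry.scope)
```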

Scope-aware lookups

The registry is searched following the scope chain, narrowest first:
1. USER scope    — Entities specific to this user
2. CUSTOMER scope — Entities shared within the organization
3. CLIENT scope   — Entities shared across your application
4. WORLD scope    — Global entities
This ordering means that if a user has a personal contact named “Alex” and the company also has an employee named “Alex”, the user-scoped entity takes priority in that user’s context. The customer-scoped entity remains available for other users in the same organization.
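The "Alex" example can be sketched as a narrowest-first lookup. This is a simplified model, not the real lookup: `find_in_scope_chain` is a hypothetical helper, each scope is a plain dictionary, and names are compared case-insensitively for brevity.

```python
from typing import Optional

# Narrowest scope first, as described above.
SCOPE_CHAIN = ("user", "customer", "client", "world")


def find_in_scope_chain(
    name: str, registries: dict[str, dict[str, str]]
) -> Optional[tuple[str, str]]:
    """Return (scope, entity) for the first scope that knows the name."""
    key = name.strip().lower()
    for scope in SCOPE_CHAIN:
        entity = registries.get(scope, {}).get(key)
        if entity is not None:
            return scope, entity
    return None


# Registries visible to a user who has a personal contact named Alex.
user_a = {
    "user": {"alex": "Alex (personal contact)"},
    "customer": {"alex": "Alex (employee)"},
}
# Another user in the same organization, with no personal "Alex".
user_b = {
    "user": {},
    "customer": {"alex": "Alex (employee)"},
}

print(find_in_scope_chain("Alex", user_a))
print(find_in_scope_chain("Alex", user_b))
```

The first call stops at the user scope, while the second falls through to the customer-scoped employee record.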

Matching strategies

Synap uses multiple matching strategies to resolve entities, applied in order of confidence:
Exact match: The extracted entity name exactly matches a canonical name or alias in the registry.
Input: "John Smith"
Registry: canonical_name="John Smith"
Result: Exact match (confidence: 1.0)
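As a rough sketch of how exact and semantic matching could combine, the snippet below checks the canonical name and aliases first and falls back to cosine similarity between embeddings. The function names are hypothetical, the two-dimensional vectors stand in for real 384-dimensional embeddings, and using raw cosine similarity as the confidence score is an assumption rather than documented behavior.

```python
import math


def exact_match(mention: str, canonical_name: str, aliases: list[str]) -> bool:
    """True if the mention equals the canonical name or any alias."""
    return mention == canonical_name or mention in aliases


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_confidence(mention: str, mention_vec: list[float], entry: dict) -> float:
    """Exact matches score 1.0; otherwise fall back to embedding similarity."""
    if exact_match(mention, entry["canonical_name"], entry["aliases"]):
        return 1.0
    return cosine_similarity(mention_vec, entry["embedding"])


entry = {
    "canonical_name": "John Smith",
    "aliases": ["John", "Mr. Smith", "JS"],
    "embedding": [1.0, 1.0],  # toy vector, not a real embedding
}

print(match_confidence("Mr. Smith", [1.0, 0.0], entry))  # alias hit
print(match_confidence("Johnny S.", [1.0, 0.0], entry))  # similarity fallback
```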

Auto-registration

When the resolution pipeline encounters an entity that does not match any existing registry entry, it automatically registers the entity at CUSTOMER scope. This means:
  • The system learns new entities organically as conversations happen
  • Future mentions of the same entity will resolve to the auto-registered entry
  • No manual entity management is required for common use cases
  • Auto-registered entities can be promoted, edited, or merged through the review queue
Conversation 1: "I had a call with Sarah from the partner team."
→ No match found → Auto-registers "Sarah" at CUSTOMER scope
  canonical_name: "Sarah"
  entity_type: "person"
  metadata: {"context": "partner team"}

Conversation 2: "Sarah mentioned the Q3 timeline is shifting."
→ Matches auto-registered "Sarah" → Links to same canonical entity

Conversation 3: "Sarah Chen confirmed the new deadline."
→ Matches "Sarah" → Updates canonical_name to "Sarah Chen", adds alias "Sarah"
Auto-registration happens at CUSTOMER scope by default because it provides the right balance: entities are shared within an organization (so all users in that org benefit) but isolated from other organizations (preventing cross-tenant entity leakage).
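The three-conversation example can be reproduced with a toy registry. The sketch below is an assumption-heavy illustration: `Entity`, `ingest_mention`, and the "fuller form" rule (a longer name that starts with the current canonical name) are hypothetical, and the real pipeline presumably relies on richer signals such as embeddings and context.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    canonical_name: str
    entity_type: str = "person"
    scope: str = "customer"  # auto-registration default described above
    aliases: list[str] = field(default_factory=list)


def is_fuller_form(mention: str, canonical: str) -> bool:
    """Hypothetical rule: 'Sarah Chen' is a fuller form of 'Sarah'."""
    m, c = mention.split(), canonical.split()
    return len(m) > len(c) and m[: len(c)] == c


def ingest_mention(mention: str, registry: list[Entity]) -> Entity:
    for entity in registry:
        if mention == entity.canonical_name or mention in entity.aliases:
            return entity  # existing entry, nothing to change
        if is_fuller_form(mention, entity.canonical_name):
            # Keep the old name as an alias and promote the fuller name.
            entity.aliases.append(entity.canonical_name)
            entity.canonical_name = mention
            return entity
    new_entity = Entity(canonical_name=mention)  # no match: auto-register
    registry.append(new_entity)
    return new_entity


registry: list[Entity] = []
first = ingest_mention("Sarah", registry)  # conversation 1
second = ingest_mention("Sarah", registry)  # conversation 2
third = ingest_mention("Sarah Chen", registry)  # conversation 3
print(third)
```

All three calls return the same object, which ends with canonical name "Sarah Chen" and alias "Sarah".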

The review queue

When the resolution pipeline encounters an ambiguous match, where multiple registry entries are plausible candidates, it places the entity in a review queue for human review rather than committing to a possibly incorrect automatic resolution.

What triggers a review queue entry

  • Multiple registry entries match with similar confidence scores
  • A semantic match falls in the ambiguity zone (confidence between 0.5 and 0.8)
  • An auto-registered entity closely resembles an existing entry (possible duplicate)
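The first two triggers can be expressed as a small decision function. Only the 0.5 to 0.8 ambiguity zone comes from the text above. The `decide` function, the inclusive bounds, the 0.05 margin for "similar" scores, and treating anything below 0.5 as no match are all assumptions made for this sketch.

```python
AMBIGUITY_LOW = 0.5   # from the documented ambiguity zone
AMBIGUITY_HIGH = 0.8  # from the documented ambiguity zone
SIMILAR_MARGIN = 0.05  # hypothetical: how close two scores must be to tie


def decide(candidates: list[tuple[str, float]]) -> str:
    """Return 'link', 'review', or 'auto_register' for scored candidates."""
    if not candidates:
        return "auto_register"
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best = ranked[0][1]
    if best < AMBIGUITY_LOW:
        return "auto_register"  # assumption: too weak to count as a match
    if best <= AMBIGUITY_HIGH:
        return "review"  # best score falls in the ambiguity zone
    if len(ranked) > 1 and best - ranked[1][1] <= SIMILAR_MARGIN:
        return "review"  # several candidates with similar confidence
    return "link"


print(decide([("entity_a", 0.95)]))
print(decide([("entity_a", 0.65)]))
print(decide([("entity_a", 0.93), ("entity_b", 0.91)]))
print(decide([]))
```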

Managing the review queue

Review queue entries can be resolved through the Dashboard or API:
# List pending review items
pending = await sdk.entities.review_queue.list(
    customer_id="acme_corp",
    status="pending"
)

for item in pending:
    print(f"Entity: {item.entity_mention}")
    print(f"Candidates: {item.candidates}")
    print(f"Context: {item.source_context}")

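# Resolve a review queue item. Here `item` is the last entry from the loop
# above, so this assumes `pending` was not empty.
await sdk.entities.review_queue.resolve(
    item_id=item.id,
    resolution="merge",               # merge | create_new | dismiss
    target_entity_id="entity_abc123"   # only needed when merging
)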

Code examples

Automatic resolution during ingestion

Entity resolution happens transparently during ingestion. You do not need to make any special calls:
from synap import Synap

sdk = Synap(api_key="your_api_key")

# ER happens automatically during ingestion
await sdk.memories.create(
    document="John Smith from Acme Corp called about the Q4 report.",
    document_type="ai-chat-conversation",
    user_id="user_123",
    customer_id="acme_corp"
)

# Future mentions of "John", "Mr. Smith", "JS" will resolve to the same entity
await sdk.memories.create(
    document="Mr. Smith followed up on the Q4 numbers. He wants the final version by Friday.",
    document_type="ai-chat-conversation",
    user_id="user_123",
    customer_id="acme_corp"
)

# When retrieving, entities are already resolved
context = await sdk.user.context.fetch(
    user_id="user_123",
    customer_id="acme_corp"
)

# Memories about "John Smith" and "Mr. Smith" are linked to the same canonical entity
for fact in context.facts:
    if fact.entities:
        for entity in fact.entities:
            print(f"Entity: {entity.canonical_name}")  # "John Smith"

Querying by entity

You can retrieve all memories associated with a specific entity:
# Find all memories related to a specific entity
memories = await sdk.memories.search(
    entity="John Smith",
    customer_id="acme_corp",
    max_results=20
)

for memory in memories:
    print(f"[{memory.type}] {memory.content}")
    print(f"  Source: {memory.source.document_type} at {memory.source.created_at}")

Entity types

Synap recognizes and categorizes entities into standard types:
| Entity Type | Description | Examples |
|---|---|---|
| person | Individual people | “John Smith”, “the CEO”, “my manager” |
| organization | Companies, teams, groups | “Acme Corp”, “the engineering team”, “Google” |
| product | Products, services, tools | “Jira”, “the new dashboard”, “iPhone 15” |
| location | Physical or virtual locations | “Portland”, “the NYC office”, “Slack channel #general” |
| concept | Abstract concepts, topics | “microservices migration”, “Q4 budget”, “annual review” |
| event | Named events or occurrences | “the Q3 launch”, “last week’s outage”, “the board meeting” |

Best practices

The more context you include in ingested documents, the better entity resolution works. Full names, roles, and departments help distinguish between entities with similar names.
Instead of: “Alex said the deadline is Friday.”
Prefer: “Alex Rivera from the billing team said the deadline is Friday.”
Entity resolution relies on scope boundaries. Ensure you use consistent customer_id values across all ingestion calls for the same organization. Inconsistent IDs will fragment the entity registry.
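One way to keep IDs consistent is to derive them through a single helper before every ingestion call. The helper below is a hypothetical convention, not part of the Synap SDK: it lowercases the raw value and collapses anything that is not a letter or digit into underscores.

```python
import re


def canonical_customer_id(raw: str) -> str:
    """Normalize a raw organization identifier to one stable customer_id."""
    slug = re.sub(r"[^a-z0-9]+", "_", raw.strip().lower()).strip("_")
    if not slug:
        raise ValueError("customer id is empty after normalization")
    return slug


# Variants that would otherwise fragment the registry map to one ID.
for raw in ["acme_corp", "Acme Corp", " ACME-CORP "]:
    print(repr(raw), "->", canonical_customer_id(raw))
```

Passing the normalized value as `customer_id` everywhere keeps all of an organization's entities in one customer-scoped registry.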
The review queue catches edge cases that automatic resolution cannot handle. Review these regularly to maintain entity registry quality. Unresolved queue items do not block ingestion — they use the best available match and flag it for review.
Auto-registration is designed to build the entity registry organically. Avoid manually populating the registry for every possible entity. Instead, let natural conversations populate it and use the review queue to catch errors.
The entity registry is not just a deduplication tool — it is your application’s master data store for entities. Invest in keeping it clean: merge duplicates, correct canonical names, and enrich metadata. The quality of entity resolution improves directly with registry quality.

Next steps

Memories & Context

See how entity resolution fits into the full ingestion pipeline.

Memory Scopes

Understand the scope chain that entity lookups follow.

Storage Infrastructure

Learn how resolved entities are stored in graph and vector engines.