Entity Resolution & Master Data Management

Beyond simple deduplication, entity resolution serves as a master data management layer for your AI agents. The entity registry acts as a master data store: a single source of truth for the people, organizations, products, and concepts your application encounters. As conversations accumulate, this registry grows into a rich organizational knowledge base that improves retrieval accuracy, enables entity-centric queries, and provides a foundation for building structured knowledge on top of unstructured conversations.

Entity resolution runs automatically during ingestion. No additional SDK calls are needed. Every document that passes through the ingestion pipeline has its entities extracted and resolved before storage.

Why master data management matters

Traditional AI applications treat each conversation as isolated text. Over hundreds or thousands of interactions, the same entities appear under different names, in different contexts, and from different users. Without entity resolution, your agent has no way to connect “the CEO” mentioned in one conversation with “Maria Garcia” mentioned in another. The entity registry solves this by:

Consolidating identity: All references to the same real-world entity converge on a single canonical record, regardless of how they were originally mentioned
Building organizational knowledge over time: Each conversation enriches the registry with new aliases, context, and relationships, making future resolution more accurate
Enabling entity-centric retrieval: Instead of searching by keywords, you can retrieve all memories associated with a specific entity across all conversations and users
Providing auditability: The registry tracks when each entity was first seen, last referenced, and how it has been resolved, giving you a clear provenance trail

How it works

Entity resolution is a multi-step process that runs as part of the ingestion pipeline:

Extract entities from text

The ingestion pipeline identifies entity mentions in the incoming content. Entities include people, organizations, products, locations, and other named references. Each entity mention is extracted with its surrounding context.

Search the entity registry

Each extracted entity is matched against the entity registry. The search follows the scope chain (USER, CUSTOMER, CLIENT, WORLD), checking narrowest scopes first. Matching uses both exact text comparison and semantic similarity via vector embeddings.

Resolve or register

If a match is found, the entity mention is linked to the existing canonical entity. If no match is found, the entity is auto-registered at CUSTOMER scope for future lookups. Ambiguous matches (multiple possible candidates) can be queued for human review.

Apply canonical names

Resolved entities receive a canonical_name that is consistent across all references. This canonical name is stored alongside the extracted memory, enabling precise retrieval by entity.
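
The four steps above can be sketched as a toy in-memory resolver. This is only an illustration under stated assumptions, not Synap's implementation: `Entity`, `find_candidates`, and `resolve` are hypothetical names, mentions are assumed to be already extracted (step 1), and matching uses case-insensitive text comparison only, with no embeddings.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    canonical_name: str
    scope: str
    aliases: set = field(default_factory=set)


# Narrowest scope first, mirroring the scope chain described above.
SCOPE_CHAIN = ["USER", "CUSTOMER", "CLIENT", "WORLD"]


def find_candidates(registry, mention):
    """Step 2: return the matches from the narrowest scope that has any."""
    wanted = mention.casefold()
    for scope in SCOPE_CHAIN:
        hits = [
            e for e in registry
            if e.scope == scope
            and wanted in {e.canonical_name.casefold(), *(a.casefold() for a in e.aliases)}
        ]
        if hits:
            return hits
    return []


def resolve(registry, mention, review_queue):
    """Steps 3 and 4: link, auto-register, or flag; return the canonical name."""
    candidates = find_candidates(registry, mention)
    if not candidates:
        # No match: auto-register at CUSTOMER scope for future lookups.
        entity = Entity(canonical_name=mention, scope="CUSTOMER")
        registry.append(entity)
        return entity.canonical_name
    if len(candidates) > 1:
        # Ambiguous: keep the first candidate but flag it for human review.
        review_queue.append((mention, [e.canonical_name for e in candidates]))
    return candidates[0].canonical_name


registry = [Entity("John Smith", "CUSTOMER", {"Mr. Smith", "JS"})]
queue = []
print(resolve(registry, "Mr. Smith", queue))  # John Smith
print(resolve(registry, "Sarah", queue))      # Sarah (newly auto-registered)
```

The canonical name returned by `resolve` is what would be stored alongside the extracted memory in step 4.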

The entity registry

The entity registry is a database of known entities, organized by scope (User → Customer → Client → World). It functions as the master data store for all entities your application encounters. Each registry entry contains:

Field	Description
`canonical_name`	The authoritative name for this entity (e.g., “John Smith”)
`aliases`	Known alternative names and references (e.g., “John”, “Mr. Smith”, “JS”)
`entity_type`	Category: `person`, `organization`, `product`, `location`, `concept`, etc.
`scope`	The scope level where this entity is registered (user, customer, client, world)
`embedding`	A vector embedding for semantic matching
`metadata`	Arbitrary metadata (role, department, relationship to user, etc.)
`created_at`	When this entity was first registered
`last_seen`	When this entity was last referenced in an ingestion
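
As a rough mental model, a registry entry with the fields above could be represented like this. The `RegistryEntry` class and its `touch` method are hypothetical and exist only for illustration; the field names come from the table, but the Python types are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class RegistryEntry:
    canonical_name: str
    entity_type: str                      # "person", "organization", ...
    scope: str                            # "user", "customer", "client", or "world"
    aliases: list = field(default_factory=list)
    embedding: list = field(default_factory=list)   # vector for semantic matching
    metadata: dict = field(default_factory=dict)    # arbitrary key/value pairs
    created_at: datetime = field(default_factory=_now)
    last_seen: datetime = field(default_factory=_now)

    def touch(self) -> None:
        """Record that an ingestion just referenced this entity."""
        self.last_seen = _now()


entry = RegistryEntry(
    canonical_name="John Smith",
    entity_type="person",
    scope="customer",
    aliases=["John", "Mr. Smith", "JS"],
    metadata={"role": "Engineering Team Lead"},
)
print(entry.canonical_name, entry.aliases)
```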

Scope-aware lookups

The registry is searched following the scope chain, narrowest first:

USER scope:     Entities specific to this user
CUSTOMER scope: Entities shared within the organization
CLIENT scope:   Entities shared across your application
WORLD scope:    Global entities

This ordering means that if a user has a personal contact named “Alex” and the company also has an employee named “Alex”, the user-scoped entity takes priority in that user’s context. The customer-scoped entity remains available for other users in the same organization.
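
The "Alex" scenario can be sketched with a small lookup that walks the scope chain. Everything here is illustrative: the `lookup` function, the row layout, and the names "Alex Park" and "user_456" are made up for the example, and only exact name or alias comparison is used.

```python
SCOPE_CHAIN = ("user", "customer", "client", "world")

# Hypothetical registry rows: (scope, owner_id, canonical_name, aliases)
REGISTRY = [
    ("user", "user_123", "Alex Park", {"Alex"}),      # a personal contact
    ("customer", "acme_corp", "Alex Chen", {"Alex"}),  # a company employee
]


def lookup(mention, owners):
    """Return the first matching canonical name, checking the narrowest scope first.

    `owners` maps each scope to the ID the caller belongs to at that scope.
    """
    for scope in SCOPE_CHAIN:
        for row_scope, owner_id, canonical_name, aliases in REGISTRY:
            if row_scope != scope or owners.get(scope) != owner_id:
                continue
            if mention == canonical_name or mention in aliases:
                return canonical_name
    return None


# user_123 has a personal "Alex", so the user-scoped entity wins.
print(lookup("Alex", {"user": "user_123", "customer": "acme_corp"}))  # Alex Park
# Another user in the same organization falls through to the customer scope.
print(lookup("Alex", {"user": "user_456", "customer": "acme_corp"}))  # Alex Chen
```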

Matching strategies

Synap uses multiple matching strategies to resolve entities, applied in order of confidence:

Exact match

The extracted entity name exactly matches a canonical name or alias in the registry.

Input: "John Smith"
Registry: canonical_name="John Smith"
Result: Exact match (confidence: 1.0)

Alias match

The extracted entity matches a known alias of a registered entity.

Input: "Mr. Smith"
Registry: canonical_name="John Smith", aliases=["Mr. Smith", "JS"]
Result: Alias match (confidence: 0.95)

Semantic match

The entity’s vector embedding is compared against registry embeddings using cosine similarity. This catches cases where the surface form is different but the meaning is the same.

Input: "my team lead from engineering"
Registry: canonical_name="John Smith", metadata={"role": "Engineering Team Lead"}
Result: Semantic match (confidence: 0.82)

Contextual match

The surrounding context of the entity mention is used to disambiguate. If multiple registry entries match by name, the context helps pick the right one.

Input: "Alex from the billing department called"
Registry:
  - canonical_name="Alex Chen", metadata={"department": "Engineering"}
  - canonical_name="Alex Rivera", metadata={"department": "Billing"}
Result: Contextual match → Alex Rivera (confidence: 0.88)
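
A cascade over these strategies might look like the sketch below. It is a simplification under several assumptions: `match` and `contextual_pick` are hypothetical helpers, the three-dimensional embeddings are hand-written toy values, the 0.8 semantic floor is an assumed cutoff, and context is used here as a tie-breaker between name matches before falling back to semantic similarity, which may differ from the order Synap applies internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def contextual_pick(candidates, context):
    """Prefer the candidate whose metadata values appear in the surrounding text."""
    lowered = context.casefold()

    def overlap(entry):
        return sum(str(v).casefold() in lowered for v in entry["metadata"].values())

    best = max(candidates, key=overlap)
    return best if overlap(best) > 0 else None


def match(mention, mention_vec, context, registry, semantic_floor=0.8):
    """Return (canonical_name, strategy), trying the most certain strategy first."""
    by_name = [e for e in registry if mention == e["canonical_name"]]
    if len(by_name) == 1:
        return by_name[0]["canonical_name"], "exact"
    by_alias = [e for e in registry if mention in e["aliases"]]
    if len(by_alias) == 1:
        return by_alias[0]["canonical_name"], "alias"
    if len(by_alias) > 1:
        picked = contextual_pick(by_alias, context)
        if picked is not None:
            return picked["canonical_name"], "contextual"
    scored = [(cosine_similarity(mention_vec, e["embedding"]), e) for e in registry]
    score, best = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    if best is not None and score >= semantic_floor:
        return best["canonical_name"], "semantic"
    return None, "none"


registry = [
    {"canonical_name": "John Smith", "aliases": {"Mr. Smith", "JS"},
     "metadata": {"role": "Engineering Team Lead"}, "embedding": [0.9, 0.1, 0.0]},
    {"canonical_name": "Alex Chen", "aliases": {"Alex"},
     "metadata": {"department": "Engineering"}, "embedding": [0.0, 1.0, 0.0]},
    {"canonical_name": "Alex Rivera", "aliases": {"Alex"},
     "metadata": {"department": "Billing"}, "embedding": [0.0, 0.0, 1.0]},
]

print(match("Alex", [0.0, 0.0, 0.0], "Alex from the billing department called", registry))
```

In a real deployment the mention vector would come from an embedding model rather than being written by hand.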

Auto-registration

When the resolution pipeline encounters an entity that does not match any existing registry entry, it automatically registers the entity at CUSTOMER scope. This means:

The system learns new entities organically as conversations happen
Future mentions of the same entity will resolve to the auto-registered entry
No manual entity management is required for common use cases
Auto-registered entities can be promoted, edited, or merged through the review queue

Conversation 1: "I had a call with Sarah from the partner team."
→ No match found → Auto-registers "Sarah" at CUSTOMER scope
  canonical_name: "Sarah"
  entity_type: "person"
  metadata: {"context": "partner team"}

Conversation 2: "Sarah mentioned the Q3 timeline is shifting."
→ Matches auto-registered "Sarah" → Links to same canonical entity

Conversation 3: "Sarah Chen confirmed the new deadline."
→ Matches "Sarah" → Updates canonical_name to "Sarah Chen", adds alias "Sarah"

Auto-registration happens at CUSTOMER scope by default because it provides the right balance: entities are shared within an organization (so all users in that org benefit) but isolated from other organizations (preventing cross-tenant entity leakage).
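
The three-conversation "Sarah" walkthrough above can be imitated with a few lines of Python. This is a toy model only: `ingest_mention` is a hypothetical function, and the rule it uses to upgrade a canonical name (a longer mention whose first word is already known) is a deliberately crude assumption that a production system would replace with scored matching and the review queue.

```python
def ingest_mention(registry, mention, context=None):
    """Resolve a person mention against `registry` (a list of dicts), registering it if new."""
    first_word = mention.split()[0]
    for entry in registry:
        known = {entry["canonical_name"], *entry["aliases"]}
        if mention in known:
            return entry
        # A fuller form of a known name becomes the canonical name;
        # the shorter form is kept as an alias.
        if first_word in known and len(mention) > len(entry["canonical_name"]):
            entry["aliases"].add(entry["canonical_name"])
            entry["canonical_name"] = mention
            return entry
    entry = {
        "canonical_name": mention,
        "entity_type": "person",
        "scope": "CUSTOMER",  # default auto-registration scope
        "aliases": set(),
        "metadata": {"context": context} if context else {},
    }
    registry.append(entry)
    return entry


registry = []
ingest_mention(registry, "Sarah", context="partner team")  # auto-registers "Sarah"
ingest_mention(registry, "Sarah")                          # links to the same entry
ingest_mention(registry, "Sarah Chen")                     # upgrades the canonical name
print(registry[0]["canonical_name"], registry[0]["aliases"])  # Sarah Chen {'Sarah'}
```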

The review queue

When the resolution pipeline encounters an ambiguous match, where multiple registry entries are plausible candidates, it places the entity in a review queue for human review rather than risk an incorrect automatic resolution.

What triggers a review queue entry

Multiple registry entries match with similar confidence scores
A semantic match falls in the ambiguity zone (confidence between 0.5 and 0.8)
An auto-registered entity closely resembles an existing entry (possible duplicate)
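
The first two triggers can be expressed as a small decision function. This is a sketch, not Synap's actual rule set: `needs_review` and the `TIE_MARGIN` value are assumptions introduced for illustration, the 0.5 to 0.8 zone is taken from the list above and treated as inclusive, and the duplicate-detection trigger is left out.

```python
AMBIGUITY_ZONE = (0.5, 0.8)  # semantic-match ambiguity zone from the list above
TIE_MARGIN = 0.05            # hypothetical: how close two scores must be to count as similar


def needs_review(scores, strategy):
    """Decide whether a resolution should be flagged, given candidate confidence scores."""
    if not scores:
        return False
    ranked = sorted(scores, reverse=True)
    # Trigger 1: the top two candidates are nearly tied.
    if len(ranked) > 1 and ranked[0] - ranked[1] <= TIE_MARGIN:
        return True
    # Trigger 2: the best semantic match falls inside the ambiguity zone.
    low, high = AMBIGUITY_ZONE
    return strategy == "semantic" and low <= ranked[0] <= high


print(needs_review([0.88, 0.86], "contextual"))  # True: two candidates are nearly tied
print(needs_review([0.65], "semantic"))          # True: inside the ambiguity zone
print(needs_review([0.82], "semantic"))          # False: above the zone
```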

Managing the review queue

Review queue items appear in the Dashboard → Entities → Review queue view. From there you can merge ambiguous mentions into an existing canonical entity, create a new entity, or dismiss the match.

SDK-level access to the review queue is on the roadmap. For now, review queue items can only be resolved in the dashboard. Contact support if you need programmatic access for a specific workflow.

Code examples

Automatic resolution during ingestion

Entity resolution happens transparently during ingestion. You do not need to make any special calls:

from maximem_synap import MaximemSynapSDK

sdk = MaximemSynapSDK(api_key="your_api_key")

# ER happens automatically during ingestion
await sdk.memories.create(
    document="John Smith from Acme Corp called about the Q4 report.",
    document_type="ai-chat-conversation",
    user_id="user_123",
    customer_id="acme_corp"
)

# Future mentions of "John", "Mr. Smith", "JS" will resolve to the same entity
await sdk.memories.create(
    document="Mr. Smith followed up on the Q4 numbers. He wants the final version by Friday.",
    document_type="ai-chat-conversation",
    user_id="user_123",
    customer_id="acme_corp"
)

# When retrieving, entities are already resolved
context = await sdk.user.context.fetch(
    user_id="user_123",
    customer_id="acme_corp"
)

# Memories about "John Smith" and "Mr. Smith" are linked to the same canonical entity.
# Retrieval therefore returns a single coherent set of facts about that person, even when
# the mentions are spread across different ingested documents.
for fact in context.facts:
    print(fact.content)

Querying for entity-related context

To retrieve memories about a specific entity, pass the entity name (or canonical form) as a search query. Synap’s accurate retrieval mode does the heavy lifting: it traverses the entity graph to find memories linked to that entity, even when the entity isn’t named verbatim in the source text.

context = await sdk.user.context.fetch(
    user_id="user_123",
    customer_id="acme_corp",
    search_query=["John Smith"],
    max_results=20,
    mode="accurate",
)

for fact in context.facts:
    print(f"[{fact.confidence:.0%}] {fact.content}")

Entity types

Synap recognizes and categorizes entities into standard types:

Entity Type	Description	Examples
`person`	Individual people	“John Smith”, “the CEO”, “my manager”
`organization`	Companies, teams, groups	“Acme Corp”, “the engineering team”, “Google”
`product`	Products, services, tools	“Jira”, “the new dashboard”, “iPhone 15”
`location`	Physical or virtual locations	“Portland”, “the NYC office”, “Slack channel #general”
`concept`	Abstract concepts, topics	“microservices migration”, “Q4 budget”, “annual review”
`event`	Named events or occurrences	“the Q3 launch”, “last week’s outage”, “the board meeting”

Best practices

Provide context in your documents

The more context you include in ingested documents, the better entity resolution works. Full names, roles, and departments help distinguish between entities with similar names.

Instead of: “Alex said the deadline is Friday.”
Prefer: “Alex Rivera from the billing team said the deadline is Friday.”

Use consistent customer_id values

Entity resolution relies on scope boundaries. Ensure you use consistent customer_id values across all ingestion calls for the same organization. Inconsistent IDs will fragment the entity registry.
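
If your customer IDs are derived from free-form organization names, one way to keep them consistent is to route every value through a single normalizing helper before calling the SDK. The `normalize_customer_id` function below is a hypothetical application-side helper, not part of the Synap SDK; a stable identifier from your own account system is usually a better choice when one is available.

```python
import re


def normalize_customer_id(raw: str) -> str:
    """Collapse casing, whitespace, and punctuation so one organization maps to one ID."""
    slug = re.sub(r"[^a-z0-9]+", "_", raw.strip().casefold()).strip("_")
    if not slug:
        raise ValueError("customer_id must contain at least one letter or digit")
    return slug


# All three spellings map to the same scope boundary.
print(normalize_customer_id("Acme Corp"))    # acme_corp
print(normalize_customer_id(" ACME-Corp "))  # acme_corp
print(normalize_customer_id("acme_corp"))    # acme_corp
```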

Review ambiguous matches promptly

The review queue catches edge cases that automatic resolution cannot handle. Review these regularly to maintain entity registry quality. Unresolved queue items do not block ingestion: they use the best available match and flag it for review.

Let the system learn

Auto-registration is designed to build the entity registry organically. Avoid manually populating the registry for every possible entity. Instead, let natural conversations populate it and use the review queue to catch errors.

Treat the registry as master data

The entity registry is not just a deduplication tool: it is your application’s master data store for entities. Invest in keeping it clean: merge duplicates, correct canonical names, and enrich metadata. The quality of entity resolution improves directly with registry quality.

Working with entity resolution in the SDK

Entity resolution is fully automatic. There are no explicit SDK calls to trigger or configure it. As you ingest more data through sdk.memories.create(), Synap continuously builds and refines its entity registry, improving resolution accuracy over time. The sections below show how resolution surfaces in practice and how it interacts with retrieval.

Resolution across conversations

The following example demonstrates how entity resolution enriches retrieval results across multiple conversations and users. The same person is mentioned as “Sarah Chen”, “S. Chen”, and “Sarah” across three separate ingestions, and resolution links them all to one canonical entity.

import uuid

# Conversation 1: User mentions "Sarah Chen" explicitly
await sdk.memories.create(
    document="""User: I had a great meeting with Sarah Chen about the Q3 roadmap.
Assistant: That sounds productive! What were the key takeaways?
User: She wants to prioritize the API redesign and defer the dashboard rewrite.""",
    document_type="ai-chat-conversation",
    user_id="user_alice",
    customer_id="cust_acme_corp",
    mode="long-range"
)

# Conversation 2: Same user mentions "S. Chen" in a different context
await sdk.memories.create(
    document="""User: Can you remind me what S. Chen said about the timeline?
Assistant: I'll look into that for you.
User: Also, she mentioned something about needing two more engineers.""",
    document_type="ai-chat-conversation",
    user_id="user_alice",
    customer_id="cust_acme_corp",
    mode="long-range"
)

# Conversation 3: Different user at same customer mentions "Sarah"
await sdk.memories.create(
    document="""User: Sarah approved the new budget for infrastructure.
Assistant: Great news! What's the approved amount?
User: $150k for Q3, up from $120k last quarter.""",
    document_type="ai-chat-conversation",
    user_id="user_bob",
    customer_id="cust_acme_corp",
    mode="long-range"
)

# Later retrieval: Querying about "Sarah Chen" returns facts from ALL three
# conversations because ER resolved "S. Chen" and "Sarah" to the same entity.
# conversation_id must be a valid UUID string; generate one with
# str(uuid.uuid4()) or reuse a UUID you already manage per conversation.
context = await sdk.conversation.context.fetch(
    conversation_id=str(uuid.uuid4()),
    search_query=["What do we know about Sarah Chen?"],
    mode="accurate"
)

for fact in context.facts:
    print(f"- {fact.content}")

# Example output:
# - Sarah Chen wants to prioritize the API redesign for Q3
# - Sarah Chen wants to defer the dashboard rewrite
# - Sarah Chen needs two more engineers
# - Sarah Chen approved $150k infrastructure budget for Q3

In this example, the ER system:

Registered “Sarah Chen” during the first ingestion
Resolved “S. Chen” to “Sarah Chen” during the second ingestion
Resolved “Sarah” to “Sarah Chen” during the third ingestion (CUSTOMER scope match)
Linked all extracted facts to the same canonical entity, enabling comprehensive retrieval

Impact on retrieval modes

Both retrieval modes benefit directly from entity resolution: resolved canonical entities give each mode the same graph linkage across conversations. The accurate mode adds LLM subquery decomposition and reranking on top.

Retrieval Mode	ER Benefit
`fast`	Direct: vector + graph traversal follows entity relationships across conversations
`accurate`	Direct: vector + graph traversal plus LLM subquery decomposition and reranking

For queries that span multiple conversations or involve entity relationships, both modes benefit from a well-populated entity registry; accurate additionally trades extra latency for LLM-driven query decomposition and result reranking.

Next steps

Memories & Context

See how entity resolution fits into the full ingestion pipeline.

Memory Scopes

Understand the scope chain that entity lookups follow.

How Ingestion Works

How resolved entities are stored in the vector and graph engines during ingestion.
