The Memory Architecture Configuration Artifact (MACA) is a YAML document that controls every aspect of how Synap processes, stores, and retrieves memories for an instance. It determines what gets extracted from conversations, how it is organized in storage, and how retrieval results are ranked and delivered.
This guide covers every section of the MACA config in detail, with practical recipes for common use cases.
A MACA configuration has three top-level sections: storage, ingestion, and retrieval. Here is a fully annotated example:
maca-config.yaml
```yaml
# ============================================================
# MACA Configuration — Memory Architecture Configuration Artifact
# ============================================================
# Version: 1.0.0
# Instance: inst_a1b2c3d4e5f67890
# ============================================================
version: "1.0.0"

# --- Storage: Where and how memories are persisted ---
storage:
  vector:
    namespace: "prod-memories"    # Logical partition in the vector store
    embedding_dimension: 1536     # Must match your embedding model (1536 = OpenAI)
    enabled: true                 # Set false to disable vector storage entirely
  graph:
    namespace: "prod-knowledge"   # Logical partition in the graph store
    enabled: true                 # Set false to disable graph/entity storage
  scoping:
    primary_scope: "user"         # user | customer | instance
  retention:
    max_memory_age_days: 365      # Memories older than this are auto-purged (0 = no limit)

# --- Ingestion: What gets extracted from incoming content ---
ingestion:
  categories:
    facts: true                   # Factual statements about users/topics
    preferences: true             # User likes, dislikes, choices
    temporal_events: true         # Time-bound events and plans
    relationships: true           # Connections between entities
    procedures: false             # Step-by-step instructions (off by default)
    emotions: false               # Emotional states and sentiment
  extraction:
    mode: "standard"              # standard | enhanced
    confidence_threshold: 0.7     # 0.0-1.0 — minimum confidence to persist a memory
  chunking:
    strategy: "semantic"          # semantic | fixed
    max_chunk_tokens: 512         # Maximum tokens per chunk
  pii:
    handling: "redact"            # redact | mask | passthrough
    categories: ["email", "phone", "ssn", "credit_card"]
  agent_hints:
    - "Focus on extracting product preferences and purchase history"
    - "Treat project deadlines as high-priority temporal events"

# --- Retrieval: How memories are searched and ranked ---
retrieval:
  modes:
    fast: true                    # ~50-100ms, vector-only search
    accurate: true                # ~200-500ms, vector + graph + re-ranking
  ranking:
    recency_weight: 0.3           # 0.0-1.0 — how much to favor recent memories
    relevance_weight: 0.5         # 0.0-1.0 — how much to favor semantic match
    confidence_weight: 0.2        # 0.0-1.0 — how much to favor high-confidence memories
  anticipation:
    enabled: false                # Predictive pre-fetch for common query patterns
    cache_ttl_seconds: 300        # How long anticipated results stay cached
  context_budget:
    max_tokens: 4096              # Maximum tokens returned in a single retrieval
  agent_hints:
    - "Prioritize the user's most recent preferences over older ones"
    - "Include relationship context when entities are mentioned"
```
The version field is required. Synap uses it to track config history and enable rollback. Always increment the version when making changes.
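For example, after editing an existing config you would bump the version before re-applying (the specific values here are illustrative):

```yaml
version: "1.0.1"   # Incremented from "1.0.0" so config history and rollback stay unambiguous
```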
| Parameter | Description | Default |
|---|---|---|
| `namespace` | Logical partition name. Use different namespaces for different environments (e.g., `dev-memories`, `staging-memories`, `prod-memories`). | `"default"` |
| `embedding_dimension` | Must match your embedding model. Common values: 1536 (OpenAI text-embedding-ada-002/3-small), 768 (Cohere), 384 (MiniLM). | `1536` |
| `enabled` | Set to `false` to disable vector storage entirely. Retrieval will rely on graph-only queries. | `true` |
Changing embedding_dimension on an existing instance does not re-embed existing memories. If you switch embedding models, you will need to re-index. See the Migration Guide for details.
The graph store maintains a knowledge graph of entities and their relationships. It powers entity resolution, relationship queries, and structured lookups.
| Parameter | Description | Default |
|---|---|---|
| `enabled` | Set to `false` to disable graph storage. Entity resolution and relationship extraction will be skipped. | `true` |
For most applications, keep both vector and graph stores enabled. The combination of semantic search (vector) and structured queries (graph) produces significantly better retrieval results than either alone.
| Category | Description | Example |
|---|---|---|
| `facts` | Factual statements about users, topics, or the world | “User works at Acme Corp as a software engineer” |
| `preferences` | Likes, dislikes, choices, and stated preferences | “User prefers dark mode and concise responses” |
| `temporal_events` | Time-bound events, deadlines, plans, and schedules | “User has a dentist appointment next Tuesday” |
| `relationships` | Connections between entities (people, organizations, concepts) | “Alice manages the engineering team at Acme” |
| `procedures` | Step-by-step instructions and workflows | “To deploy, run tests first, then build, then push” |
| `emotions` | Emotional states, sentiment, and tone | “User expressed frustration about the billing issue” |
Start with facts and preferences enabled. These two categories deliver the most value for conversational AI. Add temporal_events if your application involves scheduling or planning. Enable relationships when entity connections matter (e.g., CRM, team management).
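Translated into config, that starting point looks like the fragment below (a sketch — enable the other categories as your use case demands):

```yaml
ingestion:
  categories:
    facts: true             # Highest-value category for conversational AI
    preferences: true       # Highest-value category for conversational AI
    temporal_events: false  # Enable for scheduling or planning applications
    relationships: false    # Enable when entity connections matter (CRM, team management)
    procedures: false
    emotions: false
```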
| Mode | Behavior |
|---|---|
| `"redact"` | PII is detected and permanently removed before storage. Original content cannot be recovered. |
| `"mask"` | PII is replaced with tokens (e.g., `[EMAIL]`, `[PHONE]`). The mapping is stored separately for authorized unmasking. |
| `"passthrough"` | No PII processing. Content is stored as-is. Use only when you handle PII externally. |
If your application handles user data subject to GDPR, CCPA, or similar regulations, do not use "passthrough". Use "redact" or "mask" and ensure your retention policy complies with applicable data protection requirements.
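A compliant fragment for regulated data might look like this (a sketch — prefer `"redact"` if authorized unmasking is never required):

```yaml
ingestion:
  pii:
    handling: "mask"   # Reversible only for authorized unmasking workflows
    categories: ["email", "phone", "ssn", "credit_card"]
```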
Agent hints are natural-language instructions that guide the extraction pipeline. They help Synap understand domain-specific context that the general-purpose pipeline might miss.
```yaml
ingestion:
  agent_hints:
    - "Focus on extracting product preferences and purchase history"
    - "Treat project deadlines as high-priority temporal events"
    - "When users mention team members, extract the reporting relationship"
```
Agent hints are powerful for domain-specific tuning. Write them as clear, specific instructions. Avoid vague hints like “extract everything important” — instead, name the specific types of information that matter for your use case.
| Mode | Latency | Strategy | Best for |
|---|---|---|---|
| `fast` | ~50-100 ms | Vector-only search | Real-time chat and latency-sensitive interactions |
| `accurate` | ~200-500 ms | Vector + graph search with cross-encoder re-ranking | Complex queries, research, cases where precision outweighs speed |
When both modes are enabled, you select one per request via the `mode` parameter of `context.fetch()`:
```python
# Fast mode for real-time chat
context = await sdk.conversation.context.fetch(
    conversation_id="conv_789",
    search_query=["user preferences"],
    mode="fast"
)

# Accurate mode for a detailed research query
context = await sdk.conversation.context.fetch(
    conversation_id="conv_789",
    search_query=["all interactions with Acme Corp in Q3"],
    mode="accurate"
)
```
Ranking weights control how retrieved memories are scored and ordered. All three weights should be between 0.0 and 1.0. They do not need to sum to 1.0 — they are normalized internally.
| Weight | Description | Increase when |
|---|---|---|
| `recency_weight` | Favors more recently created memories | User context changes frequently, recent info is more valuable |
| `relevance_weight` | Favors memories semantically closest to the search query | Query accuracy matters most, user history is stable |
| `confidence_weight` | Favors memories with higher extraction confidence scores | Operating in high-stakes domains where accuracy is critical |
A good starting point for most applications is relevance: 0.5, recency: 0.3, confidence: 0.2. This prioritizes relevance while giving a moderate boost to recent memories and a light preference for high-confidence extractions. Tune from there based on your retrieval quality observations.
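To build intuition for how the three weights combine, here is a simplified sketch of normalized weighted scoring. Synap's actual ranking implementation is internal; the function name, field names, and pre-scaled `[0, 1]` component scores below are illustrative assumptions, not the real API.

```python
def rank_memories(memories, recency_weight=0.3, relevance_weight=0.5, confidence_weight=0.2):
    """Order memories by a normalized weighted score (illustrative sketch only)."""
    # Weights need not sum to 1.0 -- they are normalized here, as the config docs describe
    total = recency_weight + relevance_weight + confidence_weight
    w_rec, w_rel, w_conf = recency_weight / total, relevance_weight / total, confidence_weight / total

    def score(m):
        # Each component is assumed to be pre-scaled to [0, 1]
        return w_rec * m["recency"] + w_rel * m["relevance"] + w_conf * m["confidence"]

    return sorted(memories, key=score, reverse=True)

memories = [
    {"id": "a", "recency": 0.9, "relevance": 0.4, "confidence": 0.8},
    {"id": "b", "recency": 0.2, "relevance": 0.9, "confidence": 0.7},
]
ranked = rank_memories(memories)  # "b" wins: high relevance dominates under the default weights
```

Because the weights are normalized, doubling all three leaves the ordering unchanged; only their ratios matter.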
Anticipation enables predictive pre-fetching of context for common query patterns. When enabled, Synap analyzes retrieval patterns and pre-caches likely-needed context before the SDK requests it.
| Parameter | Description | Default |
|---|---|---|
| `cache_ttl_seconds` | How long anticipated results stay in the pre-fetch cache. | `300` (5 minutes) |
Anticipation adds a small amount of background processing and storage overhead. Enable it when you observe repetitive retrieval patterns (e.g., a support bot that frequently looks up the same customer context). For applications with highly diverse queries, the hit rate may be too low to justify the overhead.
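Conceptually, the anticipation cache behaves like a small TTL cache. This standalone sketch is not Synap's implementation — it only illustrates the `cache_ttl_seconds` mechanic of entries expiring after a fixed lifetime:

```python
import time


class TTLCache:
    """Minimal TTL cache illustrating the cache_ttl_seconds mechanic."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry_timestamp)

    def put(self, key, value, now=None):
        now = time.time() if now is None else now
        self._store[key] = (value, now + self.ttl)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is None or now >= entry[1]:
            self._store.pop(key, None)  # Expired entries are evicted lazily on read
            return None
        return entry[0]


cache = TTLCache(ttl_seconds=300)
cache.put("customer:42", ["recent support context"], now=0)
fresh = cache.get("customer:42", now=299)   # within TTL -> cache hit
stale = cache.get("customer:42", now=301)   # past TTL -> None, entry evicted
```

A short TTL keeps pre-fetched context from going stale at the cost of more background refreshes; that trade-off is why the high-volume recipe below uses `120` while the support recipe uses `600`.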
The context budget controls the maximum volume of content returned in a single retrieval call.
```yaml
retrieval:
  context_budget:
    max_tokens: 4096
```
| Parameter | Description | Default |
|---|---|---|
| `max_tokens` | Maximum number of tokens across all returned memories. Synap truncates and prioritizes to stay within budget. | `4096` |
Align max_tokens with your LLM’s context window:
| LLM Context Window | Recommended `max_tokens` | Reasoning |
|---|---|---|
| 4K tokens | 1024-1536 | Leave room for system prompt + user message + response |
| 8K tokens | 2048-3072 | Comfortable budget for most conversations |
| 32K+ tokens | 4096-8192 | Generous context, but more is not always better |
Setting max_tokens too high can flood your LLM prompt with marginally relevant context, reducing response quality. Start conservative and increase only if you observe retrieval misses.
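The "truncate and prioritize" behavior can be approximated as a greedy cut-off: take memories in ranked order until the next one would exceed `max_tokens`. Synap's internal prioritization may be more sophisticated; this sketch with hypothetical field names just shows why ranking order matters under a budget.

```python
def trim_to_budget(ranked_memories, max_tokens=4096):
    """Greedily keep ranked memories until the token budget is exhausted (illustrative)."""
    kept, used = [], 0
    for memory in ranked_memories:
        if used + memory["tokens"] > max_tokens:
            break  # Stop at the first memory that would overflow the budget
        kept.append(memory)
        used += memory["tokens"]
    return kept


ranked = [
    {"id": "a", "tokens": 1500},
    {"id": "b", "tokens": 2000},
    {"id": "c", "tokens": 1000},
]
selected = trim_to_budget(ranked, max_tokens=4096)  # keeps "a" and "b"; "c" would overflow
```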
Similar to ingestion hints, retrieval agent hints guide how Synap ranks and filters results.
```yaml
retrieval:
  agent_hints:
    - "Prioritize the user's most recent preferences over older ones"
    - "Include relationship context when entities are mentioned"
    - "Deprioritize procedural memories unless the user asks 'how to'"
```
These are battle-tested configurations for common application patterns.
Customer Support Bot
Personal Assistant
Knowledge Base Agent
High-Volume Analytics
High recall, user-scoped, all core categories enabled. Optimized for quickly retrieving a customer’s full context during a support interaction.
customer-support-maca.yaml
```yaml
version: "1.0.0"
storage:
  vector:
    namespace: "support-memories"
    embedding_dimension: 1536
    enabled: true
  graph:
    namespace: "support-knowledge"
    enabled: true
  scoping:
    primary_scope: "user"
  retention:
    max_memory_age_days: 180
ingestion:
  categories:
    facts: true
    preferences: true
    temporal_events: true
    relationships: true
    procedures: false
    emotions: true              # Track frustration/satisfaction for escalation
  extraction:
    mode: "standard"
    confidence_threshold: 0.6   # Lower threshold for broader recall
  chunking:
    strategy: "semantic"
    max_chunk_tokens: 512
  pii:
    handling: "mask"
    categories: ["email", "phone", "ssn", "credit_card"]
  agent_hints:
    - "Extract product names, order IDs, and issue descriptions as facts"
    - "Track customer sentiment and frustration level"
    - "Note any escalation requests or supervisor mentions"
retrieval:
  modes:
    fast: true
    accurate: true
  ranking:
    recency_weight: 0.4         # Recent interactions matter most in support
    relevance_weight: 0.4
    confidence_weight: 0.2
  anticipation:
    enabled: true               # Support queries are repetitive
    cache_ttl_seconds: 600
  context_budget:
    max_tokens: 3072
  agent_hints:
    - "Always include the customer's most recent open issue"
    - "Include past resolution history for similar problems"
```
Preferences-heavy, user-scoped with long retention. Designed for applications where the AI builds a deep understanding of individual users over time.
personal-assistant-maca.yaml
```yaml
version: "1.0.0"
storage:
  vector:
    namespace: "assistant-memories"
    embedding_dimension: 1536
    enabled: true
  graph:
    namespace: "assistant-knowledge"
    enabled: true
  scoping:
    primary_scope: "user"
  retention:
    max_memory_age_days: 0      # No limit -- long-term memory
ingestion:
  categories:
    facts: true
    preferences: true           # Core value for personalization
    temporal_events: true       # Calendars, deadlines, plans
    relationships: true         # Who the user knows
    procedures: false
    emotions: false
  extraction:
    mode: "enhanced"            # Deeper extraction for richer personalization
    confidence_threshold: 0.7
  chunking:
    strategy: "semantic"
    max_chunk_tokens: 512
  pii:
    handling: "redact"
    categories: ["ssn", "credit_card"]
  agent_hints:
    - "Extract dietary preferences, travel preferences, and hobbies"
    - "Track family members, colleagues, and friends as relationships"
    - "Note recurring commitments and routine schedules"
retrieval:
  modes:
    fast: true
    accurate: true
  ranking:
    recency_weight: 0.2         # Older preferences still matter
    relevance_weight: 0.6       # Relevance is king for personalization
    confidence_weight: 0.2
  anticipation:
    enabled: false
  context_budget:
    max_tokens: 4096
  agent_hints:
    - "Preferences should always outrank facts in personalization queries"
    - "Include temporal events only if they are upcoming (within 30 days)"
```
Facts-focused, client-scoped, large context budget. Designed for agents that answer questions from a shared knowledge base rather than tracking individual users.
knowledge-base-maca.yaml
```yaml
version: "1.0.0"
storage:
  vector:
    namespace: "kb-memories"
    embedding_dimension: 1536
    enabled: true
  graph:
    namespace: "kb-knowledge"
    enabled: true
  scoping:
    primary_scope: "instance"   # Shared knowledge, not per-user
  retention:
    max_memory_age_days: 0      # Knowledge should persist indefinitely
ingestion:
  categories:
    facts: true                 # Primary category
    preferences: false
    temporal_events: false
    relationships: true         # Useful for "who owns what" queries
    procedures: true            # How-to documentation
    emotions: false
  extraction:
    mode: "enhanced"            # Thorough extraction for documentation
    confidence_threshold: 0.8   # High threshold -- quality over quantity
  chunking:
    strategy: "semantic"
    max_chunk_tokens: 1024      # Larger chunks for coherent passages
  pii:
    handling: "passthrough"     # KB content typically has no PII
    categories: []
  agent_hints:
    - "Extract definitions, technical specifications, and API details"
    - "Preserve code examples and configuration snippets as procedures"
    - "Link related concepts and components as relationships"
retrieval:
  modes:
    fast: false                 # Accuracy matters more than speed for KB
    accurate: true
  ranking:
    recency_weight: 0.1         # KB content is usually stable
    relevance_weight: 0.7       # Find the most relevant answer
    confidence_weight: 0.2
  anticipation:
    enabled: false
  context_budget:
    max_tokens: 8192            # Large budget for comprehensive answers
  agent_hints:
    - "Include procedural steps when the query asks 'how to'"
    - "Include related concept definitions for technical queries"
```
Minimal extraction, fast mode only, strict retention. Designed for applications that process high volumes of data and need to control costs.
high-volume-maca.yaml
```yaml
version: "1.0.0"
storage:
  vector:
    namespace: "analytics-memories"
    embedding_dimension: 384    # Smaller model for cost efficiency
    enabled: true
  graph:
    namespace: "analytics-knowledge"
    enabled: false              # Skip graph to reduce processing
  scoping:
    primary_scope: "customer"
  retention:
    max_memory_age_days: 30     # Aggressive retention limit
ingestion:
  categories:
    facts: true
    preferences: false
    temporal_events: false
    relationships: false
    procedures: false
    emotions: false
  extraction:
    mode: "standard"            # Faster processing
    confidence_threshold: 0.8   # Only high-confidence extractions
  chunking:
    strategy: "fixed"           # Faster than semantic chunking
    max_chunk_tokens: 256       # Small chunks for quick processing
  pii:
    handling: "redact"
    categories: ["email", "phone", "ssn", "credit_card"]
  agent_hints:
    - "Extract only key metrics and quantitative facts"
retrieval:
  modes:
    fast: true
    accurate: false             # Fast mode only
  ranking:
    recency_weight: 0.6         # Most recent data is most relevant
    relevance_weight: 0.3
    confidence_weight: 0.1
  anticipation:
    enabled: true
    cache_ttl_seconds: 120
  context_budget:
    max_tokens: 2048
  agent_hints:
    - "Prioritize quantitative facts and metrics"
```
```python
with open("maca-config.yaml") as f:
    config_yaml = f.read()

result = await admin_client.config.apply(
    instance_id="inst_a1b2c3d4e5f67890",
    config_yaml=config_yaml
)
print(f"Applied version: {result.version}")
print(f"Status: {result.status}")
```
5. Monitor
After applying, monitor your instance in the Dashboard for any unexpected behavior. Check:
- Ingestion success rate
- Memory extraction counts by category
- Retrieval latency
- Error rates
Configuration changes affect new requests only. Existing memories are not re-processed, re-extracted, or re-embedded when you change the config. If you need to re-process existing content (e.g., after adding a new category), you must re-ingest the source documents.
If a config change causes issues, roll back to a previous version:
```python
# List config history
history = await admin_client.config.list_versions(
    instance_id="inst_a1b2c3d4e5f67890"
)
for version in history:
    print(f"v{version.version} — {version.applied_at} — {version.status}")

# Roll back to a specific version
result = await admin_client.config.rollback(
    instance_id="inst_a1b2c3d4e5f67890",
    target_version=2
)
```
Document your rollback plan before applying any config change, especially in production. Know which version you will roll back to and what the impact will be.