
Status: In Development · Playground demo coming soon. The recipe below is complete and runnable today — only the hosted playground showcase is pending.
A two-agent support system where a fast Tier-1 triage agent handles common issues and hands off to a specialist Tier-2 agent for harder cases. Both agents share memory, so the Tier-2 specialist doesn’t make the customer re-explain anything — the full context, plus T1’s triage summary, is already loaded.

What you’ll build

A multi-agent support cluster where:
  • Tier-1 triages — answers common questions, takes safe actions, and escalates cleanly
  • Tier-2 specializes — picks up with full T1 context already in memory, runs deeper diagnostics
  • Memory is shared across both agents — same user_id, same customer_id
  • Handoffs are explicit — the customer is told they’re being moved, T1 writes a summary, T2 reads it
Est. build time: 60–75 minutes (multi-agent orchestration takes longer to get right).

When to use this recipe

Build this if:
  • Your support has a meaningful skill split (general vs specialist, billing vs technical, etc.)
  • A high percentage of tickets resolve at T1 and you want to keep T2 capacity for the hard ones
  • Customer continuity across the handoff matters — no “please explain again” moments
  • You can describe the escalation rule clearly (this is the bit that breaks if vague)

Architecture at a glance

Tier-1 to Tier-2 support escalation architecture: customer chat hits Tier-1 agent for triage, easy cases reply directly, hard cases escalate through a shared Synap memory pool to the Tier-2 specialist agent which writes its resolution back to memory
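The context handoff happens via memory, not via a routing payload. The router only records which tier owns the conversation; T2 pulls everything else it needs (the transcript and T1's escalation summary) from Synap on its first turn.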

Stack

Layer           Choice
Synap SDK       maximem-synap (Python) / @maximem/synap (TypeScript)
Framework       OpenAI Agents SDK (Python, uses native handoffs) / Vercel AI SDK (TypeScript, manual routing)
LLM             OpenAI gpt-4o-mini for T1 (cheap + fast) and gpt-4o for T2
Routing state   In-memory dict for the demo; Redis in production
Multi-agent orchestration is a great fit for LangGraph (Python) and Mastra (TypeScript) if you want graph-shaped routing with retries and persistence baked in. The recipe below uses OpenAI Agents / Vercel AI SDK to stay consistent with the rest of the Cookbook — port to LangGraph/Mastra once your routing graph grows.

Prerequisites

  • A Synap API key — see Authentication
  • Python: Python 3.11+
  • TypeScript: Node 18+ and Python 3.11+ on the host

Install

pip install maximem-synap maximem-synap-openai-agents openai-agents

Build it

1. Shared scoping

Both agents use the same scopes. That’s the whole trick.
  • customer_id = "<your-product>" — single tenant or per-customer org
  • user_id = <ticket requester ID>
  • conversation_id — one per ticket, shared across both agents
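One way to keep the scopes from drifting between agents is to mint them once per ticket and hand the same object to both. A sketch: `SupportScope`, `open_ticket`, and `memory_kwargs` are hypothetical helpers; the keyword names simply mirror the `sdk.memories.create(...)` calls used later in this recipe.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class SupportScope:
    """The three scopes T1 and T2 share for one ticket (illustrative helper)."""
    customer_id: str      # your product / tenant
    user_id: str          # the ticket requester
    conversation_id: str  # one per ticket

    def memory_kwargs(self) -> dict:
        """Scope arguments in the shape this recipe passes to sdk.memories.create."""
        return {
            "user_id": self.user_id,
            "customer_id": self.customer_id,
            "metadata": {"conversation_id": self.conversation_id},
        }


def open_ticket(customer_id: str, user_id: str) -> SupportScope:
    # Minted once when the ticket opens; the escalation does not change it.
    return SupportScope(customer_id, user_id, str(uuid.uuid4()))


scope = open_ticket("your-product", "user-42")
t1_kwargs = scope.memory_kwargs()
t2_kwargs = scope.memory_kwargs()  # identical: same ticket, same scope
```

Making the dataclass frozen means neither agent can accidentally mutate the scope mid-ticket.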

2. The escalation policy

This belongs in your code, not the LLM’s head. T1 calls a tool to escalate; the tool decides what counts.
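from agents import function_tool

# sdk is your initialized Synap client; CUSTOMER_ID and ROUTING are defined in step 5.

ESCALATE_REASONS = {
    "customer_requested_human",
    "technical_depth_required",
    "policy_exception_needed",
    "safety_or_legal",
    "repeat_failure",  # T1 already tried twice
}

@function_tool
async def escalate_to_t2(user_id: str, reason: str, summary: str) -> dict:
    """Hand off to the Tier-2 specialist.

    Args:
        user_id: The ticket requester's ID.
        reason: One of customer_requested_human, technical_depth_required,
            policy_exception_needed, safety_or_legal, repeat_failure.
        summary: One paragraph on what T1 tried and what it learned.
    """
    # Validate in code. Return an error instead of using assert (stripped under
    # `python -O`) so the model sees the problem and can retry with a valid reason.
    if reason not in ESCALATE_REASONS:
        return {"status": "rejected", "error": f"Invalid escalation reason: {reason}"}
    # Persist the structured handoff in memory so T2 picks it up. The write is
    # awaited, so it completes before the customer's next turn is routed to T2.
    await sdk.memories.create(
        document=f"T1 escalation: {summary}",
        document_type="support-escalation",
        user_id=user_id,
        customer_id=CUSTOMER_ID,
        metadata={"escalation_reason": reason, "from_tier": "t1", "to_tier": "t2"},
    )
    ROUTING[user_id] = "t2"
    return {"status": "escalated", "to": "t2"}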

3. The Tier-1 agent

Fast model, common-issue tools, escalate when out of depth.
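from agents import Agent, function_tool

# synap_search / synap_store: Synap memory functions (see the maximem-synap-openai-agents
# package for the import path). get_account_status, check_outage and basic_refund are
# your own @function_tool-decorated tools.

SYSTEM_T1 = """You are a Tier-1 support agent.

- Handle common issues: account questions, basic troubleshooting, status checks, simple refunds.
- Escalate when the customer asks for a human, the issue needs a technical deep-dive or a policy
  exception, it involves safety or legal concerns, or you've already tried twice without
  resolution. Call escalate_to_t2 with the matching reason and a 1-paragraph summary of what
  you tried and what you learned about the customer's situation.
- Never invent answers. If you don't know and can't escalate, say so."""

t1_agent = Agent(
    name="t1_support",
    instructions=SYSTEM_T1,
    model="gpt-4o-mini",
    tools=[
        # function_tool wraps a plain function as a tool; FunctionTool is the
        # underlying dataclass and does not accept a function positionally.
        function_tool(synap_search, name_override="synap_search"),
        function_tool(synap_store, name_override="synap_store"),
        get_account_status, check_outage, basic_refund,
        escalate_to_t2,
    ],
)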

4. The Tier-2 agent

Bigger model, deeper tools, picks up with full T1 context already in memory.
SYSTEM_T2 = """You are a Tier-2 support specialist.

- Before your first reply, call synap_search to load the T1 escalation summary and the
  customer's history.
- Greet the customer briefly and confirm what you understand they need. Don't make them re-explain.
- Use specialist tools. You have permission to make policy exceptions when justified.
- If you resolve the issue, call synap_store to save a summary of the resolution."""

t2_agent = Agent(
    name="t2_specialist",
    instructions=SYSTEM_T2,
    model="gpt-4o",
    tools=[
        function_tool(synap_search, name_override="synap_search"),
        function_tool(synap_store, name_override="synap_store"),
        # Your own specialist tools, also decorated with @function_tool.
        deep_diagnostic, run_db_query, issue_credit, policy_exception,
    ],
)
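If you would rather not depend on T2 choosing to call `synap_search` on its first turn, the router can prepend the latest escalation summary to T2's input. A sketch, assuming the records are plain dicts shaped like what this recipe writes (`document`, `document_type`, `metadata`) and ordered oldest first; how you fetch them from Synap is up to its search API, and `build_t2_input` is a hypothetical helper.

```python
def build_t2_input(records: list[dict], customer_text: str) -> str:
    """Prefix the customer's message with the most recent T1 escalation, if any."""
    escalations = [r for r in records if r.get("document_type") == "support-escalation"]
    if not escalations:
        return customer_text
    latest = escalations[-1]  # assumes records are ordered oldest first
    reason = latest.get("metadata", {}).get("escalation_reason", "unspecified")
    return (
        "[Handoff context from Tier-1, not visible to the customer]\n"
        f"Escalation reason: {reason}\n"
        f"{latest['document']}\n\n"
        f"Customer: {customer_text}"
    )


records = [
    {"document": "Customer: My last invoice is wrong.",
     "document_type": "ai-chat-conversation", "metadata": {"tier": "t1"}},
    {"document": "T1 escalation: 3rd double-charge incident; refund R-9012 not visible.",
     "document_type": "support-escalation",
     "metadata": {"escalation_reason": "repeat_failure"}},
]
first_turn = build_t2_input(records, "Hi.")
```

The trade-off: injecting context guarantees T2 sees the summary, while relying on `synap_search` lets T2 pull broader history on demand. The two can be combined.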

5. The router

One small function picks which agent gets the next message based on routing state.
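import asyncio
import uuid

from agents import Runner

# sdk, t1_agent and t2_agent come from the steps above.
ROUTING: dict[str, str] = {}   # user_id -> "t1" | "t2"
SESSIONS: dict[str, str] = {}  # user_id -> conversation_id (one per ticket)
CUSTOMER_ID = "your-product"
BRIDGE_MESSAGE = "Connecting you with a specialist now…"

# Keep references to background writes so they aren't garbage-collected mid-flight.
_background_tasks: set[asyncio.Task] = set()

async def handle_message(user_id: str, text: str) -> str:
    conv_id = SESSIONS.setdefault(user_id, str(uuid.uuid4()))
    tier = ROUTING.get(user_id, "t1")

    # T2 reads the escalation summary from memory itself; no routing payload needed.
    agent = t2_agent if tier == "t2" else t1_agent
    result = await Runner.run(agent, input=text)
    reply = result.final_output

    # If escalate_to_t2 fired during this T1 turn, tell the customer they're being moved.
    if tier == "t1" and ROUTING.get(user_id) == "t2":
        reply = f"{reply}\n\n{BRIDGE_MESSAGE}"

    # Store the turn in the background so the write doesn't delay the reply.
    task = asyncio.create_task(sdk.memories.create(
        document=f"Customer: {text}\n[{tier.upper()}]: {reply}",
        document_type="ai-chat-conversation",
        user_id=user_id,
        customer_id=CUSTOMER_ID,
        metadata={"conversation_id": conv_id, "tier": tier},
    ))
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return reply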
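Routing bugs are easier to catch offline. This stdlib sketch swaps `Runner.run` for two hypothetical stub agents (`fake_t1`, `fake_t2`) so you can check who answers each turn and exactly when the bridge message appears, without calling a model. The stub escalation rule ("third time" in the text) is illustrative only.

```python
import asyncio
from typing import Awaitable, Callable

AgentFn = Callable[[str, str], Awaitable[str]]  # (user_id, text) -> reply

ROUTING: dict[str, str] = {}
BRIDGE_MESSAGE = "Connecting you with a specialist now…"


async def fake_t1(user_id: str, text: str) -> str:
    if "third time" in text:
        ROUTING[user_id] = "t2"  # stands in for escalate_to_t2
        return "That's persistent. I'm bringing in a billing specialist."
    return "Here's what I found on your invoice."


async def fake_t2(user_id: str, text: str) -> str:
    return "Hi, I've read the history. Pulling the ledger now."


AGENTS: dict[str, AgentFn] = {"t1": fake_t1, "t2": fake_t2}


async def route(user_id: str, text: str) -> str:
    tier = ROUTING.get(user_id, "t1")
    reply = await AGENTS[tier](user_id, text)
    # Bridge only on the turn where the escalation happened.
    if tier == "t1" and ROUTING.get(user_id) == "t2":
        reply = f"{reply}\n\n{BRIDGE_MESSAGE}"
    return reply


async def demo() -> list[str]:
    return [
        await route("user-42", "My last invoice is wrong."),
        await route("user-42", "No, and this is the third time."),
        await route("user-42", "Hi."),
    ]


replies = asyncio.run(demo())
```

Because the router only reads a tier string, the same check works unchanged once the stubs are replaced with real `Runner.run` calls.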

Run & verify

Turn 1 (routes to T1)
Customer:  My last invoice is wrong, you charged me twice.
T1:        I see two charges on 5/12 for $49. The second was retried after a network blip
           and refunded automatically the next day. Refund #R-9012 — check your statement
           on or after 5/15. Anything else?
Turn 2 (still T1, but escalates)
Customer:  No it's not there. And this is the third time I've had double-charges.
T1:        That's persistent — I'm bringing in a billing specialist who can look deeper.
           One moment.
           [Internally: escalate_to_t2(reason=repeat_failure, summary="Customer reports
            3rd double-charge incident. Auto-refund R-9012 issued 5/13 not visible to
            customer. Pattern check needed.")]
Turn 3 (now T2)
Customer:  Hi.
T2:        Hi — I've read the history. You're seeing double-charges intermittently, and
           the auto-refund for last week didn't reach your account. Let me pull the
           merchant ledger for your card and the failure logs on the retry job.
           [calls run_db_query, calls issue_credit]
           Confirmed: three failed retries this quarter, all on the same card BIN. I've
           issued an immediate $98 credit and flagged your account so this can't repeat.
           Sorry for the back-and-forth.
The customer didn’t repeat anything. T2 read the escalation from memory and dove in.

Customize / extend

  • Three or more tiers → add tools and routing entries. Memory model is unchanged.
  • Human at the end → the final tier can be a human queue. See WhatsApp + Human Handoff for the pattern.
  • LangGraph / Mastra port → if routing grows into a real graph, port to LangGraph (Python) or Mastra (TypeScript).
  • Slack as the channel for T2 → some teams have T2 specialists working out of Slack. Same agent, different I/O. See Patterns → Slack Bot.

Troubleshooting

T2 re-asks the customer to explain
  • Sharpen the T2 system prompt’s “don’t make them re-explain” rule.
  • Confirm the T1 escalation memory is being written before the customer’s next turn (no race).
  • Check synap_search actually fires on the first T2 turn — log tool calls during development.
T1 escalates too eagerly
  • The escalation rule is too vague in the prompt. Tighten the criteria, and consider gating with a tool-side check: require synap_search to have been called first.
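Customer lands on the wrong tier when they follow up after T2 resolves
  • ROUTING is in-process memory: it is lost on restart and not shared across workers. Move it to Redis with a TTL. After a resolution, clear the routing entry so the customer's next ticket starts at T1 instead of staying on T2.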
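The Redis behavior described above can be prototyped in-process first. This sketch mirrors Redis `SET key value EX ttl`, `GET`, and `DEL` with a plain dict and an injectable clock; `RoutingStore` and its method names are hypothetical, and a production version would issue those Redis commands instead.

```python
import time
from typing import Callable


class RoutingStore:
    """In-process stand-in for a Redis routing table with per-entry TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # user_id -> (tier, expires_at)

    def set_tier(self, user_id: str, tier: str) -> None:
        # Like SET user_id tier EX ttl_seconds
        self._entries[user_id] = (tier, self._clock() + self.ttl_seconds)

    def get_tier(self, user_id: str, default: str = "t1") -> str:
        # Like GET, treating an expired key as missing
        entry = self._entries.get(user_id)
        if entry is None:
            return default
        tier, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[user_id]
            return default
        return tier

    def clear(self, user_id: str) -> None:
        """Like DEL: call when T2 resolves so the next ticket starts at T1."""
        self._entries.pop(user_id, None)


now = [0.0]
store = RoutingStore(ttl_seconds=3600, clock=lambda: now[0])
store.set_tier("user-42", "t2")
```

The TTL is a safety net for abandoned tickets; the explicit `clear` on resolution is what keeps a returning customer from skipping T1.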
The handoff feels abrupt
  • The bridge message helps. Customize it per escalation reason (“connecting you with billing” vs “connecting you with engineering”).