Status: In Development · Playground demo coming soon. The recipe below is complete and runnable today — only the hosted playground showcase is pending.
A phone-grade voice concierge built on Pipecat with ElevenLabs TTS and Deepgram STT. Synap is wired in as a frame processor so memory injection happens automatically before each LLM call, and the turn is ingested afterward — all within voice latency budgets.
Python-only recipe. Pipecat is a Python-native framework with no TypeScript port. For TypeScript voice, either build on LiveKit Agents (see Patterns → Voice Agent on LiveKit; that pattern is also documented in Python only) or wrap your own STT/LLM/TTS on the JS side.

What you’ll build

A voice agent that:
  • Answers a phone call or live mic session
  • Recalls caller history mid-call from prior calls — preferences, prior issues, on-going situations
  • Stays inside voice latency budgets — Synap’s fast mode is sub-100ms P95
  • Records and ingests every turn for next time
  • Handles natural-feeling phone interactions — interruptions, short replies, repeat handling
Est. build time: 60–90 minutes (most of it is STT/TTS provider setup).

When to use this recipe

Build this if:
  • You’re building a phone agent (inbound IVR replacement, outbound calling, kiosk voice UI)
  • Caller continuity across calls is the value — “I know who you are without you stating your account number”
  • You need sub-second total round-trip latency
  • You can carry Python on the call-handling side

Architecture at a glance

Synap wraps the LLM step in Pipecat’s pipeline. Retrieval is on the critical path, so it must be fast. Ingestion is fire-and-forget, so it must not block the next utterance.
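The split between awaited retrieval and fire-and-forget ingestion can be sketched with plain asyncio. This is an illustration only: `fetch_context`, `ingest_turn`, and `handle_turn` are hypothetical stand-ins, not Synap or Pipecat APIs, and the sleeps simulate network time.

```python
import asyncio

async def fetch_context(user_id: str) -> str:
    # Stand-in for the memory lookup; the caller is waiting, so this must be fast.
    await asyncio.sleep(0.01)
    return f"known facts for {user_id}"

async def ingest_turn(user_id: str, text: str, store: list) -> None:
    # Stand-in for ingestion; slower, and nothing downstream waits on it.
    await asyncio.sleep(0.05)
    store.append((user_id, text))

async def handle_turn(user_id: str, utterance: str, store: list, pending: set) -> str:
    context = await fetch_context(user_id)          # critical path: awaited
    reply = f"[{context}] reply to: {utterance}"    # stand-in for the LLM call
    task = asyncio.create_task(ingest_turn(user_id, utterance, store))
    pending.add(task)                               # hold a reference so the task is not garbage-collected
    task.add_done_callback(pending.discard)
    return reply                                    # returned before ingestion finishes

async def demo() -> tuple:
    store, pending = [], set()
    reply = await handle_turn("+15550100199", "my order didn't come", store, pending)
    stored_before = len(store)                      # ingestion has not completed yet
    await asyncio.gather(*pending)                  # drain background work before shutdown
    return reply, stored_before, len(store)
```

Running `asyncio.run(demo())` shows the reply being returned while the store is still empty; the turn lands in the store only after the pending tasks are drained.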

Stack

| Layer | Choice |
| --- | --- |
| Synap SDK | maximem-synap |
| Synap adapter | maximem-synap-pipecat (frame processors for context injection and recording) |
| Pipeline | Pipecat |
| STT | Deepgram (best latency / accuracy tradeoff for phone) |
| LLM | OpenAI gpt-4o (latency-tuned; use gpt-4o-mini if budget matters more than nuance) |
| TTS | ElevenLabs (Turbo v2.5; phone-quality, sub-300ms first byte) |
| Telephony | Twilio / Plivo / Pipecat’s WebRTC daily-co transport (your call) |

Prerequisites

  • A Synap API key — see Authentication
  • Deepgram API key
  • ElevenLabs API key + chosen voice ID
  • OpenAI API key
  • A way to ingest audio (Twilio call → Pipecat WebRTC bridge is common)
  • Python 3.11+

Install

pip install maximem-synap maximem-synap-pipecat \
  "pipecat-ai[daily,deepgram,openai,elevenlabs,silero]"

Configure

# .env
SYNAP_API_KEY=...
SYNAP_SERVER_URL=<maximem-server>
OPENAI_API_KEY=...
DEEPGRAM_API_KEY=...
ELEVENLABS_API_KEY=...
ELEVENLABS_VOICE_ID=...
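A missing key tends to surface as a confusing failure mid-call, so it can help to check the environment once at startup. The sketch below is a hypothetical helper (`missing_vars` and `require_env` are illustrative names, not part of any SDK) and assumes the variables above have already been exported into the process, for example by your process manager or a dotenv loader.

```python
import os

REQUIRED_VARS = (
    "SYNAP_API_KEY", "SYNAP_SERVER_URL", "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY", "ELEVENLABS_VOICE_ID",
)

def missing_vars(env=None) -> list:
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]

def require_env(env=None) -> None:
    """Raise once, with every missing name, instead of failing key by key."""
    missing = missing_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_env()` before building the pipeline fails fast with the full list of absent names.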

Build it

1. Identity & scoping

Voice has a stable identifier: the caller’s phone number (from caller ID, the SIP From header, or the From field of the Twilio webhook).
  • customer_id = "<your-business>"
  • user_id = <caller phone> (E.164; hash if your privacy posture requires)
  • conversation_id = <one per phone, rolling> — the relationship is the conversation; this isn’t per-call
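If your privacy posture calls for hashing, one possible approach is a keyed hash of the normalized number, so the same caller always maps to the same `user_id` without the raw number being stored. This is a minimal sketch: `normalize_e164` and `caller_user_id` are hypothetical helpers, the length check is a loose approximation of E.164, and the secret is assumed to come from your own configuration.

```python
import hashlib
import hmac
import re

def normalize_e164(raw: str) -> str:
    """Strip formatting characters and check the result looks like E.164."""
    digits = re.sub(r"[^\d+]", "", raw)
    if not re.fullmatch(r"\+[1-9]\d{6,14}", digits):
        raise ValueError(f"not an E.164 number: {raw!r}")
    return digits

def caller_user_id(raw_phone: str, secret: bytes) -> str:
    """Keyed SHA-256 hash of the normalized number; stable across calls."""
    phone = normalize_e164(raw_phone)
    return hmac.new(secret, phone.encode(), hashlib.sha256).hexdigest()
```

The same value can then be passed as both `user_id` and the rolling `conversation_id`. Keep the secret fixed: rotating it changes every caller's ID and orphans their history.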

2. The Pipecat pipeline

The maximem-synap-pipecat package exposes SynapContextHook and SynapMemoryHook as Pipecat frame processors. Drop them in around the LLM.
import os
import asyncio
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.openai import OpenAILLMService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer

from maximem_synap import MaximemSynapSDK
from synap_pipecat import SynapContextHook, SynapMemoryHook

CUSTOMER_ID = "your-business"

SYSTEM = """You are a phone concierge for <Business>.

- Use what you remember about the caller from prior calls: recent issues, preferences, ongoing situations.
- Voice rule: keep replies to 1–2 short sentences. Long replies feel wrong on the phone.
- If you need clarification, ask one focused question, not three.
- If the caller asks for a human, transfer immediately (transfer_to_agent tool)."""

async def run_call(caller_phone: str, room_url: str):
    sdk = MaximemSynapSDK()
    await sdk.initialize()

    # DailyTransport takes a DailyParams object rather than a plain dict.
    transport = DailyTransport(
        room_url, None, "Concierge",
        params=DailyParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )
    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
    llm = OpenAILLMService(api_key=os.environ["OPENAI_API_KEY"], model="gpt-4o")
    tts = ElevenLabsTTSService(
        api_key=os.environ["ELEVENLABS_API_KEY"],
        voice_id=os.environ["ELEVENLABS_VOICE_ID"],
        model="eleven_turbo_v2_5",
    )

    context_hook = SynapContextHook(
        sdk=sdk,
        user_id=caller_phone,
        customer_id=CUSTOMER_ID,
        mode="fast",        # latency-critical
        max_results=5,
        system_template=(SYSTEM + "\n\nWhat we know about this caller:\n{context}"),
    )

    memory_hook = SynapMemoryHook(
        sdk=sdk,
        user_id=caller_phone,
        customer_id=CUSTOMER_ID,
        conversation_id=caller_phone,
        document_type="ai-chat-conversation",
        metadata={"channel": "voice"},
    )

    pipeline = Pipeline([
        transport.input(),
        stt,
        context_hook,    # injects caller memory into the LLM context
        llm,
        memory_hook,     # ingests the turn after the LLM responds
        tts,
        transport.output(),
    ])

    task = PipelineTask(pipeline)
    runner = PipelineRunner()
    await runner.run(task)

3. Tools (optional)

Phone agents lean on tools too — transfer to a human, look up an order, send an SMS follow-up. Pipecat’s OpenAILLMService supports function tools; wire them like you would in any agent.
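# `telephony` and `sms` stand for your own provider clients (Twilio, Plivo, ...).
# They are placeholders, not part of Pipecat or Synap.
async def transfer_to_agent(call_id: str, queue: str) -> dict:
    # Use Twilio / your telephony provider to bridge the call to a queue.
    return await telephony.transfer(call_id, queue)

async def send_sms_followup(phone: str, body: str) -> dict:
    # Send a text follow-up to the caller's number.
    return await sms.send(phone, body)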

# Register with the OpenAILLMService...
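For orientation, here is roughly what a tool definition and dispatcher might look like, using the OpenAI function-tool schema. This is a sketch under assumptions: the handler body is a placeholder, `HANDLERS` and `dispatch` are hypothetical names, and the Pipecat-side registration call is deliberately left out because it varies by version.

```python
import asyncio
import json

TRANSFER_TOOL = {
    "type": "function",
    "function": {
        "name": "transfer_to_agent",
        "description": "Transfer the live call to a human agent queue.",
        "parameters": {
            "type": "object",
            "properties": {
                "queue": {"type": "string", "description": "Target queue, e.g. 'support'."},
            },
            "required": ["queue"],
        },
    },
}

async def transfer_to_agent(call_id: str, queue: str) -> dict:
    # Placeholder: a real handler would call your telephony provider here.
    return {"status": "transferred", "call_id": call_id, "queue": queue}

HANDLERS = {"transfer_to_agent": transfer_to_agent}

async def dispatch(call_id: str, name: str, arguments_json: str) -> dict:
    """Route a model tool call to its handler; call_id comes from the session, not the model."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    return await handler(call_id, **json.loads(arguments_json))
```

Note that `call_id` is supplied by the call session rather than exposed as a tool parameter, so the model cannot transfer a call other than the one it is on.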

4. Latency budget

Callers start to notice gaps beyond roughly 300ms, so treat that as the comfort target and a sub-second round trip as the ceiling. Typical time per stage:
| Stage | Typical |
| --- | --- |
| Deepgram STT (streaming, last word) | ~50ms |
| Synap context fetch (fast mode, P95) | ~50–100ms |
| OpenAI LLM first-token | ~200–400ms |
| ElevenLabs TTS first-byte (Turbo) | ~100–200ms |
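Adding up the typical figures above makes the budget concrete. The snippet below only restates the table as (low, high) ranges in milliseconds and sums them; the dictionary keys and helper names are illustrative.

```python
STAGES_MS = {
    "Deepgram STT (last word)": (50, 50),
    "Synap context fetch (fast, P95)": (50, 100),
    "OpenAI LLM first token": (200, 400),
    "ElevenLabs TTS first byte": (100, 200),
}

def total_range(stages):
    """Sum the low and high ends of every stage."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high

def share_of_high(stages, name):
    """Fraction of the worst-case total spent in one stage."""
    _, high = total_range(stages)
    return stages[name][1] / high
```

With these numbers `total_range(STAGES_MS)` is `(400, 750)`, so the pipeline lands under a second, and the LLM accounts for a little over half of the worst case.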
The expensive part is the LLM. Synap stays in budget. If you see drift:
  1. Move the SDK to the same region as your call-handling box.
  2. Lower max_results to 3.
  3. Cache the last context fetch for 10s — voice turns are tight in time.
  4. Switch the LLM to gpt-4o-mini for short replies.
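Step 3 can be sketched as a small per-caller cache with a time-to-live. This is a hypothetical helper, not a feature of the Synap SDK; the injectable `clock` exists only so the expiry logic is easy to test.

```python
import time

class ContextCache:
    """Keeps the last fetched context per caller for a short time-to-live."""

    def __init__(self, ttl_seconds: float = 10.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # user_id -> (fetched_at, context)

    def get(self, user_id: str):
        """Return the cached context, or None if absent or expired."""
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        fetched_at, context = entry
        if self._clock() - fetched_at > self._ttl:
            del self._entries[user_id]   # expired: drop it and report a miss
            return None
        return context

    def put(self, user_id: str, context: str) -> None:
        self._entries[user_id] = (self._clock(), context)
```

On each turn, check `cache.get(user_id)` first and only call the memory backend on a miss, then `put` the fresh result. The trade-off is that something ingested in the last few seconds may not appear in the very next turn.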

Run & verify

First call
Caller:    Hi, my order didn't come.
Concierge: I'm sorry — looking up your account from this number… I see order #ORD-22,
           marked delivered yesterday. Is the address still 12 Oak Street?
Caller:    Yes but I never got it.
Concierge: Reissuing free shipping, on the way today. SMS confirmation in a sec. Anything else?
Caller:    No thanks.
Two weeks later
Caller:    Hey, calling about a different one.
Concierge: Welcome back. I see we reissued #ORD-22-R for you a couple weeks ago — did
           that one arrive okay?
Caller:    Yeah it did, thanks. This is about a new order.
Concierge: Great. What's the order number?
The agent picked up the prior issue without you wiring anything case-specific. Phone calls feel continuous because they are.

Customize / extend

  • Outbound voice campaigns → flip the pipeline; have your telephony provider place outbound calls into the same Pipecat pipeline.
  • LiveKit instead of Pipecat → see Patterns → Voice Agent on LiveKit. Same memory model, different transport.
  • Voice journaling / personal companion → set max_results higher and use a richer system prompt; see AI Companion for the persona shape.
  • Coach by voice → adapt AI Coach over Pipecat. Voice tracking of workouts is natural (“I just ran 5k”).
  • Tier-2 escalation by voice → on transfer_to_agent, post a memory-grounded summary into your queue so the human agent has full context the moment they pick up.

Troubleshooting

Replies feel slow
  • Profile each stage; LLM first-token latency is usually the bottleneck. Reduce max_results, switch to a faster model, or speak a short filler first (“let me check…”) if your TTS can be interrupted cleanly once the real reply is ready.
Concierge re-asks for the caller’s name every call
  • Caller-name memory isn’t being ingested. After the first call, capture name explicitly with a tool or system rule: “if the caller introduces themselves, store via Synap before responding.”
Context fetch times out
  • Run Synap in the same region. Check network. If degraded, fall back to no-context gracefully — the call should proceed without memory rather than fail.
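One way to get the graceful fallback is to put a hard timeout around the fetch and continue with empty context when it trips. A minimal sketch, assuming the fetch is an async callable; `context_or_empty` and the 150 ms default are illustrative, not SDK settings.

```python
import asyncio

async def context_or_empty(fetch, timeout_s: float = 0.15) -> str:
    """Await fetch(); on timeout or connection error return "" so the turn continues."""
    try:
        return await asyncio.wait_for(fetch(), timeout=timeout_s)
    except (asyncio.TimeoutError, ConnectionError):
        return ""

async def slow_fetch() -> str:
    await asyncio.sleep(1.0)   # simulates a degraded memory backend
    return "caller history"

async def fast_fetch() -> str:
    return "caller history"
```

With `slow_fetch` the helper gives up after the timeout and the call proceeds without memory; with `fast_fetch` the context comes through unchanged.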
Caller speaks over the agent and confuses the pipeline
  • Pipecat handles VAD-driven interruption; tune SileroVADAnalyzer sensitivity for your audio path.
Phone number not stable (caller ID withheld)
  • Fall back to a session-only ID + ask for an account number / OTP in-flow. Don’t write to long-term memory until identity is confirmed.
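The fallback can be modelled as an identity object that carries a flag saying whether long-term writes are allowed yet. The names below (`CallerIdentity`, `resolve_identity`, `confirm_identity`) are hypothetical, and the check for a leading `+` is a simplification of "caller ID present and in E.164 form".

```python
import uuid
from dataclasses import dataclass
from typing import Optional

@dataclass
class CallerIdentity:
    user_id: str
    persist: bool   # whether turns may be written to long-term memory

def resolve_identity(caller_phone: Optional[str]) -> CallerIdentity:
    """Use the phone number when present; otherwise a throwaway session ID."""
    if caller_phone and caller_phone.startswith("+"):
        return CallerIdentity(user_id=caller_phone, persist=True)
    return CallerIdentity(user_id=f"session-{uuid.uuid4().hex}", persist=False)

def confirm_identity(identity: CallerIdentity, verified_user_id: str) -> CallerIdentity:
    """After an account number or OTP check succeeds, switch to the real ID."""
    return CallerIdentity(user_id=verified_user_id, persist=True)
```

The memory hook would then be attached (or enabled) only while `identity.persist` is true, which keeps anonymous calls out of long-term memory until verification succeeds.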