Voice Concierge (Pipecat + ElevenLabs)

Status: In Development · Playground demo coming soon. The recipe below is complete and runnable today; only the hosted playground showcase is pending.

A phone-grade voice concierge built on Pipecat with ElevenLabs TTS and Deepgram STT. Synap is wired in as a frame processor so memory injection happens automatically before each LLM call, and the turn is ingested afterward, all within voice latency budgets.

Python-only recipe. Pipecat is a Python-native framework with no TypeScript port. If you need TypeScript voice, build on LiveKit Agents instead: see Patterns → Voice Agent on LiveKit for the pattern (that write-up is Python-only too), or wrap your own STT/LLM/TTS on the JS side.

What you’ll build

A voice agent that:

Answers a phone call or live mic session
Recalls caller history mid-call from prior calls: preferences, prior issues, ongoing situations
Stays inside voice latency budgets: Synap’s fast mode is built for the latency-critical retrieval path
Records and ingests every turn for next time
Handles natural-feeling phone interactions: interruptions, short replies, repeat handling

Est. build time: 60-90 minutes (most of it is STT/TTS provider setup).

When to use this recipe

Build this if:

You’re building a phone agent (inbound IVR replacement, outbound calling, kiosk voice UI)
Caller continuity across calls is the value: “I know who you are without you stating your account number”
You need sub-second total round-trip latency
You can carry Python on the call-handling side

Architecture at a glance

Synap wraps the LLM step in Pipecat’s pipeline. Retrieval runs before the LLM and is on the critical path (must be fast). Ingestion runs after the LLM and is fire-and-forget (must not block the next utterance).
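The split between the two paths can be sketched with plain asyncio. This is a minimal illustration, not the adapter's implementation: `fetch_context` and `ingest_turn` are hypothetical stand-ins for the memory read and write, and the sleeps only simulate latency.

```python
import asyncio


async def fetch_context(user_id: str) -> str:
    """Hypothetical stand-in for the latency-critical memory lookup."""
    await asyncio.sleep(0.01)
    return f"context for {user_id}"


async def ingest_turn(user_id: str, text: str, log: list[str]) -> None:
    """Hypothetical stand-in for the slower write path."""
    await asyncio.sleep(0.05)
    log.append(f"ingested:{text}")


async def handle_turn(user_id: str, utterance: str, log: list[str],
                      background: set[asyncio.Task]) -> str:
    # Critical path: the reply cannot start until context is available.
    context = await fetch_context(user_id)
    reply = f"reply using [{context}]"
    log.append("replied")

    # Fire-and-forget: schedule ingestion without awaiting it. Keep a
    # reference so the task is not garbage-collected before it finishes.
    task = asyncio.create_task(ingest_turn(user_id, utterance, log))
    background.add(task)
    task.add_done_callback(background.discard)
    return reply


async def demo() -> list[str]:
    log: list[str] = []
    background: set[asyncio.Task] = set()
    await handle_turn("caller-1", "my order didn't come", log, background)
    # Drain pending ingestion before shutdown so no turn is lost.
    await asyncio.gather(*background)
    return log
```

The reply is logged before ingestion completes, which is the property that keeps the next utterance from waiting on a write.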

Stack

Layer	Choice
Synap SDK	`maximem-synap`
Synap adapter	`maximem-synap-pipecat`: frame processors for context inject and recording
Pipeline	Pipecat
STT	Deepgram (best latency / accuracy tradeoff for phone)
LLM	OpenAI `gpt-4o` (latency-tuned; use `gpt-4o-mini` if budget matters more than nuance)
TTS	ElevenLabs (Turbo v2.5: phone-quality, low first-byte latency)
Telephony	Twilio / Plivo / Pipecat’s Daily WebRTC transport (your choice)

Prerequisites

A Synap API key. See Authentication
Deepgram API key
ElevenLabs API key + chosen voice ID
OpenAI API key
A way to ingest audio (Twilio call → Pipecat WebRTC bridge is common)
Python 3.11+

Install

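# Quote the extras so shells such as zsh don't expand the brackets.
# The `daily` extra is needed for the DailyTransport used below.
pip install maximem-synap maximem-synap-pipecat \
  "pipecat-ai[daily,deepgram,openai,elevenlabs,silero]"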

Configure

# .env
SYNAP_API_KEY=...
OPENAI_API_KEY=...
DEEPGRAM_API_KEY=...
ELEVENLABS_API_KEY=...
ELEVENLABS_VOICE_ID=...
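A missing key otherwise tends to surface mid-call, so it can help to fail fast at startup. The sketch below assumes the variables above are already exported into the process environment (for example by your process manager or a dotenv loader); the helper names are illustrative.

```python
import os
from collections.abc import Mapping

# The variables listed in the .env file above.
REQUIRED_KEYS = (
    "SYNAP_API_KEY",
    "OPENAI_API_KEY",
    "DEEPGRAM_API_KEY",
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_VOICE_ID",
)


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or blank."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


def require_config(env: Mapping[str, str] = os.environ) -> None:
    """Raise before any call is answered if configuration is incomplete."""
    missing = missing_keys(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_config()` once before starting the pipeline is enough.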

Build it

1. Identity & scoping

Voice has a stable identifier: the caller’s phone number (from caller ID, the SIP From header, or the From field of a Twilio webhook).

customer_id = a stable UUID for your business
user_id = a stable UUID derived from the caller phone (E.164; hash first if your privacy posture requires)
conversation_id = the same per-caller UUID, rolling: the relationship is the conversation; this isn’t per-call

Synap ids must be valid UUIDs, so don’t pass the raw phone number. Derive a deterministic UUID from it with uuid.uuid5(...) (shown below): the same phone always maps to the same id, which is exactly the rolling continuity you want.
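If your privacy posture calls for hashing first, one possible derivation is sketched here. The namespace string, the `pepper` parameter, and the helper names are assumptions for illustration, and the E.164 check is deliberately loose. Whatever derivation you choose, keep it fixed: changing the namespace, pepper, or hashing step changes every caller's id.

```python
import hashlib
import re
import uuid

# Hypothetical application-specific namespace; choose one and never change it.
CALLER_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "callers.your-business.example")


def normalize_e164(phone: str) -> str:
    """Strip formatting and loosely validate an E.164 number."""
    cleaned = re.sub(r"[^\d+]", "", phone)
    if not re.fullmatch(r"\+[1-9]\d{6,14}", cleaned):
        raise ValueError(f"not an E.164 number: {phone!r}")
    return cleaned


def caller_uuid(phone: str, pepper: str = "") -> str:
    """Map a phone number to a deterministic UUID without storing the raw number."""
    e164 = normalize_e164(phone)
    # Hash first so the raw number is never the uuid5 input.
    digest = hashlib.sha256((pepper + e164).encode("utf-8")).hexdigest()
    return str(uuid.uuid5(CALLER_NAMESPACE, digest))
```

Differently formatted versions of the same number normalize to the same id, so `+1 (415) 555-0100` and `+14155550100` resolve to one caller.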

2. The Pipecat pipeline

The maximem-synap-pipecat package exposes SynapContextHook and SynapMemoryHook as Pipecat frame processors. Drop them in around the LLM.

import os
import uuid
import asyncio
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.openai import OpenAILLMService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer

from maximem_synap import MaximemSynapSDK
from synap_pipecat import SynapContextHook, SynapMemoryHook

# Synap ids must be valid UUIDs. Derive stable UUIDs deterministically from
# your business name and the caller's phone so the same caller always maps
# to the same memory identity.
CUSTOMER_ID = str(uuid.uuid5(uuid.NAMESPACE_DNS, "your-business"))

SYSTEM = """You are a phone concierge for <Business>.

- Use what you remember about the caller from prior calls: recent issues, preferences, ongoing situations.
- Voice rule: keep replies to 1-2 short sentences. Long replies feel wrong on the phone.
- If you need clarification, ask one focused question, not three.
- If the caller asks for a human, transfer immediately (transfer_to_agent tool)."""

async def run_call(caller_phone: str, room_url: str):
    sdk = MaximemSynapSDK()
    await sdk.initialize()

    # Stable per-caller UUID derived from the phone number.
    caller_uuid = str(uuid.uuid5(uuid.NAMESPACE_DNS, f"caller:{caller_phone}"))

    # DailyTransport takes a DailyParams object rather than a plain dict.
    # The second argument is the Daily meeting token (None for a public room).
    transport = DailyTransport(
        room_url, None, "Concierge",
        DailyParams(audio_in_enabled=True, audio_out_enabled=True,
                    vad_analyzer=SileroVADAnalyzer()),
    )
    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
    llm = OpenAILLMService(api_key=os.environ["OPENAI_API_KEY"], model="gpt-4o")
    tts = ElevenLabsTTSService(
        api_key=os.environ["ELEVENLABS_API_KEY"],
        voice_id=os.environ["ELEVENLABS_VOICE_ID"],
        model="eleven_turbo_v2_5",
    )

    context_hook = SynapContextHook(
        sdk=sdk,
        user_id=caller_uuid,
        customer_id=CUSTOMER_ID,
        mode="fast",        # latency-critical
        max_results=5,
        system_template=(SYSTEM + "\n\nWhat we know about this caller:\n{context}"),
    )

    memory_hook = SynapMemoryHook(
        sdk=sdk,
        user_id=caller_uuid,
        customer_id=CUSTOMER_ID,
        conversation_id=caller_uuid,
        document_type="ai-chat-conversation",
        metadata={"channel": "voice"},
    )

    pipeline = Pipeline([
        transport.input(),
        stt,
        context_hook,    # injects caller memory into the LLM context
        llm,
        memory_hook,     # ingests the turn after the LLM responds
        tts,
        transport.output(),
    ])

    task = PipelineTask(pipeline)
    runner = PipelineRunner()
    await runner.run(task)

3. Tools (optional)

Phone agents lean on tools too: transfer to a human, look up an order, send an SMS follow-up. Pipecat’s OpenAILLMService supports function tools; wire them like you would in any agent.

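# `telephony` and `sms` are placeholders for your own provider clients
# (for example, thin wrappers around Twilio); they are not defined here.

async def transfer_to_agent(call_id: str, queue: str) -> dict:
    # Use Twilio / your telephony provider to bridge the call to a queue.
    return await telephony.transfer(call_id, queue)

async def send_sms_followup(phone: str, body: str) -> dict:
    # Send a text follow-up to the caller's number.
    return await sms.send(phone, body)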

# Register with the OpenAILLMService...
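For reference, the two tools above could be described to the model with OpenAI-style function schemas like the ones below. This is a sketch: the descriptions are invented, `dispatch_tool_call` is a hypothetical helper, and how schemas and handlers are attached to OpenAILLMService depends on your Pipecat version, so that step is not shown.

```python
import json
from typing import Awaitable, Callable

# OpenAI function-tool schemas matching the handler signatures above.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "transfer_to_agent",
            "description": "Transfer the live call to a human agent queue.",
            "parameters": {
                "type": "object",
                "properties": {
                    "call_id": {"type": "string"},
                    "queue": {"type": "string"},
                },
                "required": ["call_id", "queue"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "send_sms_followup",
            "description": "Send an SMS follow-up to the caller.",
            "parameters": {
                "type": "object",
                "properties": {
                    "phone": {"type": "string"},
                    "body": {"type": "string"},
                },
                "required": ["phone", "body"],
            },
        },
    },
]

Handler = Callable[..., Awaitable[dict]]


async def dispatch_tool_call(name: str, arguments_json: str,
                             handlers: dict[str, Handler]) -> dict:
    """Route a model-issued tool call to the matching async handler."""
    handler = handlers.get(name)
    if handler is None:
        # Return an error payload instead of raising, so the call can continue.
        return {"error": f"unknown tool: {name}"}
    return await handler(**json.loads(arguments_json))
```

The model supplies arguments as a JSON string, which is why the dispatcher decodes them before calling the handler.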

4. Latency budget

Voice has a tight end-to-end window for conversational comfort. Where the time goes, in pipeline order:

Stage	Relative cost
Deepgram STT (streaming, last word)	low
Synap context fetch (`fast` mode)	low
OpenAI LLM first-token	the dominant cost
ElevenLabs TTS first-byte (Turbo)	low-moderate

The expensive part is the LLM. Synap stays well inside the budget. If you see drift:

Move the SDK to the same region as your call-handling box.
Lower max_results to 3.
Cache the last context fetch for about 10 seconds: consecutive voice turns arrive seconds apart, so the same context usually still applies.
Switch the LLM to gpt-4o-mini for short replies.
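The caching suggestion above could look like the following. It is a minimal sketch, assuming you wrap whatever coroutine performs the context fetch; `ContextCache` and its parameters are illustrative names, not part of the Synap SDK.

```python
import time
from typing import Awaitable, Callable


class ContextCache:
    """Per-caller TTL cache in front of an async context fetch."""

    def __init__(self, fetch: Callable[[str], Awaitable[str]],
                 ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    async def get(self, user_id: str) -> str:
        now = self._clock()
        hit = self._entries.get(user_id)
        # Serve the cached value while it is younger than the TTL.
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = await self._fetch(user_id)
        self._entries[user_id] = (now, value)
        return value
```

The trade-off is staleness: anything ingested during the window is not visible until the entry expires, which is usually acceptable within a single call.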

Run & verify

First call

Caller:    Hi, my order didn't come.
Concierge: I'm sorry. Looking up your account from this number… I see order #ORD-22,
           marked delivered yesterday. Is the address still 12 Oak Street?
Caller:    Yes but I never got it.
Concierge: Reissuing free shipping, on the way today. SMS confirmation in a sec. Anything else?
Caller:    No thanks.

Two weeks later

Caller:    Hey, calling about a different one.
Concierge: Welcome back. I see we reissued #ORD-22-R for you a couple weeks ago. Did
           that one arrive okay?
Caller:    Yeah it did, thanks. This is about a new order.
Concierge: Great. What's the order number?

The agent picked up the prior issue without you wiring anything case-specific. Phone calls feel continuous because they are.

Customize / extend

Outbound voice campaigns → flip the pipeline; have your telephony provider place outbound calls into the same Pipecat pipeline.
LiveKit instead of Pipecat → see Patterns → Voice Agent on LiveKit. Same memory model, different transport.
Voice journaling / personal companion → set max_results higher and use a richer system prompt; see AI Companion for the persona shape.
Coach by voice → adapt AI Coach over Pipecat. Voice tracking of workouts is natural (“I just ran 5k”).
Tier-2 escalation by voice → on transfer_to_agent, post a memory-grounded summary into your queue so the human agent has full context the moment they pick up.

Troubleshooting

Replies feel slow

Profile each stage. LLM first-token is usually the bottleneck. Reduce max_results, switch to a faster model, or pre-emit a short filler (“let me check…”) if your TTS handles interruption cleanly.
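One lightweight way to profile stages is a timing context manager wrapped around each step. This sketch uses illustrative names and assumes you can wrap the relevant calls; in a streaming pipeline you may prefer to timestamp frames instead.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


class StageTimer:
    """Accumulates wall-clock time per named stage, in milliseconds."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self.durations_ms: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = self._clock()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raised.
            elapsed = (self._clock() - start) * 1000.0
            self.durations_ms[name] = self.durations_ms.get(name, 0.0) + elapsed

    def slowest(self) -> str:
        """Name of the stage with the largest accumulated time."""
        return max(self.durations_ms, key=lambda name: self.durations_ms[name])
```

Wrap each step, for example `with timer.stage("context_fetch"): ...`, then log `timer.durations_ms` at the end of every turn.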

Concierge re-asks for the caller’s name every call

Caller-name memory isn’t being ingested. After the first call, capture the name explicitly with a tool or a system rule: “if the caller introduces themselves, store via Synap before responding.”

Context fetch times out

Run Synap in the same region. Check network. If degraded, fall back to no-context gracefully; the call should proceed without memory rather than fail.
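The no-context fallback can be sketched as a timeout wrapper around the fetch. The 0.3 s budget and the function name are assumptions for illustration; tune the budget against your own latency measurements.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("concierge")


async def context_or_empty(fetch: Callable[[str], Awaitable[str]],
                           user_id: str,
                           timeout_s: float = 0.3) -> str:
    """Return fetched context, or an empty string if the fetch is slow or fails."""
    try:
        return await asyncio.wait_for(fetch(user_id), timeout=timeout_s)
    except Exception as exc:  # includes the timeout raised by wait_for
        # Degrade to a memory-less turn rather than failing the call.
        logger.warning("context fetch skipped: %r", exc)
        return ""
```

Task cancellation is not swallowed here, because `asyncio.CancelledError` does not derive from `Exception`.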

Caller speaks over the agent and confuses the pipeline

Pipecat handles VAD-driven interruption; tune SileroVADAnalyzer sensitivity for your audio path.

Phone number not stable (caller ID withheld)

Fall back to a session-only ID + ask for an account number / OTP in-flow. Don’t write to long-term memory until identity is confirmed.

Related

Integrations: Pipecat · LiveKit Agents
Concepts: Fast Mode · Customer Context · Conversational Context Lifecycle
Patterns: Voice Agent on LiveKit · Graceful Degradation
Other recipes: AI Companion · AI Coach
