# Voice Concierge (Pipecat + ElevenLabs)

> Real-time phone agent that recalls caller history mid-call. STT → memory inject → LLM → TTS, all within conversational latency budgets.

<Info>
  **Status:** In Development · Playground demo coming soon.
  The recipe below is complete and runnable today; only the hosted playground showcase is pending.
</Info>

A phone-grade voice concierge built on Pipecat with ElevenLabs TTS and Deepgram STT. Synap is wired in as a frame processor so memory injection happens automatically before each LLM call, and the turn is ingested afterward, all within voice latency budgets.

<Note>
  **Python-only recipe.** Pipecat is a Python-native framework with no TypeScript port. If you need TypeScript voice, build on [LiveKit Agents](/integrations/livekit-agents) instead. [Patterns → Voice Agent on LiveKit](/patterns/voice-agent-livekit) covers the pattern (also Python-only); alternatively, wrap your own STT/LLM/TTS on the JS side.
</Note>

## What you'll build

A voice agent that:

* **Answers a phone call or live mic session**
* **Recalls caller history** mid-call from prior calls: preferences, prior issues, ongoing situations
* **Stays inside voice latency budgets**: Synap's `fast` mode is built for the latency-critical retrieval path
* **Records and ingests every turn** for next time
* **Handles natural-feeling phone interactions**: interruptions, short replies, repeat handling

**Est. build time:** 60-90 minutes (most of it is STT/TTS provider setup).

## When to use this recipe

Build this if:

* You're building a phone agent (inbound IVR replacement, outbound calling, kiosk voice UI)
* Caller continuity across calls is the value: "I know who you are without you stating your account number"
* You need sub-second total round-trip latency
* You can carry Python on the call-handling side

## Architecture at a glance

```mermaid theme={null}
flowchart TD
    Caller[Caller audio] --> STT[Deepgram STT]
    STT --> Context[SynapContextHook<br/>fetch memory in fast mode]
    Context --> LLM[OpenAI LLM gpt-4o]
    LLM --> Memory[SynapMemoryHook<br/>ingest turn, background]
    Memory --> TTS[ElevenLabs TTS]
    TTS --> Out[Audio out to caller]
```

Synap wraps the LLM step in Pipecat's pipeline. Retrieval runs before the LLM and sits on the critical path, so it must be fast. Ingestion runs after the LLM as fire-and-forget, so it must not block the next utterance.
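
The split between awaited retrieval and background ingestion can be sketched with plain `asyncio`. This is an illustration only: `fetch_context`, `generate_reply`, `ingest_turn`, and `handle_turn` are hypothetical stand-ins, not Synap or Pipecat APIs, and the real hooks handle this for you.

```python
import asyncio

# Hypothetical stand-ins for retrieval, the LLM call, and ingestion.
async def fetch_context(user_id: str) -> str:
    await asyncio.sleep(0.01)          # fast: on the critical path
    return "prefers SMS follow-ups"

async def generate_reply(utterance: str, context: str) -> str:
    await asyncio.sleep(0.01)
    return f"(reply using: {context})"

async def ingest_turn(user_id: str, utterance: str, reply: str, log: list) -> None:
    await asyncio.sleep(0.05)          # slower: must not delay the caller
    log.append((user_id, utterance, reply))

# Keep references to in-flight tasks so they are not garbage-collected.
_background: set[asyncio.Task] = set()

async def handle_turn(user_id: str, utterance: str, log: list) -> str:
    context = await fetch_context(user_id)             # awaited
    reply = await generate_reply(utterance, context)   # awaited
    # Schedule ingestion without awaiting it, then return immediately.
    task = asyncio.create_task(ingest_turn(user_id, utterance, reply, log))
    _background.add(task)
    task.add_done_callback(_background.discard)
    return reply

async def drain_background() -> None:
    """Wait for pending ingestion, for example at hang-up."""
    if _background:
        await asyncio.gather(*list(_background))
```

The reply is returned before ingestion finishes; calling something like `drain_background()` at hang-up keeps the last turn from being dropped when the process exits.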

## Stack

| Layer             | Choice                                                                                              |
| ----------------- | --------------------------------------------------------------------------------------------------- |
| **Synap SDK**     | `maximem-synap`                                                                                     |
| **Synap adapter** | [`maximem-synap-pipecat`](/integrations/pipecat): frame processors for context inject and recording |
| **Pipeline**      | Pipecat                                                                                             |
| **STT**           | Deepgram (best latency / accuracy tradeoff for phone)                                               |
| **LLM**           | OpenAI `gpt-4o` (latency-tuned; use `gpt-4o-mini` if budget matters more than nuance)               |
| **TTS**           | ElevenLabs (Turbo v2.5: phone-quality, low first-byte latency)                                      |
| **Telephony**     | Twilio / Plivo / Pipecat's Daily WebRTC transport (your call)                                       |

## Prerequisites

* A Synap API key. See [Authentication](/setup/authentication)
* Deepgram API key
* ElevenLabs API key + chosen voice ID
* OpenAI API key
* A way to ingest audio (Twilio call → Pipecat WebRTC bridge is common)
* Python 3.11+

### Install

<CodeGroup>
  ```bash pip theme={null}
  pip install maximem-synap maximem-synap-pipecat \
    "pipecat-ai[daily,deepgram,openai,elevenlabs,silero]"
  ```

  ```bash uv theme={null}
  uv add maximem-synap maximem-synap-pipecat \
    "pipecat-ai[daily,deepgram,openai,elevenlabs,silero]"
  # pip-compatible (existing venv): uv pip install maximem-synap maximem-synap-pipecat "pipecat-ai[daily,deepgram,openai,elevenlabs,silero]"
  ```
</CodeGroup>

### Configure

```bash theme={null}
# .env
SYNAP_API_KEY=...
OPENAI_API_KEY=...
DEEPGRAM_API_KEY=...
ELEVENLABS_API_KEY=...
ELEVENLABS_VOICE_ID=...
```

## Build it

### 1. Identity & scoping

Voice has a stable identifier: the caller's phone number (from caller ID / SIP From / Twilio webhook).

* `customer_id` = a stable UUID for your business
* `user_id` = a stable UUID derived from the caller phone (E.164; hash first if your privacy posture requires)
* `conversation_id` = the same per-caller UUID, reused across calls: the relationship is the conversation, so this id is not per-call

Synap ids must be valid UUIDs, so don't pass the raw phone number. Derive a deterministic UUID from it with `uuid.uuid5(...)` (shown below): the same phone always maps to the same id, which is exactly the rolling continuity you want.

### 2. The Pipecat pipeline

The `maximem-synap-pipecat` package exposes `SynapContextHook` and `SynapMemoryHook` as Pipecat frame processors. Drop them in around the LLM.

```python theme={null}
import os
import uuid
import asyncio
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.openai import OpenAILLMService
from pipecat.services.deepgram import DeepgramSTTService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport
from pipecat.vad.silero import SileroVADAnalyzer

from maximem_synap import MaximemSynapSDK
from synap_pipecat import SynapContextHook, SynapMemoryHook

# Synap ids must be valid UUIDs. Derive stable UUIDs deterministically from
# your business name and the caller's phone so the same caller always maps
# to the same memory identity.
CUSTOMER_ID = str(uuid.uuid5(uuid.NAMESPACE_DNS, "your-business"))

SYSTEM = """You are a phone concierge for <Business>.

- Use what you remember about the caller from prior calls: recent issues, preferences, ongoing situations.
- Voice rule: keep replies to 1-2 short sentences. Long replies feel wrong on the phone.
- If you need clarification, ask one focused question, not three.
- If the caller asks for a human, transfer immediately (transfer_to_agent tool)."""

async def run_call(caller_phone: str, room_url: str):
    sdk = MaximemSynapSDK()
    await sdk.initialize()

    # Stable per-caller UUID derived from the phone number.
    caller_uuid = str(uuid.uuid5(uuid.NAMESPACE_DNS, f"caller:{caller_phone}"))

    # DailyTransport takes a DailyParams object rather than a plain dict.
    transport = DailyTransport(
        room_url, None, "Concierge",
        params=DailyParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )
    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
    llm = OpenAILLMService(api_key=os.environ["OPENAI_API_KEY"], model="gpt-4o")
    tts = ElevenLabsTTSService(
        api_key=os.environ["ELEVENLABS_API_KEY"],
        voice_id=os.environ["ELEVENLABS_VOICE_ID"],
        model="eleven_turbo_v2_5",
    )

    context_hook = SynapContextHook(
        sdk=sdk,
        user_id=caller_uuid,
        customer_id=CUSTOMER_ID,
        mode="fast",        # latency-critical
        max_results=5,
        system_template=(SYSTEM + "\n\nWhat we know about this caller:\n{context}"),
    )

    memory_hook = SynapMemoryHook(
        sdk=sdk,
        user_id=caller_uuid,
        customer_id=CUSTOMER_ID,
        conversation_id=caller_uuid,
        document_type="ai-chat-conversation",
        metadata={"channel": "voice"},
    )

    pipeline = Pipeline([
        transport.input(),
        stt,
        context_hook,    # injects caller memory into the LLM context
        llm,
        memory_hook,     # ingests the turn after the LLM responds
        tts,
        transport.output(),
    ])

    task = PipelineTask(pipeline)
    runner = PipelineRunner()
    await runner.run(task)
```

### 3. Tools (optional)

Phone agents lean on tools too: transfer to a human, look up an order, send an SMS follow-up. Pipecat's `OpenAILLMService` supports function tools; wire them like you would in any agent.

```python theme={null}
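# `telephony` and `sms` are placeholders for your own provider clients
# (Twilio, Plivo, ...); neither Pipecat nor Synap defines them.
async def transfer_to_agent(call_id: str, queue: str) -> dict:
    # Use Twilio / your telephony provider to bridge the call to a queue.
    return await telephony.transfer(call_id, queue)

async def send_sms_followup(phone: str, body: str) -> dict:
    # Send a text follow-up to the caller's number.
    return await sms.send(phone, body)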

# Register with the OpenAILLMService...
```
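
For reference, OpenAI function tools are described with a JSON Schema per function. The sketch below shows what the definitions for the two functions above might look like, plus a small dispatcher. The handlers are stubs, `TOOL_HANDLERS` and `dispatch_tool` are hypothetical names, and the exact registration call on the Pipecat side depends on your Pipecat version, so check its docs.

```python
import asyncio
import json

# Stub handlers standing in for the real telephony / SMS calls.
async def transfer_to_agent(call_id: str, queue: str) -> dict:
    return {"status": "transferred", "call_id": call_id, "queue": queue}

async def send_sms_followup(phone: str, body: str) -> dict:
    return {"status": "queued", "phone": phone, "chars": len(body)}

# Tool definitions in the OpenAI function-tool format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "transfer_to_agent",
            "description": "Transfer the live call to a human agent queue.",
            "parameters": {
                "type": "object",
                "properties": {
                    "call_id": {"type": "string"},
                    "queue": {"type": "string"},
                },
                "required": ["call_id", "queue"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "send_sms_followup",
            "description": "Send a short SMS follow-up to the caller.",
            "parameters": {
                "type": "object",
                "properties": {
                    "phone": {"type": "string"},
                    "body": {"type": "string"},
                },
                "required": ["phone", "body"],
            },
        },
    },
]

TOOL_HANDLERS = {
    "transfer_to_agent": transfer_to_agent,
    "send_sms_followup": send_sms_followup,
}

async def dispatch_tool(name: str, arguments_json: str) -> dict:
    """Route a model tool call (name + JSON arguments) to its handler."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    return await handler(**json.loads(arguments_json))
```

Keeping the schema names and the handler table in one module makes it harder for the two to drift apart.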

### 4. Latency budget

Voice has a tight end-to-end window before a pause stops feeling conversational. Where the time goes, in pipeline order:

| Stage                               | Relative cost     |
| ----------------------------------- | ----------------- |
| Deepgram STT (streaming, last word) | low               |
| Synap context fetch (`fast` mode)   | low               |
| OpenAI LLM first-token              | the dominant cost |
| ElevenLabs TTS first-byte (Turbo)   | low-moderate      |

The expensive part is the LLM. Synap stays well inside the budget. If you see drift:

1. Move the SDK to the same region as your call-handling box.
2. Lower `max_results` to 3.
3. Cache the last context fetch for 10s: consecutive voice turns arrive seconds apart, so the context rarely changes between them.
4. Switch the LLM to `gpt-4o-mini` for short replies.
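
The caching step (item 3) could look like the following per-caller TTL cache. This is a sketch under stated assumptions: `ContextCache` is a hypothetical helper that wraps whatever async function performs your context fetch, and it is not part of the Synap SDK.

```python
import asyncio
import time
from typing import Awaitable, Callable

class ContextCache:
    """Reuse the last context fetch per caller for a short TTL."""

    def __init__(
        self,
        fetch: Callable[[str], Awaitable[str]],
        ttl_seconds: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    async def get(self, user_id: str) -> str:
        hit = self._entries.get(user_id)
        if hit is not None and self._clock() - hit[0] < self._ttl:
            return hit[1]                      # fresh enough: skip the fetch
        value = await self._fetch(user_id)     # miss or expired: fetch again
        self._entries[user_id] = (self._clock(), value)
        return value

async def demo() -> tuple[str, str]:
    async def fetch(user_id: str) -> str:      # hypothetical stand-in
        return f"context for {user_id}"

    cache = ContextCache(fetch, ttl_seconds=10.0)
    first = await cache.get("caller-1")        # calls fetch
    second = await cache.get("caller-1")       # served from the cache
    return first, second
```

The trade-off is staleness: anything ingested inside the TTL window will not show up in context until the entry expires.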

## Run & verify

```text First call theme={null}
Caller:    Hi, my order didn't come.
Concierge: I'm sorry. Looking up your account from this number… I see order #ORD-22,
           marked delivered yesterday. Is the address still 12 Oak Street?
Caller:    Yes but I never got it.
Concierge: Reissuing free shipping, on the way today. SMS confirmation in a sec. Anything else?
Caller:    No thanks.
```

```text Two weeks later theme={null}
Caller:    Hey, calling about a different one.
Concierge: Welcome back. I see we reissued #ORD-22-R for you a couple weeks ago. Did
           that one arrive okay?
Caller:    Yeah it did, thanks. This is about a new order.
Concierge: Great. What's the order number?
```

The agent picked up the prior issue without you wiring anything case-specific. Phone calls feel continuous because they are.

## Customize / extend

* **Outbound voice campaigns** → flip the pipeline; have your telephony provider place outbound calls into the same Pipecat pipeline.
* **LiveKit instead of Pipecat** → see [Patterns → Voice Agent on LiveKit](/patterns/voice-agent-livekit). Same memory model, different transport.
* **Voice journaling / personal companion** → set `max_results` higher and use a richer system prompt; see [AI Companion](/cookbook/personal-ai-companion) for the persona shape.
* **Coach by voice** → adapt [AI Coach](/cookbook/personal-ai-coach) over Pipecat. Voice tracking of workouts is natural ("I just ran 5k").
* **Tier-2 escalation by voice** → on `transfer_to_agent`, post a memory-grounded summary into your queue so the human agent has full context the moment they pick up.

## Troubleshooting

**Replies feel slow**

* Profile each stage. LLM first-token is usually the bottleneck. Reduce `max_results`, switch model, or pre-emit a filler ("let me check…") if your TTS handles interruption cleanly.
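
One lightweight way to profile is to wrap each stage in a timing context manager and compare averages. `StageTimer` below is a hypothetical helper, not a Pipecat feature. It measures wall-clock time around whatever block you wrap, so for streaming stages you would wrap only the wait for the first token or first byte.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Optional

class StageTimer:
    """Collect per-stage durations in milliseconds."""

    def __init__(self) -> None:
        self.samples: dict[str, list[float]] = defaultdict(list)

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.samples[name].append(elapsed_ms)

    def mean_ms(self) -> dict[str, float]:
        return {name: sum(v) / len(v) for name, v in self.samples.items()}

    def slowest(self) -> Optional[str]:
        means = self.mean_ms()
        return max(means, key=means.get) if means else None

timer = StageTimer()
with timer.stage("context_fetch"):
    time.sleep(0.001)   # stand-in for the Synap fetch
with timer.stage("llm_first_token"):
    time.sleep(0.03)    # stand-in for waiting on the first LLM token
print(timer.slowest())
```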

**Concierge re-asks for the caller's name every call**

* Caller-name memory isn't being ingested. Capture the name explicitly on the first call with a tool or system rule: "if the caller introduces themselves, store via Synap before responding."

**Context fetch times out**

* Run Synap in the same region. Check network. If degraded, fall back to no-context gracefully; the call should proceed without memory rather than fail.
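
The no-context fallback can be as small as a timeout wrapper around the fetch. A minimal sketch: `context_or_empty` is a hypothetical helper, the 0.3 s default is an assumed budget rather than a Synap recommendation, and `fetch` stands for whatever coroutine function performs your context retrieval.

```python
import asyncio
from typing import Awaitable, Callable

async def context_or_empty(
    fetch: Callable[[], Awaitable[str]],
    timeout_s: float = 0.3,
) -> str:
    """Return fetched context, or "" if the fetch is slow or fails."""
    try:
        return await asyncio.wait_for(fetch(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return ""   # too slow: answer without memory
    except Exception:
        return ""   # degraded backend: the call still proceeds

async def slow_fetch() -> str:      # hypothetical stand-in for a degraded fetch
    await asyncio.sleep(1.0)
    return "late context"

print(repr(asyncio.run(context_or_empty(slow_fetch, timeout_s=0.05))))  # ''
```

In production you would likely log both failure branches so degraded retrieval is visible in monitoring.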

**Caller speaks over the agent and confuses the pipeline**

* Pipecat handles VAD-driven interruption; tune `SileroVADAnalyzer` sensitivity for your audio path.

**Phone number not stable (caller ID withheld)**

* Fall back to a session-only ID + ask for an account number / OTP in-flow. Don't write to long-term memory until identity is confirmed.

## Related

* **Integrations:** [Pipecat](/integrations/pipecat) · [LiveKit Agents](/integrations/livekit-agents)
* **Concepts:** [Fast Mode](/concepts/retrieval-modes) · [Customer Context](/concepts/context-end-to-end#customer-context) · [Conversational Context Lifecycle](/concepts/context-end-to-end#short-term-context)
* **Patterns:** [Voice Agent on LiveKit](/patterns/voice-agent-livekit) · [Graceful Degradation](/patterns/graceful-degradation)
* **Other recipes:** [AI Companion](/cookbook/personal-ai-companion) · [AI Coach](/cookbook/personal-ai-coach)