Status: In Development · Playground demo coming soon.
The recipe below is complete and runnable today — only the hosted playground showcase is pending.
Python-only recipe. Pipecat is a Python-native framework and does not have a TypeScript port. If you need TypeScript voice, build on LiveKit Agents instead — see Patterns → Voice Agent on LiveKit for the pattern (Python-only there too) or wrap your own STT/LLM/TTS on the JS side.
What you’ll build
A voice agent that:
- Answers a phone call or live mic session
- Recalls caller history mid-call from prior calls: preferences, prior issues, ongoing situations
- Stays inside voice latency budgets: Synap’s fast mode is sub-100ms P95
- Records and ingests every turn for next time
- Handles natural-feeling phone interactions — interruptions, short replies, repeat handling
When to use this recipe
Build this if:
- You’re building a phone agent (inbound IVR replacement, outbound calling, kiosk voice UI)
- Caller continuity across calls is the value — “I know who you are without you stating your account number”
- You need sub-second total round-trip latency
- You can carry Python on the call-handling side
Architecture at a glance
Synap sits on the LLM frame in Pipecat’s pipeline. Retrieval is on the critical path, so it must be fast. Ingestion is fire-and-forget, so it must not block the next utterance.
Stack
| Layer | Choice |
|---|---|
| Synap SDK | maximem-synap |
| Synap adapter | maximem-synap-pipecat — frame processors for context inject and recording |
| Pipeline | Pipecat |
| STT | Deepgram (best latency / accuracy tradeoff for phone) |
| LLM | OpenAI gpt-4o (latency-tuned; use gpt-4o-mini if budget matters more than nuance) |
| TTS | ElevenLabs (Turbo v2.5 — phone-quality, sub-300ms first byte) |
| Telephony | Twilio / Plivo / Pipecat’s WebRTC daily-co transport — your call |
Prerequisites
- A Synap API key — see Authentication
- Deepgram API key
- ElevenLabs API key + chosen voice ID
- OpenAI API key
- A way to ingest audio (Twilio call → Pipecat WebRTC bridge is common)
- Python 3.11+
Install
Configure
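As a minimal sketch of the configuration step, the snippet below reads the four provider keys and the voice ID listed under Prerequisites from environment variables and fails fast if any are missing. The variable names, in particular `SYNAP_API_KEY` and `ELEVENLABS_VOICE_ID`, are assumptions for illustration; use whatever names your deployment and the SDKs actually expect.

```python
import os

# Assumed variable names; check your Synap dashboard and each provider's docs.
REQUIRED_VARS = (
    "SYNAP_API_KEY",        # hypothetical name for the Synap key
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_VOICE_ID",  # hypothetical name for the chosen voice ID
)


def load_settings(env=None):
    """Return the required settings, or raise listing every missing variable."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` once at startup surfaces a missing key before the first call arrives rather than mid-conversation.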
Build it
1. Identity & scoping
Voice has a stable identifier: the caller’s phone number (from CNAM / SIP From / Twilio webhook).
- customer_id = "<your-business>"
- user_id = <caller phone> (E.164; hash if your privacy posture requires)
- conversation_id = <one per phone, rolling>: the relationship is the conversation, so this isn’t per-call
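The scoping rules above could be derived from a raw caller number roughly as follows. This is a sketch under stated assumptions: the normalization is best-effort and treats bare 10-digit numbers as North American, HMAC-SHA256 is just one reasonable choice of keyed hash, and the `phone:` prefix on `conversation_id` is a hypothetical convention, not something Synap requires.

```python
import hashlib
import hmac
import re


def normalize_e164(raw, default_country_code="1"):
    """Best-effort E.164 normalization; production code should use a phone-number library."""
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits
    if len(digits) == 10:
        # Assumption: a bare 10-digit number is national and needs the default country code.
        return "+" + default_country_code + digits
    return "+" + digits


def caller_scope(raw_number, customer_id, hash_key=None):
    """Build the three identifiers; pass hash_key (bytes) to avoid storing the raw number."""
    phone = normalize_e164(raw_number)
    if hash_key is not None:
        user_id = hmac.new(hash_key, phone.encode(), hashlib.sha256).hexdigest()
    else:
        user_id = phone
    return {
        "customer_id": customer_id,
        "user_id": user_id,
        # One rolling conversation per phone number, not one per call.
        "conversation_id": f"phone:{user_id}",
    }
```

Because the number is normalized before hashing, the same caller maps to the same `user_id` and `conversation_id` regardless of how the telephony provider formats the number.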
2. The Pipecat pipeline
The maximem-synap-pipecat package exposes SynapContextHook and SynapMemoryHook as Pipecat frame processors. Drop them in around the LLM.
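To illustrate what placing the two hooks around the LLM means for timing, here is a framework-free sketch using only asyncio. It does not use the real SynapContextHook or SynapMemoryHook; `fetch_context`, `ingest_turn`, and `call_llm` are hypothetical stand-ins. The point is the shape: the context fetch is awaited before the LLM call, while ingestion is scheduled as a background task so the reply is not held up.

```python
import asyncio


async def fetch_context(user_id):
    """Stand-in for the Synap context fetch (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"known facts about {user_id}"


async def ingest_turn(user_id, user_text, reply, store):
    """Stand-in for Synap ingestion (hypothetical); slower than the fetch on purpose."""
    await asyncio.sleep(0.05)
    store.append((user_id, user_text, reply))


async def call_llm(context, user_text):
    """Stand-in for the LLM call."""
    return f"[{context}] reply to: {user_text}"


async def handle_turn(user_id, user_text, store, background):
    context = await fetch_context(user_id)  # critical path: awaited before the LLM
    reply = await call_llm(context, user_text)
    # Fire-and-forget: schedule ingestion without awaiting it.
    task = asyncio.create_task(ingest_turn(user_id, user_text, reply, store))
    background.add(task)  # hold a reference so the task is not garbage-collected
    task.add_done_callback(background.discard)
    return reply


async def main():
    store, background = [], set()
    reply = await handle_turn("+14155550100", "where is my order?", store, background)
    ingested_at_reply = len(store)  # ingestion has not run yet at this point
    await asyncio.gather(*background)  # at shutdown, let pending ingestion finish
    return reply, ingested_at_reply, len(store)
```

Running `asyncio.run(main())` shows the reply is available before the turn has been ingested, which is the behavior the pipeline needs between utterances.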
3. Tools (optional)
Phone agents lean on tools too — transfer to a human, look up an order, send an SMS follow-up. Pipecat’s OpenAILLMService supports function tools; wire them like you would in any agent.
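As a rough sketch of what such tools might look like, the block below defines two tools in the OpenAI function-tool schema plus a small dispatcher. The tool names, parameters, and handler bodies are illustrative assumptions; registering them with Pipecat's LLM service is not shown here because the registration call depends on your Pipecat version.

```python
import json

# Tool definitions in the OpenAI chat-completions function-tool format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "transfer_to_agent",
            "description": "Hand the call to a human agent.",
            "parameters": {
                "type": "object",
                "properties": {"reason": {"type": "string"}},
                "required": ["reason"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "lookup_order",
            "description": "Look up the status of an order by ID.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    },
]


def transfer_to_agent(reason):
    # Placeholder: a real handler would signal your telephony provider.
    return {"status": "transferring", "reason": reason}


def lookup_order(order_id):
    # Placeholder: a real handler would query your order system.
    return {"order_id": order_id, "status": "unknown"}


HANDLERS = {"transfer_to_agent": transfer_to_agent, "lookup_order": lookup_order}


def dispatch(name, arguments_json):
    """Run the handler for a tool call; the model supplies arguments as a JSON string."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    return handler(**json.loads(arguments_json))
```

Keeping handlers in a plain dictionary makes it easy to check that every declared tool has an implementation before the agent takes a call.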
4. Latency budget
Voice has ~300ms of conversational comfort end-to-end. Where it goes:
| Stage | Typical |
|---|---|
| Deepgram STT (streaming, last word) | ~50ms |
| Synap context fetch (fast mode, P95) | ~50–100ms |
| OpenAI LLM first-token | ~200–400ms |
| ElevenLabs TTS first-byte (Turbo) | ~100–200ms |
- Move the SDK to the same region as your call-handling box.
- Lower max_results to 3.
- Cache the last context fetch for 10s — voice turns are tight in time.
- Switch the LLM to gpt-4o-mini for short replies.
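The 10-second cache suggested above can be sketched as a small per-caller TTL cache. This is an assumption-laden illustration, not part of the Synap SDK: `ContextCache` and its `fetch` callback are hypothetical names, and the fetch is synchronous here for brevity, whereas a real pipeline would likely wrap an async call.

```python
import time


class ContextCache:
    """Per-caller cache that reuses the last context fetch for a short TTL."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries = {}   # user_id -> (expires_at, context)

    def get_or_fetch(self, user_id, fetch):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the network round trip
        context = fetch(user_id)
        self._entries[user_id] = (now + self._ttl, context)
        return context
```

The trade-off is staleness: a fact ingested in the previous turn may not appear in context until the cached entry expires, which is usually acceptable within a single call.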
Run & verify
First call
Two weeks later
Customize / extend
- Outbound voice campaigns → flip the pipeline; have your telephony provider place outbound calls into the same Pipecat pipeline.
- LiveKit instead of Pipecat → see Patterns → Voice Agent on LiveKit. Same memory model, different transport.
- Voice journaling / personal companion → set max_results higher and use a richer system prompt; see AI Companion for the persona shape.
- Coach by voice → adapt AI Coach over Pipecat. Voice tracking of workouts is natural (“I just ran 5k”).
- Tier-2 escalation by voice → on transfer_to_agent, post a memory-grounded summary into your queue so the human agent has full context the moment they pick up.
Troubleshooting
Replies feel slow
- Profile each stage. LLM first-token is usually the bottleneck. Reduce max_results, switch model, or pre-emit a filler (“let me check…”) if your TTS supports interruption-friendly start.
- Caller-name memory isn’t being ingested. After the first call, capture name explicitly with a tool or system rule: “if the caller introduces themselves, store via Synap before responding.”
- Run Synap in the same region. Check network. If degraded, fall back to no-context gracefully — the call should proceed without memory rather than fail.
- Pipecat handles VAD-driven interruption; tune SileroVADAnalyzer sensitivity for your audio path.
- Fall back to a session-only ID + ask for an account number / OTP in-flow. Don’t write to long-term memory until identity is confirmed.
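The advice to fall back to no-context when Synap is degraded can be sketched as a timeout wrapper around the fetch. The 150ms default and the function name are assumptions for illustration; `fetch` stands for whatever coroutine function performs your context lookup.

```python
import asyncio


async def fetch_context_or_empty(fetch, user_id, timeout_s=0.15):
    """Return the fetched context, or an empty string if the fetch is slow or fails."""
    try:
        return await asyncio.wait_for(fetch(user_id), timeout=timeout_s)
    except Exception:
        # Deliberately broad: the call should proceed without memory rather than fail.
        # Log the error in real code so degraded periods are visible.
        return ""
```

An empty context means the agent answers as if the caller were new, which is a worse experience than a remembered one but far better than a dropped or stalled call.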
Related
- Integrations: Pipecat · LiveKit Agents
- Concepts: Fast Mode · Customer Context · Conversational Context Lifecycle
- Patterns: Voice Agent on LiveKit · Graceful Degradation
- Other recipes: AI Companion · AI Coach