LiveKit Agents handles the audio plumbing (STT → LLM → TTS). Synap handles memory. The synap-livekit-agents integration wires them together with two callback hooks.
```python
# pip install livekit-agents openai maximem-synap synap-livekit-agents
import json

from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    cli,
    llm,
)
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import deepgram, openai, silero

from maximem_synap import MaximemSynapSDK
from synap_livekit_agents import SynapContextProvider, SynapMemoryHook


async def entrypoint(ctx: JobContext):
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()

    # Caller identity: map LiveKit room metadata to your stable user_id.
    # participant.metadata is a string, so parse it as JSON first.
    # In production, sign this in your token-issuer service; never trust
    # client-supplied IDs.
    metadata = json.loads(participant.metadata or "{}")
    user_id = metadata.get("synap_user_id") or f"anon_{participant.identity}"
    customer_id = metadata.get("synap_customer_id") or "default"

    sdk = MaximemSynapSDK()
    await sdk.initialize()

    # SynapContextProvider injects relevant memories into the system prompt
    # before each LLM call. SynapMemoryHook ingests every user-assistant turn
    # back into Synap after the assistant speaks.
    context_provider = SynapContextProvider(
        sdk=sdk,
        user_id=user_id,
        customer_id=customer_id,
        mode="fast",  # latency-sensitive
        max_results=5,
    )
    memory_hook = SynapMemoryHook(
        sdk=sdk,
        user_id=user_id,
        customer_id=customer_id,
    )

    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            "You are a friendly phone agent. Use any known facts about the caller "
            "from their memory context. Keep responses under 2 sentences for "
            "voice clarity."
        ),
    )

    assistant = VoiceAssistant(
        vad=silero.VAD.load(),
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o"),
        tts=openai.TTS(),
        chat_ctx=initial_ctx,
        before_llm_cb=context_provider.before_llm,  # memory inject
        after_llm_cb=memory_hook.after_llm,  # memory ingest
    )
    assistant.start(ctx.room, participant)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
## The two hooks
`before_llm` fetches relevant memories, using the most recent user utterance as the `search_query`, and prepends them to the chat context as a system message. With `mode="fast"` it is latency-tuned so it doesn't add perceptible delay.

`after_llm` ingests the latest user and assistant turn as a single document with `document_type="ai-chat-conversation"`. It runs in the background and doesn't block the next TTS synthesis.
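Conceptually, the two callbacks form an inject-then-ingest loop around each LLM call. Here is a minimal, framework-free sketch of that pattern; the `ChatMessage`/`ChatContext` stand-ins and the `fetch_memories`/`ingest_turn` callables are illustrative placeholders, not the real SDK API:

```python
from dataclasses import dataclass, field


@dataclass
class ChatMessage:
    role: str
    text: str


@dataclass
class ChatContext:
    messages: list[ChatMessage] = field(default_factory=list)


def before_llm(chat_ctx: ChatContext, fetch_memories) -> ChatContext:
    """Inject: search memories with the latest user utterance,
    then prepend them as a system message."""
    last_user = next(
        (m.text for m in reversed(chat_ctx.messages) if m.role == "user"), ""
    )
    memories = fetch_memories(last_user)  # stand-in for Synap fast retrieval
    if memories:
        block = "Known facts about the caller:\n" + "\n".join(
            f"- {m}" for m in memories
        )
        chat_ctx.messages.insert(0, ChatMessage(role="system", text=block))
    return chat_ctx


def after_llm(chat_ctx: ChatContext, assistant_text: str, ingest_turn) -> None:
    """Ingest: persist the latest user + assistant exchange as one document."""
    last_user = next(
        (m.text for m in reversed(chat_ctx.messages) if m.role == "user"), ""
    )
    ingest_turn(
        content=f"user: {last_user}\nassistant: {assistant_text}",
        document_type="ai-chat-conversation",
    )
```

The key design point mirrors the hooks above: injection happens synchronously on the hot path before the LLM call, while ingestion can be fired and forgotten after the assistant speaks.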
## Latency budget for voice
Voice agents have ~300 ms of conversational comfort. Synap’s fast retrieval P95 is ~50–100 ms, comfortably within budget. If you see Synap calls pushing past 200 ms in your environment:
- Move the SDK to the same region as your LiveKit egress.
- Lower `max_results` to 3.
- Cache `get_context_for_prompt` between turns when the user message is similar enough; the SDK already does this for you automatically with a 5-minute TTL.
## Privacy note for voice
The full audio is processed by LiveKit and Deepgram; Synap only ever sees the text transcript returned by STT. If you operate under stricter privacy regimes, disable LiveKit room recording and review the Security & Trust page.