LiveKit Agents handles the audio plumbing (STT → LLM → TTS). Synap handles memory. The synap-livekit-agents integration wires them together with two callback hooks.
```python
# pip install livekit-agents openai maximem-synap synap-livekit-agents
import json

from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    cli,
    llm,
)
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import deepgram, openai, silero

from maximem_synap import MaximemSynapSDK
from synap_livekit_agents import SynapContextProvider, SynapMemoryHook


async def entrypoint(ctx: JobContext):
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()

    # Caller identity: map LiveKit room metadata to your stable user_id.
    # participant.metadata is a string, so parse it as JSON first.
    # In production, sign this in your token-issuer service; never trust
    # client-supplied IDs.
    metadata = json.loads(participant.metadata or "{}")
    user_id = metadata.get("synap_user_id") or f"anon_{participant.identity}"
    customer_id = metadata.get("synap_customer_id") or "default"

    sdk = MaximemSynapSDK()
    await sdk.initialize()

    # SynapContextProvider injects relevant memories into the system prompt
    # before each LLM call. SynapMemoryHook ingests every user-assistant turn
    # back into Synap after the assistant speaks.
    context_provider = SynapContextProvider(
        sdk=sdk,
        user_id=user_id,
        customer_id=customer_id,
        mode="fast",  # latency-sensitive
        max_results=5,
    )
    memory_hook = SynapMemoryHook(
        sdk=sdk,
        user_id=user_id,
        customer_id=customer_id,
    )

    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            "You are a friendly phone agent. Use any known facts about the caller "
            "from their memory context. Keep responses under 2 sentences for "
            "voice clarity."
        ),
    )

    assistant = VoiceAssistant(
        vad=silero.VAD.load(),
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o"),
        tts=openai.TTS(),
        chat_ctx=initial_ctx,
        before_llm_cb=context_provider.before_llm,  # memory inject
        after_llm_cb=memory_hook.after_llm,  # memory ingest
    )
    assistant.start(ctx.room, participant)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
## The two hooks
`before_llm` fetches relevant memories, using the most recent user utterance as the `search_query`, and prepends them to the chat context as a system message. With `mode="fast"` it is latency-tuned so it doesn't add perceptible delay.

`after_llm` ingests the latest user and assistant turn as a single document with `document_type="ai-chat-conversation"`. It runs in the background and doesn't block the next TTS synthesis.
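Conceptually, the two callbacks form an inject-then-ingest loop around each LLM call. Here is a minimal, framework-free sketch of that pattern; the `ChatMessage`/`ChatContext` stand-ins and the `fetch_memories`/`ingest_turn` callables are illustrative placeholders, not the real SDK API:

```python
from dataclasses import dataclass, field


@dataclass
class ChatMessage:
    role: str
    text: str


@dataclass
class ChatContext:
    messages: list[ChatMessage] = field(default_factory=list)


def before_llm(chat_ctx: ChatContext, fetch_memories) -> ChatContext:
    """Inject: search memories with the latest user utterance,
    then prepend them as a system message."""
    last_user = next(
        (m.text for m in reversed(chat_ctx.messages) if m.role == "user"), ""
    )
    memories = fetch_memories(last_user)  # stand-in for Synap fast retrieval
    if memories:
        block = "Known facts about the caller:\n" + "\n".join(
            f"- {m}" for m in memories
        )
        chat_ctx.messages.insert(0, ChatMessage(role="system", text=block))
    return chat_ctx


def after_llm(chat_ctx: ChatContext, assistant_text: str, ingest_turn) -> None:
    """Ingest: persist the latest user + assistant exchange as one document."""
    last_user = next(
        (m.text for m in reversed(chat_ctx.messages) if m.role == "user"), ""
    )
    ingest_turn(
        content=f"user: {last_user}\nassistant: {assistant_text}",
        document_type="ai-chat-conversation",
    )
```

The key design point mirrors the hooks above: injection happens synchronously on the hot path before the LLM call, while ingestion can be fired and forgotten after the assistant speaks.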
## Latency budget for voice
Voice agents have ~300 ms of conversational comfort. Synap’s fast retrieval P95 is ~50–100 ms, comfortably within budget. If you see Synap calls pushing past 200 ms in your environment:
- Move the SDK to the same region as your LiveKit egress.
- Lower `max_results` to 3.
- Cache `get_context_for_prompt` between turns when the user message is similar enough; the SDK already does this for you automatically with a 5-minute TTL.
## Privacy note for voice
The full audio is processed by LiveKit and Deepgram; Synap only ever sees the text transcript returned by STT. If you operate under stricter privacy regimes, disable LiveKit room recording and review the Security & Trust page.