This is the canonical end-to-end tutorial: read it top-to-bottom. It assumes you’ve finished the Quickstart and want to wire Synap into a real application.
If you only need a snippet for a specific framework (Flask, Next.js, Django) or LLM provider (Anthropic, Vercel AI SDK), open Setup → Integration and copy the relevant tab instead.
Prefer to explore the SDK in a browser before writing code? Use the live playground.
Prerequisites
Before you begin, make sure you have:
- Python 3.11+ installed on your machine
- A Synap account with access to the Dashboard
- An instance created in the Dashboard (see Quickstart if you haven’t done this yet). For best results, upload a Use-Case Markdown file when creating your instance; see Use-Case Markdown for the template and authoring guide.
- An OpenAI API key (or any LLM provider; we use OpenAI in this tutorial for simplicity). No paid key yet? Use Google Gemini’s free tier; see the Gemini snippet in Setup → Integration.
This tutorial assumes basic familiarity with Python async/await. If you are new to async Python, check out the asyncio documentation first.
Set Up Your Project
Create a new directory for your project and install the required dependencies:
Install the SDK and supporting libraries:
Your project will have the following structure:
Configure Your Environment
Create a .env file with your credentials. You will need two values:
- SYNAP_API_KEY: The API key generated for your instance (format: synap_<random>). Generate one from the Dashboard: open your instance, click Generate API Key, and copy the key; it is shown only once.
- OPENAI_API_KEY: Your OpenAI API key
.env
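Because the Synap key is shown only once, it can help to fail fast when a credential is missing or was pasted incorrectly. The following is a minimal sketch, not part of the SDK: `check_credentials` and `REQUIRED_VARS` are hypothetical names, and the only format rule it applies is the `synap_` prefix described above.

```python
REQUIRED_VARS = ("SYNAP_API_KEY", "OPENAI_API_KEY")


def check_credentials(env):
    """Return a list of problems; an empty list means the values look usable."""
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name, "").strip():
            problems.append(f"{name} is missing or empty")
    synap_key = env.get("SYNAP_API_KEY", "").strip()
    # The tutorial states that Synap keys have the form synap_<random>.
    if synap_key and not synap_key.startswith("synap_"):
        problems.append("SYNAP_API_KEY does not start with 'synap_'")
    return problems
```

Calling `check_credentials(os.environ)` after the environment has been loaded, and refusing to start if the list is non-empty, surfaces configuration mistakes before the first request.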
Initialize the SDK
Create a startup.py module that manages the SDK lifecycle. This module will be imported by your FastAPI application.
startup.py
Key points about this setup:
- cache_backend="sqlite" enables local caching for faster repeated retrievals.
- log_level="INFO" is appropriate for development. Switch to "WARNING" or "ERROR" in production.
- The sdk object is a module-level singleton. Import it from any module and it will reference the same initialized instance.
Create the FastAPI Application with Lifespan
Now create the main.py file. Start with the application lifespan manager, which ensures the SDK initializes on startup and shuts down cleanly when the server stops.
main.py
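The shape of a lifespan manager can be sketched with the standard library alone. In this hedged example, `StubSDK` and its `initialize`/`shutdown` methods are hypothetical stand-ins for whatever startup.py exposes; only the `asynccontextmanager` pattern and the `FastAPI(lifespan=lifespan)` hook are taken as given.

```python
import asyncio
from contextlib import asynccontextmanager


class StubSDK:
    """Hypothetical stand-in for the SDK object exported by startup.py."""

    def __init__(self):
        self.events = []

    async def initialize(self):
        self.events.append("initialized")

    async def shutdown(self):
        self.events.append("shut down")


sdk = StubSDK()


@asynccontextmanager
async def lifespan(app):
    # Runs once, before the server starts accepting requests.
    await sdk.initialize()
    try:
        yield
    finally:
        # Runs once when the server stops, even if serving raised.
        await sdk.shutdown()


async def demo():
    # FastAPI enters and exits this context manager itself when it is
    # passed as FastAPI(lifespan=lifespan); here we drive it by hand.
    async with lifespan(app=None):
        sdk.events.append("serving")
    return sdk.events


events = asyncio.run(demo())
```

Putting the shutdown call in a `finally` block means the SDK is released even when the server exits because of an error.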
The lifespan context manager is the recommended way to manage startup/shutdown in modern FastAPI applications (v0.95+). It replaces the older @app.on_event("startup") and @app.on_event("shutdown") hooks.
Build the Chat Endpoint
Add the chat endpoint to main.py. This endpoint performs five operations in sequence:
- Record the incoming user message: this registers the conversation so context can be fetched back
- Retrieve relevant memories from Synap
- Build a system prompt enriched with memory context
- Call the LLM with the enriched prompt
- Record + Ingest the conversation turn back into Synap for future memory
Registering each turn with record_message is what makes the conversation retrievable. Context fetched by conversation_id only returns turns that were recorded; passing conversation_id solely as metadata on memories.create does not register the conversation (metadata is stored, not indexed for conversation-scoped retrieval). An unregistered conversation returns empty results by design.
main.py (continued)
Let’s break down each step:
Step 1: Record the User Message
sdk.conversation.record_message() appends the turn to the conversation’s rolling history and registers the conversation under this conversation_id. This registration is what later lets conversation.context.fetch(conversation_id=...) resolve scope and return the conversation’s turns. Skip it and the first fetch for a brand-new conversation_id comes back empty, by design.
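The registration rule is easy to model in a test double. The class below is a hypothetical in-memory stand-in, not the Synap SDK; it reproduces only the behavior described here: a fetch returns recorded turns, and an unrecorded conversation_id yields an empty result rather than an error.

```python
from collections import defaultdict


class InMemoryConversations:
    """Hypothetical test double that models only the registration rule."""

    def __init__(self):
        self._turns = defaultdict(list)

    def record_message(self, conversation_id, role, content):
        # Recording a turn is what registers the conversation.
        self._turns[conversation_id].append({"role": role, "content": content})

    def fetch(self, conversation_id):
        # .get avoids registering the id as a side effect of a lookup.
        return list(self._turns.get(conversation_id, []))


store = InMemoryConversations()
before = store.fetch("conv-1")  # nothing recorded yet
store.record_message("conv-1", role="user", content="I'm planning a trip to Japan.")
after = store.fetch("conv-1")  # the recorded turn is now visible
```

A double like this can stand in for the SDK in unit tests that check the endpoint records the user message before it fetches context.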
Step 2: Memory Retrieval
The sdk.conversation.context.fetch() call searches Synap’s vector and graph stores for memories relevant to the user’s message. Key parameters:
- search_query: A list of strings used for semantic search. Passing the user’s message ensures we find contextually relevant memories.
- max_results=5: Limits context to the top 5 most relevant memories, keeping the prompt concise.
- types=["facts", "preferences"]: Retrieves only facts and preferences. Other types include episodes, emotions, and temporal. Use all to retrieve every type.
- mode="fast": Uses the fast retrieval path (lower latency). Use accurate when precision matters more; accurate adds LLM subquery decomposition + reranking on top of the same vector + graph search.
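One way to keep these parameters consistent across endpoints is to assemble them in a single helper. This is a sketch under assumptions: `build_fetch_params` is a hypothetical helper, the parameter names come from the list above, and treating `all` as one more entry in `types` is a guess about how the SDK expects it.

```python
# Values named in the tutorial; how "all" is passed is an assumption.
MEMORY_TYPES = {"facts", "preferences", "episodes", "emotions", "temporal", "all"}
MODES = {"fast", "accurate"}


def build_fetch_params(conversation_id, user_message, *, max_results=5,
                       types=("facts", "preferences"), mode="fast"):
    """Assemble keyword arguments for a context fetch, rejecting unknown values early."""
    unknown = set(types) - MEMORY_TYPES
    if unknown:
        raise ValueError(f"unknown memory types: {sorted(unknown)}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    return {
        "conversation_id": conversation_id,
        "search_query": [user_message],  # a list of strings, not a bare string
        "max_results": max_results,
        "types": list(types),
        "mode": mode,
    }


params = build_fetch_params("conv-1", "Where should I stay in Kyoto?")
```

The resulting dictionary would then be unpacked into the call, for example `sdk.conversation.context.fetch(**params)`; whether that call must be awaited depends on the SDK and is not assumed here.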
Step 3: Prompt Construction
The retrieved memories are formatted as bullet points and injected into the system prompt. This gives the LLM access to user-specific context without modifying the conversation history.
The confidence score (e.g., 92%) is included to help the LLM weigh how certain each piece of information is. You can omit confidence scores if you prefer a cleaner prompt.
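The formatting step can be sketched as a pure function. The memory shape used here (a dict with `content` and a `confidence` float between 0 and 1) is an assumption for illustration, as are the names `BASE_PROMPT` and `build_system_prompt`, the prompt wording, and the sample memories; adapt the field access to whatever the fetch result actually returns.

```python
BASE_PROMPT = "You are a helpful assistant."


def build_system_prompt(memories, include_confidence=True):
    """Render memories as bullets under the base prompt; return the base prompt if there are none."""
    if not memories:
        return BASE_PROMPT
    lines = []
    for memory in memories:
        line = f"- {memory['content']}"
        if include_confidence:
            # Confidence is assumed to be a float in [0, 1]; render it as a whole percent.
            line += f" (confidence: {memory['confidence']:.0%})"
        lines.append(line)
    return BASE_PROMPT + "\n\nWhat you know about this user:\n" + "\n".join(lines)


# Made-up sample data in the assumed shape.
memories = [
    {"content": "Is planning a trip to Japan", "confidence": 0.92},
    {"content": "Prefers vegetarian food", "confidence": 0.8},
]
prompt = build_system_prompt(memories)
```

Passing `include_confidence=False` yields the cleaner prompt variant, and an empty memory list falls back to the unmodified base prompt.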
Step 4: LLM Call
A standard OpenAI chat completion call. The system prompt now contains personalized context, so the LLM can respond as if it “remembers” the user. This works with any LLM provider: replace the OpenAI call with your preferred provider.
Step 5: Record Reply + Memory Ingestion
Two distinct writes happen here, and they serve different purposes:
- record_message(role="assistant", ...) completes the turn in the conversation’s rolling history, so the next turn’s context.fetch sees the full exchange. This is the call that makes conversation-scoped retrieval work.
- memories.create(...) runs the ingestion pipeline, which extracts structured long-term knowledge (facts, preferences, entities, temporal events) from the text. Its metadata (including conversation_id) is stored for filtering and audit but is not what registers the conversation; that is record_message’s job.
Key parameters for memories.create:
- document_type="ai-chat-conversation" tells the pipeline to expect a conversational format with User: and Assistant: turns.
- mode="fast" processes asynchronously without blocking the response.
- metadata attaches arbitrary key-value pairs for later filtering and audit.
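The ingestion inputs can be prepared with two small helpers. This is a sketch: `format_turn` and `ingestion_options` are hypothetical names, and the keyword under which memories.create accepts the transcript text is not stated in this tutorial, so the text is kept separate from the options rather than guessed.

```python
def format_turn(user_message, assistant_reply):
    """Render one exchange in the User:/Assistant: layout the pipeline expects."""
    return f"User: {user_message}\nAssistant: {assistant_reply}"


def ingestion_options(conversation_id, **extra_metadata):
    """Options named in the text above; metadata is for filtering and audit only."""
    return {
        "document_type": "ai-chat-conversation",
        "mode": "fast",
        "metadata": {"conversation_id": conversation_id, **extra_metadata},
    }


transcript = format_turn("I'm planning a trip to Japan.", "Great! When are you going?")
options = ingestion_options("conv-1", source="tutorial")
```

Keeping `conversation_id` in the metadata supports later filtering, but it does not replace the `record_message` call for the assistant reply.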
Add a Health Check Endpoint
Good practice for production deployments: add a health check that verifies the SDK is connected:
main.py (continued)
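The decision logic of such a check can be sketched independently of the web framework. Here `health_status` is a hypothetical helper and the response fields are illustrative; how readiness is actually determined (for example, a flag set during startup or a probe call on the SDK) is an assumption left to the caller.

```python
def health_status(sdk_ready):
    """Map SDK readiness to an HTTP status code and a JSON-serializable body."""
    if sdk_ready:
        return 200, {"status": "ok", "synap": "connected"}
    # 503 signals load balancers and orchestrators to stop routing traffic here.
    return 503, {"status": "unavailable", "synap": "disconnected"}


code, body = health_status(sdk_ready=True)
```

Returning a non-2xx status when the SDK is unavailable lets the deployment platform detect the problem instead of relying on someone reading the response body.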
Run and Test
Load your environment variables and start the server:
You should see output confirming the SDK has initialized:
Now test with a few conversation turns:
Notice that memories_used increases as Synap accumulates knowledge about the user. Because the first turn was registered with record_message (and ingested with memories.create), the second turn’s context.fetch finds the earlier exchange and the assistant naturally references the Japan trip. Had the conversation never been recorded, that second fetch would return empty and memories_used would stay 0.
Verify in the Dashboard
Open the Synap Dashboard and navigate to your instance. You should see:
- API call counts reflecting your test requests
- Memory counts showing extracted facts, preferences, and entities
- Ingestion history with the conversation turns you sent

Complete Code
Here is the final version of both files for reference:
What’s Next?
You have a working memory-enabled chatbot. Here are the natural next steps to make it production-ready:
Writing a Use-Case Markdown File
The use-case file is how Synap tunes what gets extracted, how it is stored, and how retrieval ranking works for your Instance.
Multi-User Scoping
Set up memory isolation for multi-tenant applications with user, customer, and client scopes.
Context Compaction
Manage long conversations by compacting context to fit within your LLM’s token budget.
Production Checklist
Security, performance, and monitoring best practices before going live.