Prerequisites
Before you begin, make sure you have:

- Python 3.10+ installed on your machine
- A Synap account with access to the Dashboard
- An instance created in the Dashboard (see Quickstart if you haven’t done this yet). For best results, upload a Use-Case Markdown file when creating your instance — see Use-Case Markdown for the template and authoring guide.
- An OpenAI API key (or any LLM provider — we use OpenAI in this tutorial for simplicity)
This tutorial assumes basic familiarity with Python async/await and REST APIs. If you are new to async Python, check out the asyncio documentation first.
Set Up Your Project
Create a new directory for your project and install the required dependencies: the SDK and its supporting libraries. Your project will contain three files: a .env file for credentials, a startup.py module, and a main.py application.
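The setup might look like the following. The SDK package name here is an assumption; check your Dashboard or the Quickstart for the exact install command.

```shell
mkdir memory-chatbot && cd memory-chatbot
pip install synap fastapi uvicorn openai python-dotenv  # "synap" package name assumed
```

```
memory-chatbot/
├── .env          # credentials (never commit this file)
├── startup.py    # SDK lifecycle and module-level singleton
└── main.py       # FastAPI application
```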
Configure Your Environment
Create a .env file with your credentials. You will need three values:

- SYNAP_INSTANCE_ID: The instance ID from your Dashboard (e.g., inst_a1b2c3d4e5f67890)
- SYNAP_BOOTSTRAP_TOKEN: The bootstrap key generated for your instance
- OPENAI_API_KEY: Your OpenAI API key

.env
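For example, with placeholder values (the instance ID follows the format shown above):

```
SYNAP_INSTANCE_ID=inst_a1b2c3d4e5f67890
SYNAP_BOOTSTRAP_TOKEN=your-bootstrap-token
OPENAI_API_KEY=sk-your-openai-key
```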
Initialize the SDK
Create a startup.py module that manages the SDK lifecycle. This module will be imported by your FastAPI application.

startup.py

Key points about this setup:

- cache_backend="sqlite" enables local caching for faster repeated retrievals.
- log_level="INFO" is appropriate for development. Switch to "WARNING" or "ERROR" in production.
- The sdk object is a module-level singleton. Import it from any module and it will reference the same initialized instance.
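A minimal sketch of this module. The import path, the SynapSDK class name, the constructor keywords beyond those named above, and the initialize()/shutdown() method names are assumptions inferred from this guide; check the SDK reference for the exact API.

```python
# startup.py — SDK lifecycle module (sketch; names are assumptions)
import os

from dotenv import load_dotenv
from synap import SynapSDK  # assumed import path

load_dotenv()  # read credentials from .env into the environment

# Module-level singleton: every module that imports `sdk` gets the
# same initialized instance.
sdk = SynapSDK(
    instance_id=os.environ["SYNAP_INSTANCE_ID"],
    bootstrap_token=os.environ["SYNAP_BOOTSTRAP_TOKEN"],
    cache_backend="sqlite",  # local caching for faster repeated retrievals
    log_level="INFO",        # switch to "WARNING" or "ERROR" in production
)


async def init_sdk() -> None:
    """Connect the SDK; call once at application startup."""
    await sdk.initialize()


async def shutdown_sdk() -> None:
    """Release connections; call once at application shutdown."""
    await sdk.shutdown()
```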
Create the FastAPI Application with Lifespan
Now create the main.py file. Start with the application lifespan manager, which ensures the SDK initializes on startup and shuts down cleanly when the server stops.

main.py
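A sketch of the skeleton, assuming startup.py exposes init_sdk() and shutdown_sdk() helpers (hypothetical names for whatever lifecycle calls your SDK setup uses):

```python
# main.py — application skeleton with a lifespan manager
from contextlib import asynccontextmanager

from fastapi import FastAPI

from startup import init_sdk, shutdown_sdk  # assumed helper names


@asynccontextmanager
async def lifespan(app: FastAPI):
    await init_sdk()      # runs before the server accepts requests
    yield                 # the application serves traffic here
    await shutdown_sdk()  # runs on clean shutdown


app = FastAPI(lifespan=lifespan)
```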
The lifespan context manager is the recommended way to manage startup/shutdown in modern FastAPI applications (v0.95+). It replaces the older @app.on_event("startup") and @app.on_event("shutdown") hooks.

Build the Chat Endpoint
Add the chat endpoint to main.py. This endpoint performs four operations in sequence:

- Retrieve relevant memories from Synap
- Build a system prompt enriched with memory context
- Call the LLM with the enriched prompt
- Ingest the conversation turn back into Synap for future memory

The sections below break down each step.
main.py (continued)
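A sketch of the endpoint. The request-model field names, the memory-object attributes (content, confidence as a 0-1 float), and the ingestion method name are assumptions inferred from this guide; the fetch and ingest parameters follow the steps described below.

```python
# main.py (continued) — /chat endpoint; `app` is the FastAPI instance
from openai import AsyncOpenAI
from pydantic import BaseModel

from startup import sdk

openai_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


class ChatRequest(BaseModel):
    user_id: str
    message: str


@app.post("/chat")
async def chat(req: ChatRequest):
    # Step 1: retrieve relevant memories from Synap
    context = await sdk.conversation.context.fetch(
        search_query=[req.message],
        max_results=5,
        types=["facts", "preferences"],
        mode="fast",
    )

    # Step 2: build a system prompt enriched with memory context
    memory_lines = "\n".join(
        f"- {m.content} (confidence: {m.confidence:.0%})"  # attrs assumed
        for m in context.memories
    )
    system_prompt = (
        "You are a helpful assistant with long-term memory of this user.\n"
        f"Here is what you remember about them:\n{memory_lines}"
    )

    # Step 3: call the LLM with the enriched prompt
    completion = await openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": req.message},
        ],
    )
    answer = completion.choices[0].message.content

    # Step 4: ingest the turn back into Synap (method name assumed)
    await sdk.conversation.ingest(
        content=f"User: {req.message}\nAssistant: {answer}",
        document_type="ai-chat-conversation",
        mode="fast",
        metadata={"user_id": req.user_id},
    )

    return {"response": answer, "memories_used": len(context.memories)}
```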
Step 1: Memory Retrieval

The sdk.conversation.context.fetch() call searches Synap’s vector and graph stores for memories relevant to the user’s message. Key parameters:

- search_query: A list of strings used for semantic search. Passing the user’s message ensures we find contextually relevant memories.
- max_results=5: Limits context to the top 5 most relevant memories, keeping the prompt concise.
- types=["facts", "preferences"]: Retrieves only facts and preferences. Other types include temporal_events, relationships, and entities.
- mode="fast": Uses the fast retrieval path (~50-100ms). Use "accurate" (~200-500ms) when precision matters more than latency.
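The retrieval step in isolation might look like this; the result's .memories attribute is an assumption about the SDK's return type.

```python
# Step 1 sketch: fetch the memories most relevant to this message
from startup import sdk


async def recall(user_message: str):
    context = await sdk.conversation.context.fetch(
        search_query=[user_message],     # strings used for semantic search
        max_results=5,                   # top 5 keeps the prompt concise
        types=["facts", "preferences"],  # skip temporal_events, relationships, entities
        mode="fast",                     # ~50-100ms; "accurate" trades latency for precision
    )
    return context.memories  # attribute name assumed
```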
Step 2: Prompt Construction
The retrieved memories are formatted as bullet points and injected into the system prompt. This gives the LLM access to user-specific context without modifying the conversation history.

The confidence score (e.g., 92%) is included to help the LLM weigh how certain each piece of information is. You can omit confidence scores if you prefer a cleaner prompt.
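The formatting itself is plain Python. A small helper like the one below shows the idea; the memory dict shape (content plus a 0-1 confidence float) is an assumption about what retrieval returns.

```python
def build_system_prompt(memories: list[dict]) -> str:
    """Inject retrieved memories into the system prompt as bullet points."""
    lines = [
        f"- {m['content']} (confidence: {m['confidence']:.0%})"
        for m in memories
    ]
    memory_block = "\n".join(lines) if lines else "(no stored memories yet)"
    return (
        "You are a helpful assistant with long-term memory of this user.\n"
        "Here is what you remember about them:\n"
        + memory_block
    )
```

Dropping the `(confidence: ...)` suffix from the f-string gives the cleaner, score-free variant mentioned above.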
Step 3: LLM Call
A standard OpenAI chat completion call. The system prompt now contains personalized context, so the LLM can respond as if it “remembers” the user. This works with any LLM provider — replace the OpenAI call with your preferred provider.
Step 4: Memory Ingestion
After generating a response, the full conversation turn is sent back to Synap. The ingestion pipeline automatically extracts structured knowledge (facts, preferences, entities, temporal events) from the text.
- document_type="ai-chat-conversation" tells the pipeline to expect a conversational format with User: and Assistant: turns.
- mode="fast" processes asynchronously without blocking the response.
- metadata attaches arbitrary key-value pairs for later filtering and audit.
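The ingestion step in isolation might be sketched as follows; the method name (sdk.conversation.ingest) and parameter spelling are assumptions inferred from this guide.

```python
# Step 4 sketch: send a completed conversation turn back to Synap
from startup import sdk


async def remember_turn(user_id: str, user_message: str, assistant_reply: str) -> None:
    await sdk.conversation.ingest(  # method name assumed
        content=f"User: {user_message}\nAssistant: {assistant_reply}",
        document_type="ai-chat-conversation",  # User:/Assistant: turn format
        mode="fast",                           # asynchronous, non-blocking
        metadata={"user_id": user_id},         # for later filtering and audit
    )
```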
Add a Health Check Endpoint
Good practice for production deployments — add a health check that verifies the SDK is connected:
main.py (continued)
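A sketch of such an endpoint. sdk.is_connected is a hypothetical attribute; substitute whatever connectivity check your SDK version exposes.

```python
# main.py (continued) — health check; `app` is the FastAPI instance
from startup import sdk


@app.get("/health")
async def health():
    connected = bool(getattr(sdk, "is_connected", False))  # attr assumed
    return {
        "status": "ok" if connected else "degraded",
        "synap": "connected" if connected else "disconnected",
    }
```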
Run and Test
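A typical session might look like the commands below, assuming uvicorn serves main.py on port 8000 and the chat endpoint accepts a JSON body with user_id and message fields (adjust the path and fields to match your code).

```shell
# Start the server (loads .env via python-dotenv in startup.py)
uvicorn main:app --reload

# In another terminal, send a few conversation turns
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"user_id": "u1", "message": "I am planning a trip to Japan in April."}'

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"user_id": "u1", "message": "What should I pack for my trip?"}'
```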
Load your environment variables and start the server. You should see log output confirming the SDK has initialized. Now test with a few conversation turns and watch the memories_used field in the responses: it increases as Synap accumulates knowledge about the user. The assistant naturally references the Japan trip even though it was mentioned in a different conversation.

Verify in the Dashboard
Open the Synap Dashboard and navigate to your instance. You should see:
- API call counts reflecting your test requests
- Memory counts showing extracted facts, preferences, and entities
- Ingestion history with the conversation turns you sent

Complete Code
Here is the final version of both files for reference:

What’s Next?
You have a working memory-enabled chatbot. Here are the natural next steps to make it production-ready:

Configuring Memory
Tune what gets extracted, how it is stored, and how retrieval ranking works with MACA configuration.
Multi-User Scoping
Set up memory isolation for multi-tenant applications with user, customer, and client scopes.
Context Compaction
Manage long conversations by compacting context to fit within your LLM’s token budget.
Production Checklist
Security, performance, and monitoring best practices before going live.