Ollama Setup
Use tapes with Ollama to capture local LLM conversations, enable semantic search, and build transparent telemetry for your agent sessions.
This guide shows how to use tapes with Ollama for local model interactions. It assumes you've already installed tapes and have Ollama installed.
What You Get
Running tapes with Ollama captures every conversation turn in a searchable format:
Content-Addressable Storage
Each message and response gets a unique hash, creating checkpoints you can return to:
- Resume context - Check out any point in a conversation and continue from there
- Branch conversations - Try different approaches from the same starting point
- Audit interactions - See exactly what was said at each step
Semantic Search
Search conversations by meaning, not keywords:
- Find past work - Locate relevant conversations even if you don't remember exact wording
- Vector embeddings - Uses nomic-embed-text to understand semantic similarity
- MCP integration - Agents can search their own history
Transparent Telemetry
Complete visibility into agent sessions:
- Session history - Full record of all requests and responses
- Agent memory - Agents can reference previous interactions
- Debugging context - Understand what the model saw and why it responded
Start Ollama
Open a terminal and start the Ollama inference server:
ollama serve
This starts Ollama on http://localhost:11434 by default.
Pull Required Models
Download the models you want to use. For example, to use Gemma 2:
ollama pull gemma2
For semantic search, pull the embedding model:
ollama pull nomic-embed-text
Keep the Ollama terminal running. Open a new terminal for the next steps.
Start tapes
In a new terminal, start tapes configured for Ollama:
tapes serve --sqlite "./tapes.sqlite"
This starts:
- Proxy server on http://localhost:8080 - captures Ollama traffic
- API server on http://localhost:8081 - query stored conversations
- SQLite storage in ./tapes.sqlite with SQL vec for vector search
By default, tapes targets Ollama at http://localhost:11434. Keep this terminal running.
Interactive Chat
tapes includes an experimental chat client for quick testing. In a new terminal:
tapes chat
Select a model when prompted (e.g., gemma2) and start chatting. Every message is captured and stored.
Example Session
$ tapes chat
Starting new conversation
> hello, how are you?
I'm doing well, thanks for asking!
> where is New York?
New York is a state in the northeastern United States...
Each conversation turn receives a unique hash for later reference.
Search Your History
Once you've had a few conversations, search through them semantically:
tapes search "New York"
Results show:
- Score - Similarity score (higher = more relevant)
- Hash - Unique identifier for that conversation turn
- Preview - Snippet of the matched content
Search uses semantic embeddings, not keyword matching. "where is NYC" will match "New York" discussions.
Resume from Checkpoints
Copy a hash from search results and resume the conversation from that exact point:
tapes checkout abc123def456...
Now when you run tapes chat, it resumes from that checkpoint with full context:
$ tapes chat
Resuming from checkpoint abc123def456...
3 messages loaded
> what was the last message I sent?
You asked "where is New York?"
>
View Current Checkout
Check which conversation point you're at:
tapes status
Clear Checkout
Return to a fresh conversation state:
tapes checkout
MCP Integration
The tapes API server includes an MCP endpoint for agent integration. This lets agents search their own conversation history.
Inspect MCP Server
Use the MCP Inspector to test the search tool:
npx @modelcontextprotocol/inspector http://localhost:8081/v1/mcp
The search tool accepts natural language queries and returns relevant conversation segments with full context.
See MCP documentation for integration details with Claude Code and other agents.
Enable Vector Search
For production use with semantic search, start tapes with vector storage enabled.
Start Chroma
Launch the Chroma vector database:
docker run -p 8000:8000 chromadb/chroma
Start tapes with Vector Storage
Configure tapes to use Chroma and Ollama embeddings:
tapes serve \
--sqlite "./tapes.sqlite" \
--vector-store-provider chroma \
--vector-store-target "http://localhost:8000" \
--embedding-provider ollama \
--embedding-target "http://localhost:11434" \
--embedding-model nomic-embed-text
Now all conversations are automatically embedded for semantic search.
Without these flags, tapes uses SQL vec for vector storage (included in SQLite).
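If you start tapes this way often, a small wrapper can keep the long command in one place. The sketch below uses only the flags shown above. The function name and the environment variables (TAPES_DB, CHROMA_URL, OLLAMA_URL, EMBED_MODEL, DRY_RUN) are hypothetical names chosen for illustration, not settings that tapes itself reads.

```shell
# Sketch: wrap the vector-storage command from this guide.
# All variable names here are illustrative assumptions, not tapes settings.
tapes_vector() {
  set -- tapes serve \
    --sqlite "${TAPES_DB:-./tapes.sqlite}" \
    --vector-store-provider chroma \
    --vector-store-target "${CHROMA_URL:-http://localhost:8000}" \
    --embedding-provider ollama \
    --embedding-target "${OLLAMA_URL:-http://localhost:11434}" \
    --embedding-model "${EMBED_MODEL:-nomic-embed-text}"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command for review instead of running it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Running DRY_RUN=1 tapes_vector prints the assembled command so you can check it before starting the server. Running tapes_vector on its own starts tapes with the defaults from this guide.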
Query the API
Access stored conversations programmatically:
View Statistics
curl http://localhost:8081/v1/stats
Returns session count, turn count, and root count.
List Sessions
curl http://localhost:8081/v1/sessions
Returns paginated sessions with metadata.
Inspect a Session
curl http://localhost:8081/v1/sessions/abc123def456...
Returns the full session chain with all turns.
Search via API
curl "http://localhost:8081/v1/search?q=New%20York"
Returns semantically similar conversations.
See API reference for the full endpoint list and the Scalar UI at /swagger.
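The search endpoint takes the query in the q parameter, so spaces and punctuation need percent-encoding, as in the New%20York example. The helpers below are a minimal sketch for building such URLs in a POSIX shell. The function names urlencode and search_url are illustrative, and the encoder is meant for ASCII queries only.

```shell
# Percent-encode an ASCII string for use in a URL query parameter.
# The body runs in a subshell so its variables stay local.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    s=$rest
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;                    # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;            # anything else: %XX
    esac
  done
  printf '%s\n' "$out"
)

# Build a search URL for the tapes API server address used in this guide.
search_url() {
  printf 'http://localhost:8081/v1/search?q=%s\n' "$(urlencode "$1")"
}
```

With tapes running, curl "$(search_url 'where is NYC')" sends the encoded query to the API server.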
Troubleshooting
Ollama Connection Failed
Error: tapes can't connect to Ollama
Solutions:
- Verify Ollama is running: curl http://localhost:11434
- Check that Ollama is on the default port (11434)
- If using a custom port, add --upstream http://localhost:PORT
Model Not Available
Error: Selected model doesn't exist
Solution:
- List available models: ollama list
- Pull the model: ollama pull model-name
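The two steps above can be combined so that a model is pulled only when it is missing. This is a sketch that assumes ollama list prints a header row followed by one model per row, with the model name (for example gemma2:latest) in the first column. The function name model_listed is illustrative.

```shell
# Succeeds if the model named in $1 appears in `ollama list` output on stdin.
# Accepts a name with or without a tag (gemma2 or gemma2:latest).
model_listed() {
  awk -v want="$1" '
    NR > 1 {                      # skip the header row
      name = $1
      base = name
      sub(/:.*/, "", base)        # strip the tag, e.g. ":latest"
      if (name == want || base == want) found = 1
    }
    END { exit found ? 0 : 1 }
  '
}
```

For example, ollama list | model_listed gemma2 || ollama pull gemma2 pulls gemma2 only when it is not already listed.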
Search Returns No Results
Issue: tapes search shows no matches
Solutions:
- Verify the embedding model is pulled: ollama list | grep nomic-embed-text
- Check that conversations exist: curl http://localhost:8081/v1/stats
- Ensure you've had at least one conversation through the proxy
No Conversations Stored
Issue: API shows 0 nodes despite using chat
Solutions:
- Check that the SQLite file exists: ls -la tapes.sqlite
- Verify you started tapes with the --sqlite "./tapes.sqlite" flag
- Check terminal logs for storage errors
Verify Setup
Check Ollama is responding:
curl http://localhost:11434/api/tags
This should return the list of available models.
Check tapes API is running:
curl http://localhost:8081/ping
This should return pong.
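Both checks can be run together with a short script. The sketch below uses the two URLs from this section and only tests that each one answers with a non-error HTTP status. It does not inspect the response body. The function names check_endpoint and verify_setup are illustrative.

```shell
# $1 = label, $2 = URL. Succeeds when the URL responds without an HTTP error.
check_endpoint() {
  if curl -fsS --max-time 5 -o /dev/null "$2" 2>/dev/null; then
    printf '%s: ok\n' "$1"
  else
    printf '%s: not reachable at %s\n' "$1" "$2"
    return 1
  fi
}

# Check Ollama and the tapes API server on the default addresses from this guide.
verify_setup() {
  rc=0
  check_endpoint "Ollama" "http://localhost:11434/api/tags" || rc=1
  check_endpoint "tapes API" "http://localhost:8081/ping" || rc=1
  return "$rc"
}
```

Call verify_setup after starting both services. It returns a non-zero status if either endpoint does not respond, which makes it usable as a guard in other scripts.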
Next Steps
You're now capturing Ollama conversations with full context and semantic search. Explore more capabilities:
- Configure semantic search - Deep dive into vector storage and embeddings
- Manage session history - Advanced checkout and branching workflows
- Set up MCP integration - Connect agents to search their conversation history
- View all configuration options - Advanced tapes configuration