Copied to clipboard

Ollama Setup

Use tapes with Ollama to capture local LLM conversations, enable semantic search, and build transparent telemetry for your agent sessions.

This guide shows how to use tapes with Ollama for local model interactions. It assumes you've already installed tapes and have Ollama installed.

Video Walkthrough

See tapes and Ollama in action with this demonstration:

What You Get

Running tapes with Ollama captures every conversation turn in a searchable format:

Content-Addressable Storage

Each message and response gets a unique hash, creating checkpoints you can return to:

  • Resume context - Check out any point in a conversation and continue from there
  • Branch conversations - Try different approaches from the same starting point
  • Audit interactions - See exactly what was said at each step

Search conversations by meaning, not keywords:

  • Find past work - Locate relevant conversations even if you don't remember exact wording
  • Vector embeddings - Uses nomic-embed-text to understand semantic similarity
  • MCP integration - Agents can search their own history

Transparent Telemetry

Complete visibility into agent sessions:

  • Session history - Full record of all requests and responses
  • Agent memory - Agents can reference previous interactions
  • Debugging context - Understand what the model saw and why it responded

Start Ollama

Open a terminal and start the Ollama inference server:

ollama serve

This starts Ollama on http://localhost:11434 by default.

Pull Required Models

Download the models you want to use. For example, to use Gemma 2:

ollama pull gemma2

For semantic search, pull the embedding model:

ollama pull nomic-embed-text

Keep the Ollama terminal running. Open a new terminal for the next steps.

Start tapes

In a new terminal, start tapes configured for Ollama:

tapes serve --sqlite "./tapes.sqlite"

This starts:

  • Proxy server on http://localhost:8080 — captures Ollama traffic
  • API server on http://localhost:8081 — query stored conversations
  • SQLite storage in ./tapes.sqlite with SQL vec for vector search

By default, tapes targets Ollama at http://localhost:11434. Keep this terminal running.

Interactive Chat

tapes includes an experimental chat client for quick testing. In a new terminal:

tapes chat

Select a model when prompted (e.g., gemma2) and start chatting. Every message is captured and stored.

Example Session

$ tapes chat
Starting new conversation
> hello, how are you?
I'm doing well, thanks for asking!

> where is New York?
New York is a state in the northeastern United States...

Each conversation turn receives a unique hash for later reference.

Search Your History

Once you've had a few conversations, search through them semantically:

tapes search "New York"

Results show:

  • Score - Similarity score (higher = more relevant)
  • Hash - Unique identifier for that conversation turn
  • Preview - Snippet of the matched content

Search uses semantic embeddings, not keyword matching. "where is NYC" will match "New York" discussions.

Resume from Checkpoints

Copy a hash from search results and resume the conversation from that exact point:

tapes checkout abc123def456...

Now when you run tapes chat, it resumes from that checkpoint with full context:

$ tapes chat
Resuming from checkpoint abc123def456...
3 messages loaded

> what was the last message I sent?
You asked "where is New York?"

>

View Current Checkout

Check which conversation point you're at:

tapes status

Clear Checkout

Return to a fresh conversation state:

tapes checkout

MCP Integration

The tapes API server includes an MCP endpoint for agent integration. This lets agents search their own conversation history.

Inspect MCP Server

Use the MCP Inspector to test the search tool:

npx @modelcontextprotocol/inspector http://localhost:8081/v1/mcp

The search tool accepts natural language queries and returns relevant conversation segments with full context.

See MCP documentation for integration details with Claude Code and other agents.

Query the API

Access stored conversations programmatically:

View Statistics

curl http://localhost:8081/v1/stats

Returns session count, turn count, and root count.

List Sessions

curl http://localhost:8081/v1/sessions

Returns paginated sessions with metadata.

Inspect a Session

curl http://localhost:8081/v1/sessions/abc123def456...

Returns the full session chain with all turns.

curl "http://localhost:8081/v1/search?q=New%20York"

Returns semantically similar conversations.

See API reference for the full endpoint list and the Scalar UI at /swagger.

Troubleshooting

Ollama Connection Failed

Error: tapes can't connect to Ollama

Solutions:

  • Verify Ollama is running: curl http://localhost:11434
  • Check Ollama is on default port (11434)
  • If using custom port, add --upstream http://localhost:PORT

Model Not Available

Error: Selected model doesn't exist

Solution:

  • List available models: ollama list
  • Pull the model: ollama pull model-name

Search Returns No Results

Issue: tapes search shows no matches

Solutions:

  • Verify embedding model is pulled: ollama list | grep nomic-embed-text
  • Check conversations exist: curl http://localhost:8081/v1/stats
  • Ensure you've had at least one conversation through the proxy

No Conversations Stored

Issue: API shows 0 nodes despite using chat

Solutions:

  • Check SQLite file exists: ls -la tapes.sqlite
  • Verify you started tapes with --sqlite "./tapes.sqlite" flag
  • Check terminal logs for storage errors

Verify Setup

Check Ollama is responding:

curl http://localhost:11434/api/tags

Should return list of available models.

Check tapes API is running:

curl http://localhost:8081/ping

Should return pong.

Next Steps

You're now capturing Ollama conversations with full context and semantic search. Explore more capabilities:

Last updated: