Ollama Setup

Use tapes with Ollama to capture local LLM conversations, enable semantic search, and build transparent telemetry for your agent sessions.

This guide shows how to use tapes with Ollama for local model interactions. It assumes you've already installed both tapes and Ollama.

What You Get

Running tapes with Ollama captures every conversation turn in a searchable format:

Content-Addressable Storage

Each message and response gets a unique hash, creating checkpoints you can return to:

Resume context - Check out any point in a conversation and continue from there
Branch conversations - Try different approaches from the same starting point
Audit interactions - See exactly what was said at each step
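
To make the checkpoint idea concrete, the following is a minimal sketch of content addressing in POSIX shell. It assumes each node's hash is the SHA-256 of its parent's hash plus its own content. That scheme and the node_hash helper are illustrative assumptions, not a description of how tapes actually computes its hashes.

```shell
#!/bin/sh
# Illustrative sketch only: node_hash is a hypothetical helper, and the real
# tapes hashing scheme may differ. Requires sha256sum (GNU coreutils).
node_hash() {
  # $1 = parent hash (empty for a root node), $2 = message content.
  # Hashing both together means the same text under a different history
  # gets a different identifier.
  printf '%s\n%s' "$1" "$2" | sha256sum | cut -d ' ' -f 1
}

root=$(node_hash "" "hello, how are you?")
reply=$(node_hash "$root" "I'm doing well, thanks for asking!")
branch=$(node_hash "$root" "Doing fine. What are you working on?")

echo "root:   $root"
echo "reply:  $reply"
echo "branch: $branch"
```

Because reply and branch share the same parent, both can coexist as alternative continuations of root. Under this assumed scheme, that is the property that makes resuming and branching possible.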

Semantic Search

Search conversations by meaning, not keywords:

Find past work - Locate relevant conversations even if you don't remember exact wording
Vector embeddings - Uses embeddinggemma to understand semantic similarity
MCP integration - Agents can search their own history

Transparent Telemetry

Complete visibility into agent sessions:

Session history - Full record of all requests and responses
Agent memory - Agents can reference previous interactions
Debugging context - Understand what the model saw and why it responded

Start Ollama

Open a terminal and start the Ollama inference server:

ollama serve

This starts Ollama on http://localhost:11434 by default.

Pull Required Models

Download the models you want to use. For example, to use Gemma 2:

ollama pull gemma2

For semantic search, pull the embedding model:

ollama pull embeddinggemma

Keep the Ollama terminal running. Open a new terminal for the next steps.

Start tapes

tapes stores sessions in PostgreSQL with the pg_duckdb and pgvector extensions. The fastest way to bring one up locally is tapes local up, which also runs an Ollama container with embeddinggemma preinstalled. If you'd rather use an Ollama you already have running, skip tapes local up and supply your own Postgres DSN.

# Start Postgres + Ollama in Docker
tapes local up

# Start tapes against the local Postgres
tapes serve --postgres "postgres://tapes:tapes@localhost:5432/tapes?sslmode=disable"

This starts:

Proxy server on http://localhost:8080 - captures Ollama traffic
API server on http://localhost:8081 - query stored conversations
Ingest server on http://localhost:8082 - sidecar HTTP ingest
Postgres storage with the Merkle DAG and pgvector embeddings in the same database

By default, tapes targets Ollama at http://localhost:11434. Keep this terminal running.
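
If you supply your own Postgres, it can help to assemble the DSN from its parts once and reuse it. The sketch below does that in POSIX shell. The PG_* variable names are hypothetical, and the values simply mirror the DSN used above.

```shell
#!/bin/sh
# Hypothetical variable names; adjust the values to match your own Postgres.
PG_USER="tapes"
PG_PASSWORD="tapes"
PG_HOST="localhost"
PG_PORT="5432"
PG_DATABASE="tapes"

# sslmode=disable matches the local setup in this guide.
DSN="postgres://${PG_USER}:${PG_PASSWORD}@${PG_HOST}:${PG_PORT}/${PG_DATABASE}?sslmode=disable"

echo "$DSN"
```

You can then start the server with tapes serve --postgres "$DSN". A password containing characters such as @ or / would likely need percent-encoding before it goes into the DSN.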

Interactive Chat

tapes includes an experimental chat client for quick testing. In a new terminal:

tapes chat

Select a model when prompted (e.g., gemma2) and start chatting. Every message is captured and stored.

Example Session

$ tapes chat
Starting new conversation
> hello, how are you?
I'm doing well, thanks for asking!

> where is New York?
New York is a state in the northeastern United States...

Each conversation turn receives a unique hash for later reference.

Search Your History

Once you've had a few conversations, search through them semantically:

tapes search "New York"

Results show:

Score - Similarity score (higher = more relevant)
Hash - Unique identifier for that conversation turn
Preview - Snippet of the matched content

Search uses semantic embeddings, not keyword matching. "where is NYC" will match "New York" discussions.
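
As a rough intuition for the score, embedding-based search compares vectors rather than words. The sketch below computes cosine similarity, a common measure for comparing embeddings, over tiny hand-written vectors. Whether tapes uses cosine similarity specifically is an assumption here. The cosine helper is hypothetical, the vectors are invented for illustration, and real embeddings have far more dimensions.

```shell
#!/bin/sh
# Illustrative sketch: cosine is a hypothetical helper and the vectors below
# are invented. This is not necessarily the metric tapes uses.
cosine() {
  # $1 and $2 are space-separated vectors of equal length.
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); split(b, y, " ")
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]; na += x[i] * x[i]; nb += y[i] * y[i]
    }
    printf "%.3f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

query="0.9 0.1 0.3"   # stand-in for "where is NYC"
close="0.8 0.2 0.4"   # stand-in for a New York conversation
far="0.1 0.9 0.0"     # stand-in for an unrelated conversation

echo "close: $(cosine "$query" "$close")"
echo "far:   $(cosine "$query" "$far")"
```

The vector that points in nearly the same direction as the query scores higher, even though no keyword comparison takes place.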

Resume from Checkpoints

Copy a hash from search results and resume the conversation from that exact point:

tapes checkout abc123def456...

Now when you run tapes chat, it resumes from that checkpoint with full context:

$ tapes chat
Resuming from checkpoint abc123def456...
3 messages loaded

> what was the last message I sent?
You asked "where is New York?"

>

View Current Checkout

Check which conversation point you're at:

tapes status

Clear Checkout

Return to a fresh conversation state:

tapes checkout

MCP Integration

The tapes API server includes an MCP endpoint for agent integration. This lets agents search their own conversation history.

Inspect MCP Server

Use the MCP Inspector to test the search tool:

npx @modelcontextprotocol/inspector http://localhost:8081/v1/mcp

The search tool accepts natural language queries and returns relevant conversation segments with full context.

See MCP documentation for integration details with Claude Code and other agents.

Enable Vector Search

Semantic search is on whenever an embedding provider is configured. Embeddings are written to the same Postgres database via pgvector — there is no separate vector service to run.

Start tapes with Embeddings

tapes serve \
  --postgres "postgres://tapes:tapes@localhost:5432/tapes?sslmode=disable" \
  --embedding-provider ollama \
  --embedding-target "http://localhost:11434" \
  --embedding-model embeddinggemma

Now all conversations are automatically embedded for semantic search.

--vector-store-target defaults to the --postgres DSN. Set it explicitly only if you want embeddings in a different Postgres database.

Query the API

Access stored conversations programmatically:

View Statistics

curl http://localhost:8081/v1/stats

Returns counts, token totals, cost, duration, and tool-call totals over the matched window. See the /v1/stats reference.

List Sessions

curl http://localhost:8081/v1/sessions

Returns one harness/agent session per row, newest-first, with cursor pagination.

Inspect a Stem

curl http://localhost:8081/v1/stems/abc123def456...

Returns the full ancestry chain from root to the given leaf hash.

Search via API

curl "http://localhost:8081/v1/search?q=New%20York"

Returns semantically similar conversations.
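
The q parameter must be URL-encoded, which is why the example above writes the space as %20. As a minimal sketch, the hypothetical urlencode helper below percent-encodes an ASCII query string in POSIX shell so you can build the URL from free-form text. It is not part of tapes, and it does not handle multi-byte characters.

```shell
#!/bin/sh
# Hypothetical helper, not part of tapes. Handles ASCII input only.
urlencode() (
  # The function body is a subshell, so the working variables stay local.
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;              # unreserved: keep as-is
      *) out="$out$(printf '%%%02X' "'$c")" ;;      # anything else: %XX
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

query="where is NYC"
echo "http://localhost:8081/v1/search?q=$(urlencode "$query")"
```

curl can also do the encoding itself: curl --get --data-urlencode "q=where is NYC" http://localhost:8081/v1/search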

See the API reference for the full endpoint list and the Scalar UI at /swagger.

Troubleshooting

Ollama Connection Failed

Error: tapes can't connect to Ollama

Solutions:

Verify that Ollama is running: curl http://localhost:11434
Check that Ollama is on the default port (11434)
If Ollama uses a custom port, add --upstream http://localhost:PORT

Model Not Available

Error: Selected model doesn't exist

Solution:

List available models: ollama list
Pull the model: ollama pull model-name

Search Returns No Results

Issue: tapes search shows no matches

Solutions:

Verify that the embedding model is pulled: ollama list | grep embeddinggemma
Check conversations exist: curl http://localhost:8081/v1/stats
Ensure you've had at least one conversation through the proxy

No Conversations Stored

Issue: API shows 0 nodes despite using chat

Solutions:

Confirm Postgres is up: tapes local status (or docker ps | grep tapes-local)
Verify you started tapes with --postgres "$DSN"
Check terminal logs for storage errors

Verify Setup

Check Ollama is responding:

curl http://localhost:11434/api/tags

This should return the list of available models.

Check tapes API is running:

curl http://localhost:8081/ping

This should return pong.

Next Steps

You're now capturing Ollama conversations with full context and semantic search. Explore more capabilities:

Configure semantic search - Deep dive into vector storage and embeddings
Manage session history - Advanced checkout and branching workflows
Set up MCP integration - Connect agents to search their conversation history
View all configuration options - Advanced tapes configuration