
tapes serve

Run the proxy, API, and ingest servers to capture LLM conversations. Use this when you need explicit control over ports and configuration.

tapes requires PostgreSQL with the pg_duckdb and pgvector extensions. Schema migrations run automatically when the process starts. To bootstrap a local Postgres and Ollama in Docker, run the tapes local up command.

Usage

# Run proxy, API, and ingest servers together
tapes serve --postgres "postgres://tapes:tapes@localhost:5432/tapes?sslmode=disable"

# Run just the proxy
tapes serve proxy --postgres "$DSN"

# Run just the API server
tapes serve api --postgres "$DSN"

# Run just the ingest server (sidecar mode)
tapes serve ingest --postgres "$DSN"

Flags for tapes serve

Flag                    Description
-p, --proxy-listen      Proxy server address (default: :8080)
-a, --api-listen        API server address (default: :8081)
-i, --ingest-listen     Ingest server address for sidecar mode (default: :8082)
-u, --upstream          LLM provider URL (default: http://localhost:11434)
--provider              Provider type: ollama, openai, anthropic (default: ollama)
--postgres              PostgreSQL connection string (DSN). Required.
--api-web-ui            Enable the browser DAG visualization at / on the API server (off by default)
--vector-store-target   pgvector connection string (defaults to --postgres when unset)
--embedding-provider    Embedding provider type (default: ollama)
--embedding-target      Embedding provider URL (default: http://localhost:11434)
--embedding-model       Embedding model name (default: embeddinggemma)
--embedding-dimensions  Embedding vector dimensions (default: 768)
-d, --debug             Enable debug logging

Flags for tapes serve proxy

Flag                   Description
-l, --listen           Server address (default: :8080)
-u, --upstream         LLM provider URL (default: http://localhost:11434)
-p, --provider         Provider type: ollama, openai, anthropic (default: ollama)
--postgres             PostgreSQL connection string (DSN). Required.
--vector-store-target  pgvector connection string (defaults to --postgres when unset)
--embedding-provider   Embedding provider type (optional)
--embedding-target     Embedding provider URL (optional)
--embedding-model      Embedding model name (optional)
--kafka-brokers        Comma-separated Kafka broker addresses (e.g., localhost:9092)
--kafka-topic          Kafka topic for publishing session events
--kafka-client-id      Optional Kafka client ID

Embedding flags are optional for tapes serve proxy; if they are omitted, semantic search is disabled. Kafka flags enable event streaming (see the Kafka Streaming guide). All flags can also be set via TAPES_* environment variables (see environment variables).

Flags for tapes serve api

The API server exposes read endpoints over the Merkle DAG and the /metrics endpoint for Prometheus scraping. See the Inspect reference for the full endpoint list.

Flag                   Description
-l, --listen           Server address (default: :8081)
--postgres             PostgreSQL connection string (DSN). Required.
--web-ui               Enable the browser DAG visualization at / (off by default)
--vector-store-target  pgvector connection string (defaults to --postgres when unset)
--embedding-provider   Embedding provider type (required for /v1/search and /v1/mcp)
--embedding-target     Embedding provider URL
--embedding-model      Embedding model name (default: embeddinggemma)

Flags for tapes serve ingest

The ingest server accepts completed LLM conversation turns via HTTP and stores them in the Merkle DAG. Use this when an external gateway (e.g., Envoy AI Gateway) handles upstream LLM traffic and tapes only needs to store, embed, and publish data.

Flag                    Description
-l, --listen            Server address (default: :8082)
--postgres              PostgreSQL connection string (DSN). Required.
--project               Project name to tag sessions (default: auto-detect from git)
--vector-store-target   pgvector connection string (defaults to --postgres when unset)
--embedding-provider    Embedding provider type
--embedding-target      Embedding provider URL
--embedding-model       Embedding model name
--embedding-dimensions  Embedding vector dimensions
--kafka-brokers         Comma-separated Kafka broker addresses
--kafka-topic           Kafka topic for publishing session events
--kafka-client-id       Optional Kafka client ID

Ingest Endpoints

The ingest server exposes the following HTTP endpoints:

Endpoint               Description
GET /ping              Health check endpoint
GET /metrics           Prometheus RED metrics (unauthenticated)
POST /v1/ingest        Accept a single conversation turn
POST /v1/ingest/batch  Accept multiple conversation turns
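
A deployment script may want to wait for the ingest server before sending turns. The sketch below polls GET /ping until it answers. It assumes that any 2xx status means the server is healthy (the response body is not documented here), and the names http_probe and wait_until_ready are hypothetical helpers, not part of tapes.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if GET url answers with a 2xx status (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(url="http://localhost:8082/ping", deadline_s=30.0,
                     interval_s=0.5, probe=http_probe,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll the health endpoint until it succeeds or the deadline passes."""
    end = clock() + deadline_s
    while True:
        if probe(url):
            return True
        if clock() >= end:
            return False
        sleep(interval_s)
```

The probe, sleep, and clock parameters are injectable only so the loop can be exercised without a running server; the default URL uses the default ingest port listed above.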

Ingest Payload Format

Send conversation turns with a provider-specific request and a reduced, provider-agnostic response. The optional session envelope tags the turn with harness metadata so sessions group correctly across replays.

POST /v1/ingest
Content-Type: application/json

{
  "provider": "openai",
  "agent_name": "my-agent",
  "session": {
    "org_id": "acme",
    "auth_subject": "[email protected]",
    "harness_id": "claude-code",
    "harness_session_id": "8f2c…",
    "harness_version": "1.4.2",
    "cwd": "/home/me/project",
    "name": "refactor auth middleware",
    "parent_harness_session_id": null,
    "harness_metadata": { "branch": "feature/auth" }
  },
  "request": {
    "model": "gpt-4",
    "messages": [
      { "role": "user", "content": "Hello" }
    ]
  },
  "response": {
    "model": "gpt-4",
    "message": {
      "role": "assistant",
      "content": [
        { "type": "text", "text": "Hi there!" }
      ]
    },
    "done": true,
    "stop_reason": "stop",
    "usage": {
      "prompt_tokens": 10,
      "completion_tokens": 5,
      "total_tokens": 15
    }
  }
}

Supported providers: openai, anthropic, ollama. The request uses the provider's native API format; the response uses tapes' reduced, provider-agnostic format. Ingest is idempotent — replaying the same payload re-hashes to the same DAG node.
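
As a rough illustration, a single turn could be assembled and posted from Python as follows. This is a sketch under assumptions: build_turn and post_turn are hypothetical helper names, the URL uses the default ingest port, and only the HTTP status is returned because the response body of /v1/ingest is not described here. Which top-level fields are mandatory is also not stated above, so agent_name is treated as optional and the session envelope is left out.

```python
import json
import urllib.request

# Default ingest address from the flag table (--listen :8082); adjust as needed.
INGEST_URL = "http://localhost:8082/v1/ingest"


def build_turn(provider, request_body, model, text,
               agent_name=None, stop_reason=None, usage=None):
    """Assemble one ingest payload: native request plus reduced response."""
    response = {
        "model": model,
        "message": {
            "role": "assistant",
            "content": [{"type": "text", "text": text}],
        },
        "done": True,  # a completed, non-streamed turn
    }
    if stop_reason is not None:
        response["stop_reason"] = stop_reason
    if usage is not None:
        response["usage"] = usage

    payload = {"provider": provider, "request": request_body, "response": response}
    if agent_name is not None:
        payload["agent_name"] = agent_name
    return payload


def post_turn(payload, url=INGEST_URL, timeout=10):
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

With an ingest server running, post_turn(build_turn("openai", {"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}, "gpt-4", "Hi there!")) would send a payload equivalent to the example above, minus the session envelope.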

tapes records total_duration_ns on every response across all providers (Anthropic, OpenAI, Ollama), measured at the proxy or ingest boundary. This duration powers the per-provider latency aggregates exposed by /v1/stats.
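
Because ingest is idempotent, a client can resend the same turn after a transient failure and still land on the same DAG node. The sketch below wraps a sender in retries with exponential backoff. It is illustrative only: send is a hypothetical callable that posts the payload and returns the HTTP status code, and treating connection errors and 5xx statuses as retryable is an assumption, not documented tapes behavior.

```python
import time


def send_with_retry(send, payload, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send(payload) until it returns a non-5xx status or attempts run out.

    The payload is resent unchanged on every attempt, so an idempotent
    server stores it at most once however many attempts are needed.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            status = send(payload)
            if status < 500:
                return status  # success or a client error; retrying will not help
            last_error = RuntimeError(f"server returned {status}")
        except OSError as exc:  # connection refused, reset, timeout, ...
            last_error = exc
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise last_error
```

Any function that posts to /v1/ingest and returns the status code can be passed as send. Note that a sender built on urllib raises an OSError subclass for 4xx responses too, so such a sender would need to catch those itself to avoid pointless retries.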

Response Format

The response object must use tapes' reduced format, not the raw provider response:

Field        Type     Required  Description
model        string   yes       Model that generated the response
message      object   yes       The assistant's response with role and content array
done         boolean  yes       Whether generation is complete (set true for non-streamed turns)
stop_reason  string   no        Provider's stop reason, passed through unchanged. Examples: stop, length, tool_use, end_turn
usage        object   no        Token usage with prompt_tokens, completion_tokens, total_tokens
extra        object   no        Provider-specific fields that don't map to the common schema

The message.content array contains content blocks, each with a type field. For text responses, use { "type": "text", "text": "..." }.

The reduced format normalizes the wire shape, not the values inside it — stop_reason in particular is whatever the upstream provider returned. See /swagger for the canonical schema, including optional fields like created_at and raw_response.
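
To make the difference between a raw provider response and the reduced format concrete, here is a minimal sketch that maps an OpenAI Chat Completions response to the fields in the table above. The function name reduce_openai_chat_completion is hypothetical. The sketch handles plain text replies only, uses the first choice, and passes finish_reason through as stop_reason unchanged; how tool calls or other content block types should be represented is not covered here, so check /swagger before relying on it.

```python
def reduce_openai_chat_completion(raw):
    """Map a raw OpenAI Chat Completions response (dict) to the reduced format."""
    choice = raw["choices"][0]  # only the first choice is considered
    message = choice["message"]

    reduced = {
        "model": raw["model"],
        "message": {
            "role": message.get("role", "assistant"),
            # Wrap the plain string content in a single text block.
            "content": [{"type": "text", "text": message.get("content") or ""}],
        },
        "done": True,  # a non-streamed response is always complete
    }

    finish_reason = choice.get("finish_reason")
    if finish_reason is not None:
        reduced["stop_reason"] = finish_reason  # passed through, not normalized

    usage = raw.get("usage")
    if usage:
        keys = ("prompt_tokens", "completion_tokens", "total_tokens")
        reduced["usage"] = {k: usage[k] for k in keys if k in usage}

    return reduced
```

The result can be placed under the response key of an ingest payload, next to the unmodified provider request.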

Metrics

The API and ingest servers expose Prometheus RED (Rate / Errors / Duration) metrics at GET /metrics. Both endpoints are intentionally unauthenticated so an in-cluster Prometheus can scrape them.

API server

Metric                                                  Description
tapes_apiserver_requests_total{route,method,status}     Counter of API requests. route is a templated path (e.g., /v1/stems/:hash) so cardinality stays bounded.
tapes_apiserver_request_duration_seconds{route,method}  Histogram of request latency per route.
tapes_apiserver_inflight_requests                       Gauge of currently in-flight requests.

Ingest server

Metric                                      Description
tapes_ingest_writes_total{provider,status}  Counter of ingested turns, partitioned by provider and outcome (ok / error).
tapes_ingest_dag_write_seconds{provider}    Histogram of DAG write latency per provider.
tapes_ingest_worker_queue_depth             Gauge of pending work in the background ingest worker.
tapes_ingest_body_bytes{provider}           Histogram of request body size per provider.

Example scrapes:

# API server
curl http://localhost:8081/metrics

# Ingest server
curl http://localhost:8082/metrics
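
For a quick check without a full Prometheus setup, the scraped text can be parsed directly. The sketch below reads the Prometheus text exposition format and computes the share of ingest writes whose status label is error, using the tapes_ingest_writes_total counter described above. The helper names parse_samples and ingest_error_ratio are hypothetical, and the parser is deliberately minimal: it ignores comment lines and any trailing timestamps.

```python
import re
from collections import defaultdict

# name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples from Prometheus text output."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def ingest_error_ratio(text):
    """Fraction of ingested turns with status="error", across all providers."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == "tapes_ingest_writes_total":
            totals[labels.get("status", "")] += value
    total = sum(totals.values())
    return totals["error"] / total if total else 0.0
```

Feeding it the body returned by the ingest server's /metrics endpoint yields a value between 0 and 1. Since the counter is cumulative, the ratio covers the whole process lifetime rather than a recent window.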
