tapes serve
Run the proxy, API, and ingest servers to capture LLM conversations. Use this when you need explicit control over ports and configuration.
tapes requires PostgreSQL with the pg_duckdb and pgvector extensions. Schema migrations run automatically when the process starts. To bootstrap a local Postgres and Ollama in Docker, run tapes local up.
Usage
# Run proxy, API, and ingest servers together
tapes serve --postgres "postgres://tapes:tapes@localhost:5432/tapes?sslmode=disable"
# Run just the proxy
tapes serve proxy --postgres "$DSN"
# Run just the API server
tapes serve api --postgres "$DSN"
# Run just the ingest server (sidecar mode)
tapes serve ingest --postgres "$DSN"
Flags for tapes serve
| Flag | Description |
|---|---|
| -p, --proxy-listen | Proxy server address (default: :8080) |
| -a, --api-listen | API server address (default: :8081) |
| -i, --ingest-listen | Ingest server address for sidecar mode (default: :8082) |
| -u, --upstream | LLM provider URL (default: http://localhost:11434) |
| --provider | Provider type: ollama, openai, anthropic (default: ollama) |
| --postgres | PostgreSQL connection string (DSN); required |
| --api-web-ui | Enable the browser DAG visualization at / on the API server (off by default) |
| --vector-store-target | pgvector connection string (defaults to --postgres when unset) |
| --embedding-provider | Embedding provider type (default: ollama) |
| --embedding-target | Embedding provider URL (default: http://localhost:11434) |
| --embedding-model | Embedding model name (default: embeddinggemma) |
| --embedding-dimensions | Embedding vector dimensions (default: 768) |
| -d, --debug | Enable debug logging |
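The --postgres value is a connection URI, so credentials containing characters such as @ or / need percent-encoding before they go into it. The following is a minimal sketch of assembling such a DSN in Python; build_postgres_dsn is a hypothetical helper, not part of tapes, and it assumes tapes accepts standard percent-encoded PostgreSQL URIs.

```python
from urllib.parse import quote, urlencode


def build_postgres_dsn(user, password, host, port, database, **params):
    """Assemble a postgres:// URI, percent-encoding each component.

    Hypothetical helper for illustration; not part of tapes.
    """
    # quote(..., safe="") also encodes "/", "@" and ":" so they cannot be
    # mistaken for URI delimiters when they appear in credentials.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    dsn = f"postgres://{auth}@{host}:{port}/{quote(database, safe='')}"
    if params:
        dsn += "?" + urlencode(params)
    return dsn


# Reproduces the DSN used in the Usage section.
print(build_postgres_dsn("tapes", "tapes", "localhost", 5432, "tapes", sslmode="disable"))
```

The resulting string can be exported as DSN and passed to tapes serve --postgres "$DSN".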
Flags for tapes serve proxy
| Flag | Description |
|---|---|
| -l, --listen | Server address (default: :8080) |
| -u, --upstream | LLM provider URL (default: http://localhost:11434) |
| -p, --provider | Provider type: ollama, openai, anthropic (default: ollama) |
| --postgres | PostgreSQL connection string (DSN); required |
| --vector-store-target | pgvector connection string (defaults to --postgres when unset) |
| --embedding-provider | Embedding provider type (optional) |
| --embedding-target | Embedding provider URL (optional) |
| --embedding-model | Embedding model name (optional) |
| --kafka-brokers | Comma-separated Kafka broker addresses (e.g., localhost:9092) |
| --kafka-topic | Kafka topic for publishing session events |
| --kafka-client-id | Optional Kafka client ID |
Embedding flags are optional for tapes serve proxy; if they are omitted, semantic search is disabled. Kafka flags enable event streaming (see the Kafka Streaming guide). All flags can also be set through TAPES_* environment variables (see environment variables).
Flags for tapes serve api
The API server exposes read endpoints over the Merkle DAG and the /metrics endpoint for Prometheus scraping. See the Inspect reference for the full endpoint list.
| Flag | Description |
|---|---|
| -l, --listen | Server address (default: :8081) |
| --postgres | PostgreSQL connection string (DSN); required |
| --web-ui | Enable the browser DAG visualization at / (off by default) |
| --vector-store-target | pgvector connection string (defaults to --postgres when unset) |
| --embedding-provider | Embedding provider type (required for /v1/search and /v1/mcp) |
| --embedding-target | Embedding provider URL |
| --embedding-model | Embedding model name (default: embeddinggemma) |
Flags for tapes serve ingest
The ingest server accepts completed LLM conversation turns via HTTP and stores them in the Merkle DAG. Use this when an external gateway (e.g., Envoy AI Gateway) handles upstream LLM traffic and tapes only needs to store, embed, and publish data.
| Flag | Description |
|---|---|
| -l, --listen | Server address (default: :8082) |
| --postgres | PostgreSQL connection string (DSN); required |
| --project | Project name to tag sessions (default: auto-detect from git) |
| --vector-store-target | pgvector connection string (defaults to --postgres when unset) |
| --embedding-provider | Embedding provider type |
| --embedding-target | Embedding provider URL |
| --embedding-model | Embedding model name |
| --embedding-dimensions | Embedding vector dimensions |
| --kafka-brokers | Comma-separated Kafka broker addresses |
| --kafka-topic | Kafka topic for publishing session events |
| --kafka-client-id | Optional Kafka client ID |
Ingest Endpoints
The ingest server exposes the following HTTP endpoints:
| Endpoint | Description |
|---|---|
| GET /ping | Health check endpoint |
| GET /metrics | Prometheus RED metrics (unauthenticated) |
| POST /v1/ingest | Accept a single conversation turn |
| POST /v1/ingest/batch | Accept multiple conversation turns |
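A gateway that forwards turns to the ingest server may want to wait until GET /ping answers before sending traffic. The sketch below polls the health check from Python. wait_for_ping is a hypothetical helper, and treating HTTP 200 as "healthy" is an assumption; the source only states that /ping is a health check endpoint.

```python
import time
import urllib.error
import urllib.request


def wait_for_ping(base_url, timeout_s=30.0, interval_s=0.5):
    """Poll GET {base_url}/ping until it returns HTTP 200 or the timeout expires.

    Hypothetical helper; assumes a healthy server answers /ping with 200.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/ping", timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # connection refused or non-2xx status; retry until the deadline
        time.sleep(interval_s)
    return False
```

With the default ingest address, wait_for_ping("http://localhost:8082") returns True once the server is reachable and False if the timeout passes first.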
Ingest Payload Format
Send conversation turns with a provider-specific request and a reduced, provider-agnostic response. The optional session envelope tags the turn with harness metadata so sessions group correctly across replays.
POST /v1/ingest
Content-Type: application/json
{
  "provider": "openai",
  "agent_name": "my-agent",
  "session": {
    "org_id": "acme",
    "auth_subject": "[email protected]",
    "harness_id": "claude-code",
    "harness_session_id": "8f2c…",
    "harness_version": "1.4.2",
    "cwd": "/home/me/project",
    "name": "refactor auth middleware",
    "parent_harness_session_id": null,
    "harness_metadata": { "branch": "feature/auth" }
  },
  "request": {
    "model": "gpt-4",
    "messages": [
      { "role": "user", "content": "Hello" }
    ]
  },
  "response": {
    "model": "gpt-4",
    "message": {
      "role": "assistant",
      "content": [
        { "type": "text", "text": "Hi there!" }
      ]
    },
    "done": true,
    "stop_reason": "stop",
    "usage": {
      "prompt_tokens": 10,
      "completion_tokens": 5,
      "total_tokens": 15
    }
  }
}
Supported providers: openai, anthropic, ollama. The request uses the provider's native API format; the response uses tapes' reduced, provider-agnostic format. Ingest is idempotent: replaying the same payload re-hashes to the same DAG node.
tapes records total_duration_ns on every response across all providers (Anthropic, OpenAI, Ollama), measured at the proxy or ingest boundary. This duration powers the per-provider latency aggregates exposed by /v1/stats.
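As a rough illustration of the payload described above, the following Python sketch assembles a turn and posts it to POST /v1/ingest using only the standard library. build_turn and post_turn are hypothetical helper names, and the success status code and response body of the endpoint are not documented here, so the sketch simply returns whatever status the server sends.

```python
import json
import urllib.request

SUPPORTED_PROVIDERS = ("openai", "anthropic", "ollama")


def build_turn(provider, agent_name, request, response, session=None):
    """Assemble the JSON body for POST /v1/ingest from its parts."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider}")
    turn = {
        "provider": provider,
        "agent_name": agent_name,
        "request": request,    # provider-native request body
        "response": response,  # tapes' reduced, provider-agnostic response
    }
    if session is not None:  # the session envelope is optional
        turn["session"] = session
    return turn


def post_turn(base_url, turn):
    """POST one turn to the ingest server and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    req = urllib.request.Request(
        f"{base_url}/v1/ingest",
        data=json.dumps(turn).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


turn = build_turn(
    provider="openai",
    agent_name="my-agent",
    request={"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]},
    response={
        "model": "gpt-4",
        "message": {"role": "assistant", "content": [{"type": "text", "text": "Hi there!"}]},
        "done": True,
        "stop_reason": "stop",
    },
)
# post_turn("http://localhost:8082", turn)  # needs a running ingest server
```

Because ingest is idempotent, retrying post_turn with the same turn after a network failure should not create a duplicate node.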
Response Format
The response object must use tapes' reduced format, not the raw provider response:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Model that generated the response |
| message | object | yes | The assistant's response with role and content array |
| done | boolean | yes | Whether generation is complete (set true for non-streamed turns) |
| stop_reason | string | no | Provider's stop reason, passed through unchanged. Examples: stop, length, tool_use, end_turn |
| usage | object | no | Token usage with prompt_tokens, completion_tokens, total_tokens |
| extra | object | no | Provider-specific fields that don't map to the common schema |
The message.content array contains content blocks, each with a type field. For text responses, use { "type": "text", "text": "..." }.
The reduced format normalizes the wire shape, not the values inside it — stop_reason in particular is whatever the upstream provider returned. See /swagger for the canonical schema, including optional fields like created_at and raw_response.
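To make the mapping concrete, here is a minimal sketch that converts a non-streamed OpenAI Chat Completions response body into the reduced shape from the table above. reduce_openai_response is a hypothetical helper, not tapes' own reducer, and it handles plain text replies only; content block shapes for tool calls are not covered in this section, so consult /swagger for those.

```python
USAGE_KEYS = ("prompt_tokens", "completion_tokens", "total_tokens")


def reduce_openai_response(raw):
    """Map an OpenAI Chat Completions response body to the reduced shape.

    Hypothetical helper for illustration; text-only replies are assumed.
    """
    choice = raw["choices"][0]
    text = choice["message"].get("content") or ""
    reduced = {
        "model": raw["model"],
        "message": {
            "role": choice["message"]["role"],
            # The reduced format carries content as an array of typed blocks.
            "content": [{"type": "text", "text": text}],
        },
        "done": True,  # non-streamed turns are always complete
    }
    if choice.get("finish_reason") is not None:
        # Passed through unchanged, as the table above requires.
        reduced["stop_reason"] = choice["finish_reason"]
    if "usage" in raw:
        reduced["usage"] = {k: raw["usage"][k] for k in USAGE_KEYS if k in raw["usage"]}
    return reduced


raw = {
    "model": "gpt-4",
    "choices": [
        {"message": {"role": "assistant", "content": "Hi there!"}, "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15},
}
reduced = reduce_openai_response(raw)
```

The reduced value matches the response object in the example payload and can be placed directly under the "response" key of an ingest request.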
Metrics
The API and ingest servers expose Prometheus RED (Rate / Errors / Duration) metrics at GET /metrics. Both endpoints are intentionally unauthenticated so an in-cluster Prometheus can scrape them.
API server
| Metric | Description |
|---|---|
| tapes_apiserver_requests_total{route,method,status} | Counter of API requests. route is a templated path (e.g., /v1/stems/:hash) so cardinality stays bounded. |
| tapes_apiserver_request_duration_seconds{route,method} | Histogram of request latency per route. |
| tapes_apiserver_inflight_requests | Gauge of currently in-flight requests. |
Ingest server
| Metric | Description |
|---|---|
| tapes_ingest_writes_total{provider,status} | Counter of ingested turns, partitioned by provider and outcome (ok / error). |
| tapes_ingest_dag_write_seconds{provider} | Histogram of DAG write latency per provider. |
| tapes_ingest_worker_queue_depth | Gauge of pending work in the background ingest worker. |
| tapes_ingest_body_bytes{provider} | Histogram of request body size per provider. |
Example scrape: curl http://localhost:8081/metrics for the API server, curl http://localhost:8082/metrics for the ingest server.
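Outside of Prometheus, the scraped text can also be inspected directly. The sketch below sums tapes_ingest_writes_total by its status label from a /metrics response body. The sample values are made up for illustration, ingest_writes_by_status is a hypothetical helper, and the parser is deliberately simple: it assumes label values contain no escaped quotes or braces.

```python
import re
from collections import defaultdict

# Matches one sample line: name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def ingest_writes_by_status(metrics_text):
    """Sum tapes_ingest_writes_total across providers, keyed by status label."""
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None or m.group("name") != "tapes_ingest_writes_total":
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        totals[labels.get("status", "")] += float(m.group("value"))
    return dict(totals)


# Made-up sample in the Prometheus text exposition format.
sample = """\
# TYPE tapes_ingest_writes_total counter
tapes_ingest_writes_total{provider="openai",status="ok"} 40
tapes_ingest_writes_total{provider="ollama",status="ok"} 2
tapes_ingest_writes_total{provider="openai",status="error"} 1
tapes_ingest_worker_queue_depth 0
"""
print(ingest_writes_by_status(sample))
```

In practice the input would be the body returned by curl http://localhost:8082/metrics rather than the inline sample.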