
tapes serve

Run the proxy, API, and ingest servers to capture LLM conversations. Use this when you need explicit control over ports and configuration.

Usage

# Run proxy, API, and ingest servers together
tapes serve

# Run just the proxy
tapes serve proxy

# Run just the API server
tapes serve api

# Run just the ingest server (sidecar mode)
tapes serve ingest

Flags for tapes serve

Flag Description
-p, --proxy-listen Proxy server address (default: :8080)
-a, --api-listen API server address (default: :8081)
-i, --ingest-listen Ingest server address for sidecar mode (default: :8082)
-u, --upstream LLM provider URL (default: http://localhost:11434)
--provider Provider type: ollama, openai, anthropic (default: ollama)
-s, --sqlite SQLite database path (default: ~/.tapes/tapes.sqlite)
--postgres PostgreSQL connection string (DSN) for persistent storage
--vector-store-provider Vector store type: sqlite, chroma, qdrant, pgvector (default: sqlite)
--vector-store-target Vector store filepath or URL (default: ~/.tapes/tapes.sqlite)
--embedding-provider Embedding provider type (default: ollama)
--embedding-target Embedding provider URL (default: http://localhost:11434)
--embedding-model Embedding model name (default: embeddinggemma:latest)
--embedding-dimensions Embedding vector dimensions (default: 768)
-d, --debug Enable debug logging

Flags for tapes serve proxy

Flag Description
-l, --listen Server address (default: :8080)
-u, --upstream LLM provider URL (default: http://localhost:11434)
-p, --provider Provider type: ollama, openai, anthropic (default: ollama)
-s, --sqlite SQLite database path (default: in-memory)
--postgres PostgreSQL connection string (DSN) for persistent storage
--vector-store-provider Vector store type: sqlite, chroma, qdrant, pgvector (default: sqlite)
--vector-store-target Vector store filepath or URL (optional)
--embedding-provider Embedding provider type (optional)
--embedding-target Embedding provider URL (optional)
--embedding-model Embedding model name (optional)
--kafka-brokers Comma-separated Kafka broker addresses (e.g., localhost:9092)
--kafka-topic Kafka topic for publishing session events
--kafka-client-id Optional Kafka client ID

Vector/embedding flags are optional for tapes serve proxy; if they are omitted, semantic search is disabled. Kafka flags enable event streaming (see the Kafka Streaming guide). All flags can also be set via TAPES_* environment variables (see environment variables).

Flags for tapes serve ingest

The ingest server accepts completed LLM conversation turns via HTTP and stores them in the Merkle DAG. Use this when an external gateway (e.g., Envoy AI Gateway) handles upstream LLM traffic and tapes only needs to store, embed, and publish data.

Flag Description
-l, --listen Server address (default: :8082)
-s, --sqlite SQLite database path (default: in-memory)
--postgres PostgreSQL connection string (DSN) for persistent storage
--project Project name to tag sessions (default: auto-detect from git)
--vector-store-provider Vector store type: sqlite, chroma, qdrant
--vector-store-target Vector store filepath or URL
--embedding-provider Embedding provider type
--embedding-target Embedding provider URL
--embedding-model Embedding model name
--embedding-dimensions Embedding vector dimensions
--kafka-brokers Comma-separated Kafka broker addresses
--kafka-topic Kafka topic for publishing session events
--kafka-client-id Optional Kafka client ID

Ingest Endpoints

The ingest server exposes the following HTTP endpoints:

Endpoint Description
GET /ping Health check endpoint
POST /v1/ingest Accept a single conversation turn
POST /v1/ingest/batch Accept multiple conversation turns
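For scripts that need to wait for the ingest server, a readiness check against GET /ping can be sketched in Python as follows. It assumes the server is on its default :8082 listener (reached as http://localhost:8082) and treats any 2xx status as healthy. The /ping response body isn't described here, so it is not inspected. ingest_is_healthy is a hypothetical helper name, not part of tapes.

```python
import urllib.request


def ingest_is_healthy(base_url: str = "http://localhost:8082", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/ping answers with a 2xx status.

    Assumption: any 2xx status means healthy; the body is not inspected.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers connection refused, timeouts, and urllib.error.HTTPError
        # (raised for non-2xx statuses), which subclasses OSError.
        return False
```

A deploy script might poll ingest_is_healthy() in a loop before routing gateway traffic to tapes.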

Ingest Payload Format

Send conversation turns with a provider-specific request and a reduced, provider-agnostic response:

POST /v1/ingest
Content-Type: application/json

{
  "provider": "openai",
  "agent_name": "my-agent",
  "request": {
    "model": "gpt-4",
    "messages": [
      { "role": "user", "content": "Hello" }
    ]
  },
  "response": {
    "model": "gpt-4",
    "message": {
      "role": "assistant",
      "content": [
        { "type": "text", "text": "Hi there!" }
      ]
    },
    "done": true,
    "stop_reason": "stop",
    "usage": {
      "prompt_tokens": 10,
      "completion_tokens": 5,
      "total_tokens": 15
    }
  }
}

Supported providers: openai, anthropic, ollama. The request field uses the provider's native API format, while the response field uses a reduced, provider-agnostic format.
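The example above can be produced from client code with a small Python sketch that uses only the standard library. The helper names build_ingest_turn and post_ingest_turn are hypothetical. The sketch assumes the ingest server is on its default :8082 listener, and it treats agent_name as optional because this page doesn't state whether the server requires it.

```python
import json
import urllib.request
from typing import Any, Optional

SUPPORTED_PROVIDERS = {"openai", "anthropic", "ollama"}


def build_ingest_turn(provider: str, request: dict[str, Any], response: dict[str, Any],
                      agent_name: Optional[str] = None) -> dict[str, Any]:
    """Assemble one turn in the shape of the POST /v1/ingest example."""
    if provider not in SUPPORTED_PROVIDERS:
        # Client-side guard only; the server performs its own validation.
        raise ValueError(f"unsupported provider: {provider!r}")
    turn: dict[str, Any] = {
        "provider": provider,
        "request": request,    # provider-native request body, as sent upstream
        "response": response,  # tapes' reduced, provider-agnostic response
    }
    if agent_name is not None:
        turn["agent_name"] = agent_name
    return turn


def post_ingest_turn(turn: dict[str, Any], base_url: str = "http://localhost:8082",
                     timeout: float = 10.0) -> int:
    """POST a single turn to /v1/ingest and return the HTTP status code."""
    req = urllib.request.Request(
        f"{base_url}/v1/ingest",
        data=json.dumps(turn).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

To reproduce the example, pass its request and response objects to build_ingest_turn("openai", ..., agent_name="my-agent") and hand the result to post_ingest_turn. POST /v1/ingest/batch is not covered here, since its body shape is not described on this page; check /swagger before using it.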

Response Format

The response object must use tapes' reduced format, not the raw provider response:

Field Type Required Description
model string yes Model that generated the response
message object yes The assistant's response with role and content array
done boolean yes Whether generation is complete (set true for non-streamed turns)
stop_reason string no Provider's stop reason, passed through unchanged. Examples: stop, length, tool_use, end_turn
usage object no Token usage with prompt_tokens, completion_tokens, total_tokens
extra object no Provider-specific fields that don't map to the common schema

The message.content array contains content blocks, each with a type field. For text responses, use { "type": "text", "text": "..." }.
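Because the server expects the reduced shape rather than a raw provider response, a client-side pre-flight check can catch the most common mistake early. The Python sketch below encodes the table above plus the content-block rule. It is not tapes' own validation: it ignores fields it doesn't know (such as created_at and raw_response), and it assumes an explicit null in an optional field is equivalent to omitting it. /swagger remains the canonical schema; reduced_response_problems is a hypothetical name.

```python
from typing import Any

# Field -> (expected Python type, required?), following the table above.
REDUCED_FIELDS: dict[str, tuple[type, bool]] = {
    "model": (str, True),
    "message": (dict, True),
    "done": (bool, True),
    "stop_reason": (str, False),
    "usage": (dict, False),
    "extra": (dict, False),
}


def reduced_response_problems(resp: dict[str, Any]) -> list[str]:
    """Return shape problems in a reduced response; [] means none were found."""
    problems: list[str] = []
    for name, (expected, required) in REDUCED_FIELDS.items():
        value = resp.get(name)
        if value is None:
            # Assumption: an explicit null in an optional field counts as absent.
            if required:
                problems.append(f"missing required field: {name}")
            continue
        if not isinstance(value, expected):
            problems.append(f"{name} should be {expected.__name__}")
    message = resp.get("message")
    if isinstance(message, dict):
        if "role" not in message:
            problems.append("message.role is missing")
        content = message.get("content")
        if not isinstance(content, list):
            problems.append("message.content should be an array of content blocks")
        else:
            for i, block in enumerate(content):
                if not isinstance(block, dict) or "type" not in block:
                    problems.append(f"message.content[{i}] needs a type field")
                elif block["type"] == "text" and not isinstance(block.get("text"), str):
                    problems.append(f"message.content[{i}] is a text block without a text string")
    return problems
```

A raw provider response typically fails this check on message.content, which providers often return as a plain string rather than an array of typed blocks.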

The reduced format normalizes the wire shape, not the values inside it — stop_reason in particular is whatever the upstream provider returned. See /swagger for the canonical schema, including optional fields like created_at and raw_response.
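If a gateway hands you a raw OpenAI Chat Completions response, it has to be reduced before ingestion. The Python sketch below assumes OpenAI's non-streamed response shape (choices[i].message, choices[i].finish_reason, usage); those field names come from OpenAI's API, not from tapes. It handles plain-text replies only, leaves tool calls unmapped, and copies finish_reason into stop_reason unchanged, matching the pass-through behavior described above. openai_chat_to_reduced is a hypothetical name.

```python
from typing import Any


def openai_chat_to_reduced(raw: dict[str, Any], choice_index: int = 0) -> dict[str, Any]:
    """Map a non-streamed OpenAI Chat Completions response to tapes' reduced shape.

    Text-only: tool calls and other non-text output are not mapped.
    """
    choice = raw["choices"][choice_index]
    message = choice["message"]
    text = message.get("content")
    reduced: dict[str, Any] = {
        "model": raw["model"],
        "message": {
            "role": message.get("role", "assistant"),
            # One text block per turn; empty or null content yields no blocks.
            "content": [{"type": "text", "text": text}] if text else [],
        },
        "done": True,  # a non-streamed response is already a complete turn
    }
    if choice.get("finish_reason") is not None:
        # Passed through unchanged ("stop", "length", ...); no normalization.
        reduced["stop_reason"] = choice["finish_reason"]
    usage = raw.get("usage")
    if isinstance(usage, dict):
        # Keep only the three token counts shown in the ingest example.
        reduced["usage"] = {
            key: usage[key]
            for key in ("prompt_tokens", "completion_tokens", "total_tokens")
            if key in usage
        }
    return reduced
```

Anthropic and Ollama responses have different native shapes and would need their own mapping.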
