tapes serve
Run the proxy, API, and ingest servers to capture LLM conversations. Use this command when you need explicit control over ports and configuration.
Usage
```bash
# Run proxy, API, and ingest servers together
tapes serve
# Run just the proxy
tapes serve proxy
# Run just the API server
tapes serve api
# Run just the ingest server (sidecar mode)
tapes serve ingest
```
Flags for tapes serve
| Flag | Description |
|---|---|
| -p, --proxy-listen | Proxy server address (default: :8080) |
| -a, --api-listen | API server address (default: :8081) |
| -i, --ingest-listen | Ingest server address for sidecar mode (default: :8082) |
| -u, --upstream | LLM provider URL (default: http://localhost:11434) |
| --provider | Provider type: ollama, openai, anthropic (default: ollama) |
| -s, --sqlite | SQLite database path (default: ~/.tapes/tapes.sqlite) |
| --postgres | PostgreSQL connection string (DSN) for persistent storage |
| --vector-store-provider | Vector store type: sqlite, chroma, qdrant, pgvector (default: sqlite) |
| --vector-store-target | Vector store filepath or URL (default: ~/.tapes/tapes.sqlite) |
| --embedding-provider | Embedding provider type (default: ollama) |
| --embedding-target | Embedding provider URL (default: http://localhost:11434) |
| --embedding-model | Embedding model name (default: embeddinggemma:latest) |
| --embedding-dimensions | Embedding vector dimensions (default: 768) |
| -d, --debug | Enable debug logging |
Flags for tapes serve proxy
| Flag | Description |
|---|---|
| -l, --listen | Server address (default: :8080) |
| -u, --upstream | LLM provider URL (default: http://localhost:11434) |
| -p, --provider | Provider type: ollama, openai, anthropic (default: ollama) |
| -s, --sqlite | SQLite database path (default: in-memory) |
| --postgres | PostgreSQL connection string (DSN) for persistent storage |
| --vector-store-provider | Vector store type: sqlite, chroma, qdrant, pgvector (default: sqlite) |
| --vector-store-target | Vector store filepath or URL (optional) |
| --embedding-provider | Embedding provider type (optional) |
| --embedding-target | Embedding provider URL (optional) |
| --embedding-model | Embedding model name (optional) |
| --kafka-brokers | Comma-separated Kafka broker addresses (e.g., localhost:9092) |
| --kafka-topic | Kafka topic for publishing session events |
| --kafka-client-id | Optional Kafka client ID |
Vector and embedding flags are optional for tapes serve proxy; if they are omitted, semantic search is disabled. Kafka flags enable event streaming (see the Kafka Streaming guide). All flags can also be set through TAPES_* environment variables (see environment variables).
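When the proxy is launched from a script, a small wrapper can emit only the flags you actually set, so semantic search and Kafka streaming stay off unless configured. The Python below is a minimal sketch: `proxy_argv` and `start_proxy` are hypothetical helpers, not part of tapes. They use only flags from the proxy table and assume the `tapes` binary is on your `PATH`.

```python
import subprocess
from typing import Optional


def proxy_argv(
    upstream: str,
    provider: str = "ollama",
    listen: str = ":8080",
    sqlite_path: Optional[str] = None,
    semantic_flags: Optional[dict[str, str]] = None,
    kafka_brokers: Optional[list[str]] = None,
    kafka_topic: Optional[str] = None,
) -> list[str]:
    """Build the argv for `tapes serve proxy`, emitting optional flags only when set."""
    argv = ["tapes", "serve", "proxy",
            "--listen", listen, "--upstream", upstream, "--provider", provider]
    if sqlite_path:
        # Without --sqlite (or --postgres) the proxy stores data in memory.
        argv += ["--sqlite", sqlite_path]
    for name, value in (semantic_flags or {}).items():
        # e.g. {"embedding-model": "embeddinggemma:latest"}; omit to leave semantic search off.
        argv += [f"--{name}", value]
    if kafka_brokers:
        # Multiple brokers travel as one comma-separated value.
        argv += ["--kafka-brokers", ",".join(kafka_brokers)]
    if kafka_topic:
        argv += ["--kafka-topic", kafka_topic]
    return argv


def start_proxy(argv: list[str]) -> subprocess.Popen:
    """Launch the proxy as a child process; the caller must terminate it."""
    return subprocess.Popen(argv)
```

For example, `start_proxy(proxy_argv("http://localhost:11434", sqlite_path="./tapes.sqlite"))` persists to a local file and leaves semantic search and Kafka streaming disabled.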
Flags for tapes serve ingest
The ingest server accepts completed LLM conversation turns via HTTP and stores them in the Merkle DAG. Use this when an external gateway (e.g., Envoy AI Gateway) handles upstream LLM traffic and tapes only needs to store, embed, and publish data.
| Flag | Description |
|---|---|
| -l, --listen | Server address (default: :8082) |
| -s, --sqlite | SQLite database path (default: in-memory) |
| --postgres | PostgreSQL connection string (DSN) for persistent storage |
| --project | Project name to tag sessions (default: auto-detect from git) |
| --vector-store-provider | Vector store type: sqlite, chroma, qdrant |
| --vector-store-target | Vector store filepath or URL |
| --embedding-provider | Embedding provider type |
| --embedding-target | Embedding provider URL |
| --embedding-model | Embedding model name |
| --embedding-dimensions | Embedding vector dimensions |
| --kafka-brokers | Comma-separated Kafka broker addresses |
| --kafka-topic | Kafka topic for publishing session events |
| --kafka-client-id | Optional Kafka client ID |
Ingest Endpoints
The ingest server exposes the following HTTP endpoints:
| Endpoint | Description |
|---|---|
| GET /ping | Health check endpoint |
| POST /v1/ingest | Accept a single conversation turn |
| POST /v1/ingest/batch | Accept multiple conversation turns |
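Before a gateway or batch job starts sending turns, it may want to wait until the ingest server answers on `GET /ping`. The Python sketch below treats an HTTP 200 as healthy and assumes nothing about the response body. `is_healthy` and `wait_until_healthy` are hypothetical helpers, not part of tapes, and the `opener` parameter exists only so the functions can be exercised without a running server.

```python
import time
import urllib.request
from typing import Callable


def is_healthy(base_url: str, opener: Callable = urllib.request.urlopen,
               timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/ping answers with HTTP 200."""
    try:
        with opener(f"{base_url.rstrip('/')}/ping", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError subclass OSError, so refused connections,
        # timeouts, and non-2xx answers all count as "not healthy".
        return False


def wait_until_healthy(base_url: str, attempts: int = 10, delay: float = 0.5,
                       opener: Callable = urllib.request.urlopen) -> bool:
    """Poll /ping until it returns 200 or the attempts run out."""
    for _ in range(attempts):
        if is_healthy(base_url, opener=opener):
            return True
        time.sleep(delay)
    return False
```

For example, `wait_until_healthy("http://localhost:8082")` polls the default ingest address up to ten times, half a second apart.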
Ingest Payload Format
Send conversation turns with a provider-specific request and a reduced, provider-agnostic response:
```http
POST /v1/ingest
Content-Type: application/json

{
  "provider": "openai",
  "agent_name": "my-agent",
  "request": {
    "model": "gpt-4",
    "messages": [
      { "role": "user", "content": "Hello" }
    ]
  },
  "response": {
    "model": "gpt-4",
    "message": {
      "role": "assistant",
      "content": [
        { "type": "text", "text": "Hi there!" }
      ]
    },
    "done": true,
    "stop_reason": "stop",
    "usage": {
      "prompt_tokens": 10,
      "completion_tokens": 5,
      "total_tokens": 15
    }
  }
}
```

Supported providers: openai, anthropic, ollama. The request field uses the provider's native API format, while the response field uses a reduced, provider-agnostic format.
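The example request can be sent from any HTTP client. As a minimal Python sketch using only the standard library, the hypothetical helper `build_ingest_request` wraps one turn in a `POST /v1/ingest` request. It assumes the ingest server listens on its default `:8082` and returns the request object unsent, so it can be inspected before being passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

SUPPORTED_PROVIDERS = {"openai", "anthropic", "ollama"}


def build_ingest_request(base_url: str, provider: str, agent_name: str,
                         request: dict, response: dict) -> urllib.request.Request:
    """Wrap one conversation turn in a POST /v1/ingest request."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    payload = {
        "provider": provider,
        "agent_name": agent_name,
        "request": request,    # provider-native request body, unchanged
        "response": response,  # tapes' reduced, provider-agnostic response
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/ingest",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


turn = build_ingest_request(
    "http://localhost:8082",
    provider="openai",
    agent_name="my-agent",
    request={"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]},
    response={
        "model": "gpt-4",
        "message": {"role": "assistant", "content": [{"type": "text", "text": "Hi there!"}]},
        "done": True,
        "stop_reason": "stop",
        "usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15},
    },
)
# To send it: urllib.request.urlopen(turn)
```

Rejecting unknown providers on the client side is a design choice of this sketch; the server applies its own validation regardless.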
Response Format
The response object must use tapes' reduced format, not the raw provider response:
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Model that generated the response |
| message | object | yes | The assistant's response with role and content array |
| done | boolean | yes | Whether generation is complete (set true for non-streamed turns) |
| stop_reason | string | no | Provider's stop reason, passed through unchanged. Examples: stop, length, tool_use, end_turn |
| usage | object | no | Token usage with prompt_tokens, completion_tokens, total_tokens |
| extra | object | no | Provider-specific fields that don't map to the common schema |
The message.content array contains content blocks, each with a type field. For text responses, use { "type": "text", "text": "..." }.
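A client-side check against the table above can catch a raw provider response sent by mistake before it reaches the ingest server. The Python sketch below (`validate_reduced_response` is a hypothetical helper) checks only the required fields and the text content-block shape described here; the server remains the authority on what it accepts.

```python
def validate_reduced_response(resp: dict) -> list[str]:
    """Return a list of problems; an empty list means the required fields look right."""
    problems = []
    if "choices" in resp:
        # Raw OpenAI responses nest the message under choices.
        problems.append("found 'choices': this looks like a raw provider response")
    if not isinstance(resp.get("model"), str):
        problems.append("model must be a string")
    if not isinstance(resp.get("done"), bool):
        problems.append("done must be a boolean")
    message = resp.get("message")
    if not isinstance(message, dict):
        problems.append("message must be an object")
        return problems
    content = message.get("content")
    if not isinstance(content, list):
        # A raw Ollama chat response, for example, carries content as a plain string.
        problems.append("message.content must be an array of content blocks")
        return problems
    for i, block in enumerate(content):
        if not isinstance(block, dict) or "type" not in block:
            problems.append(f"message.content[{i}] needs a type field")
        elif block["type"] == "text" and not isinstance(block.get("text"), str):
            problems.append(f"message.content[{i}] is a text block without a text string")
    return problems
```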
The reduced format normalizes the wire shape, not the values inside it: stop_reason in particular is passed through exactly as the upstream provider returned it. See /swagger for the canonical schema, including optional fields such as created_at and raw_response.
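A gateway that holds only the raw provider response has to reduce it before calling the ingest endpoint. The Python sketch below handles a non-streamed OpenAI Chat Completions response with a single text choice. `reduce_openai_response` is hypothetical, and tool calls, multiple choices, and the optional `extra` field are deliberately out of scope. `finish_reason` is copied into `stop_reason` unchanged, matching the pass-through behavior described above.

```python
def reduce_openai_response(raw: dict) -> dict:
    """Map a non-streamed OpenAI chat completion to tapes' reduced response shape."""
    choice = raw["choices"][0]  # assumption: only the first choice matters
    text = choice["message"].get("content") or ""
    reduced = {
        "model": raw["model"],
        "message": {
            "role": choice["message"].get("role", "assistant"),
            "content": [{"type": "text", "text": text}],
        },
        "done": True,  # a complete, non-streamed turn
    }
    if choice.get("finish_reason") is not None:
        # Passed through as-is: "stop", "length", etc. are not remapped.
        reduced["stop_reason"] = choice["finish_reason"]
    usage = raw.get("usage")
    if usage:
        # OpenAI already uses the same three token-count names.
        reduced["usage"] = {
            k: usage[k]
            for k in ("prompt_tokens", "completion_tokens", "total_tokens")
            if k in usage
        }
    return reduced
```

Anthropic and Ollama responses would need their own mappings (for example, Anthropic reports input_tokens and output_tokens rather than prompt_tokens and completion_tokens), which this sketch does not attempt.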