Kafka Streaming
Stream conversation events from tapes to Kafka for downstream processing.
When to Use Kafka
tapes stores conversations locally. Kafka publishing adds a parallel event stream for external systems. Publishing is best-effort — storage success is never blocked by publish failures.
Use Kafka when you need:
- Real-time event streaming — Feed conversation data to analytics pipelines
- Event-driven architectures — Trigger workflows from conversation events
- Data warehousing — Route events to Clickhouse, Snowflake, or other sinks
If Kafka is unreachable, tapes continues storing conversations normally. No data is lost from storage — only the event stream is interrupted.
Prerequisites
Start a Kafka broker:
docker run --rm --name kafka -p 9092:9092 \
-e KAFKA_ADVERTISED_LISTENERS='PLAINTEXT://localhost:29092,PLAINTEXT_HOST://localhost:9092' \
confluentinc/confluent-local:latest Create a topic:
docker exec kafka kafka-topics \
--bootstrap-server localhost:29092 \
--create --if-not-exists \
--topic tapes.nodes.v1 \
--partitions 3 \
--replication-factor 1 Quick Start
Pass Kafka flags to the proxy server:
tapes serve proxy \
--sqlite ./tapes.sqlite \
--kafka-brokers localhost:9092 \
--kafka-topic tapes.nodes.v1 Events are published after each conversation turn is stored. All events in a conversation share the same partition key (root_hash).
CLI Flags
| Flag | Description |
|---|---|
--kafka-brokers | Comma-separated broker addresses (e.g., localhost:9092). Required for Kafka publishing. |
--kafka-topic | Topic name for conversation events. Required for Kafka publishing. |
--kafka-client-id | Optional client identifier sent to the broker. |
Kafka flags are only available on tapes serve proxy, not the combined tapes serve command.
Event Schema
Each published event follows the tapes.node.v1 schema:
{
"schema": "tapes.node.v1",
"root_hash": "339de8b9a547...efd73f",
"occurred_at": "2026-03-03T16:54:29.217197-07:00",
"node": {
"hash": "339de8b9a547...efd73f",
"parent_hash": null,
"bucket": {
"type": "message",
"role": "user",
"content": [{"type": "text", "text": "Hello"}],
"model": "gemma3:latest",
"provider": "ollama"
},
"project": "my-app"
}
} schema— Event version identifier (tapes.node.v1)root_hash— Conversation root hash, used as the Kafka partition keyoccurred_at— Timestamp when the event was publishednode— The Merkle DAG node payload (same structure as stored nodes)
Verify Events
Consume events from your topic to confirm publishing works:
docker exec kafka kafka-console-consumer \
--bootstrap-server localhost:29092 \
--topic tapes.nodes.v1 \
--from-beginning If you have kaf installed, you can also run kaf consume --brokers localhost:9092 tapes.nodes.v1.
Config File
Save Kafka settings in .tapes/config.toml:
[publisher.kafka]
brokers = "localhost:9092"
topic = "tapes.nodes.v1"
client_id = "my-app" CLI flags override config file values.
What's Next
- Streaming reference — Full event schema and configuration details
- PostgreSQL Storage — Production persistence for conversation data
- Search conversations — Find conversations by meaning
For downstream processing with Apache Flink — routing events to Clickhouse, Snowflake, or other sinks — a dedicated guide is coming soon.