The challenge
Distributed tracing had been a promise for years that broke in production under sheer volume. Teams collected billions of spans but could not search them in a reasonable time. Commercial solutions were expensive, slow — or both.
The approach
We tackled the problem at the root: an ingest layer written in Rust on Tokio, columnar storage in ClickHouse and an OpenTelemetry-native wire format. Live updates over WebSockets instead of polling. No magic, just consistent engineering decisions.
The toolkit ships as open source — not as a marketing gesture, but because observability tooling must be auditable.
The outcome
Used by platform teams in 40+ companies. P99 query latency under 800 ms on 100 TB of trace data. A community that drives the project forward on its own.
Stack
- Rust
- Tokio
- ClickHouse
- OpenTelemetry
- WebSocket