Skip to content

Week 19 - Observability: tracing, metrics, OpenTelemetry

19.1 Conceptual Core

  • Three pillars: logs, metrics, traces. In Rust, the de-facto crates are tracing (logs+spans), metrics (counters/gauges/histograms), and opentelemetry (export to OTLP collectors).
  • Spans are the structured-logging analog of stack frames: a span represents a unit of work, may contain child spans, and carries fields. tracing instruments your code with macros.
  • Subscriber is the consumer of spans/events: pretty-printer, JSON logger, Jaeger/OTLP exporter, or a custom sink. Subscribers compose via Layers.

19.2 Mechanical Detail

  • Replace log - crate users withtracing(thelog - compat shim is adequate). Place #[tracing::instrument] on every async use case in the application layer. Never on tight loops in hot paths-it has overhead.
  • tracing-subscriber setup: EnvFilter for runtime log-level control; fmt::Layer for stdout; opentelemetry::tracing layer for distributed tracing.
  • Metrics: metrics facade + metrics-exporter-prometheus for /metrics scraping. Counters for events, gauges for resource levels, histograms for latencies. Use Histogram not `Summary - Prometheus aggregation matters.
  • Cardinality discipline: a label like user_id has unbounded cardinality and will OOM Prometheus. Tag with tenant_id, endpoint, status_class only.

19.3 Lab-"Add Observability to the Hexagonal URL Shortener"

Take week 17's URL shortener and add: - tracing::instrument on every use case, with explicit fields (no PII). - Prometheus /metrics endpoint with request counts and per-endpoint latency histograms. - OTLP export to a local Jaeger via docker-compose. - A flamegraph.svg from a 30-second load test, committed.

19.4 Idiomatic & Clippy Drill

  • clippy::print_stdout, clippy::print_stderr, clippy::dbg_macro. Production code logs through tracing; these lints catch slip-ups.

19.5 Production Hardening Slice

  • Configure log/metric redaction at the subscriber layer. Demo: a request body containing an email address must not appear in logs. Add a regex-based redaction layer; unit-test it. This is a compliance prerequisite.

Comments