Skip to content

Week 18 - Observability: slog, pprof, trace, OpenTelemetry

18.1 Conceptual Core

  • The "three pillars"-logs, metrics, traces-map cleanly to four Go tools:
  • Logs: log/slog (stdlib, since 1.21). Structured, context-aware.
  • Metrics: prometheus/client_golang plus `runtime/metrics - derived collectors.
  • Traces: go.opentelemetry.io/otel with an OTLP exporter.
  • Profiles: pprof (CPU, heap, allocs, block, mutex, goroutine).
  • Execution traces: `runtime/trace - the tool when none of the above tell you why a goroutine is slow.
  • Two cross-cutting principles:
  • Correlation: every log line, trace span, and metric label uses the same trace_id and request_id. This requires plumbing through context.Context.
  • Cardinality discipline: never put unbounded values (user IDs, URLs with query strings, request IDs) into metric labels.

18.2 Mechanical Detail-slog

  • slog.Default(), slog.New(handler), slog.With(attrs...). Handlers: JSONHandler, TextHandler, custom.
  • Context-aware logging: derive a logger per request, store in context (one of the few legitimate context.Value uses), retrieve at log sites:
    ctx = ContextWithLogger(ctx, slog.With("request_id", id))
    // ...
    Logger(ctx).Info("processed", "took_ms", elapsed.Milliseconds())
    
  • Sensitive-data redaction: implement a slog.LogValuer on types containing PII; the LogValue() method returns a redacted form. This pushes redaction into the type, not the call site.

18.3 Mechanical Detail-pprof and trace

  • Endpoints: import _ "net/http/pprof" registers handlers on http.DefaultServeMux. Mount on a separate port and gate behind auth-never on the public listener.
  • CPU profile: go tool pprof http://host:6060/debug/pprof/profile?seconds=30. Top-heavy stacks, flame graphs (pprof -http=:0).
  • Heap profile: pprof http://host:6060/debug/pprof/heap. Default mode inuse_space shows live retention; - alloc_objects` shows churn.
  • Block / mutex profiles: runtime.SetBlockProfileRate(1) and runtime.SetMutexProfileFraction(1). Off by default-turn on briefly when investigating.
  • Goroutine profile: ?debug=2 dumps full stacks. Read it line by line when chasing leaks.
  • Execution trace: runtime/trace.Start(w). Captures every G's lifecycle, GC events, syscall durations. Visualize with go tool trace. The most expensive but most informative tool.

18.4 Mechanical Detail-OpenTelemetry

  • SDK setup: otel.SetTracerProvider(...) + otel.SetTextMapPropagator(propagation.TraceContext{}). Use the OTLP gRPC exporter to a local collector.
  • Span creation: ctx, span := tracer.Start(ctx, "operation"); defer span.End(). Always defer span.End() immediately after creation.
  • Span attributes: low-cardinality. Never put PII in attributes.
  • Span events: structured logs attached to a span. Use sparingly; spans-as-logs leads to trace cardinality explosions.
  • Instrumentation libraries: otelhttp, otelgrpc, otelsql. Auto-propagate through standard transports.

18.5 Lab-"Wire the URL Shortener"

Take week 17's URL shortener and add: - slog JSON output with request-scoped logger via context. - /metrics Prometheus endpoint exposing request count, latency histogram, and Go runtime metrics. - OTLP traces exported to a local Jaeger via docker-compose. - /debug/pprof/* on a separate admin port, gated by IP allowlist. - A 30-second runtime/trace capture under load, committed as trace.out with a markdown analysis.

18.6 Idiomatic & golangci-lint Drill

  • forbidigo (forbid fmt.Println outside tests, log.Print* after slog adoption), loggercheck (uniform slog key conventions).

18.7 Production Hardening Slice

  • Add a sampling-based redaction layer at the slog.Handler level: any attribute matching email|password|token regex is redacted. Unit-test the redaction. This is a compliance prerequisite in regulated environments.

Comments