Week 18 - Observability: slog, pprof, trace, OpenTelemetry¶
18.1 Conceptual Core¶
- The "three pillars"-logs, metrics, traces-map cleanly to four Go tools:
- Logs:
log/slog(stdlib, since 1.21). Structured, context-aware. - Metrics:
prometheus/client_golangplus `runtime/metrics - derived collectors. - Traces:
go.opentelemetry.io/otelwith an OTLP exporter. - Profiles:
pprof(CPU, heap, allocs, block, mutex, goroutine). - Execution traces: `runtime/trace - the tool when none of the above tell you why a goroutine is slow.
- Two cross-cutting principles:
- Correlation: every log line, trace span, and metric label uses the same
trace_idandrequest_id. This requires plumbing throughcontext.Context. - Cardinality discipline: never put unbounded values (user IDs, URLs with query strings, request IDs) into metric labels.
18.2 Mechanical Detail-slog¶
slog.Default(),slog.New(handler),slog.With(attrs...). Handlers:JSONHandler,TextHandler, custom.- Context-aware logging: derive a logger per request, store in context (one of the few legitimate
context.Valueuses), retrieve at log sites: - Sensitive-data redaction: implement a
slog.LogValueron types containing PII; theLogValue()method returns a redacted form. This pushes redaction into the type, not the call site.
18.3 Mechanical Detail-pprof and trace¶
- Endpoints:
import _ "net/http/pprof"registers handlers onhttp.DefaultServeMux. Mount on a separate port and gate behind auth-never on the public listener. - CPU profile:
go tool pprof http://host:6060/debug/pprof/profile?seconds=30. Top-heavy stacks, flame graphs (pprof -http=:0). - Heap profile:
pprof http://host:6060/debug/pprof/heap. Default modeinuse_spaceshows live retention; - alloc_objects` shows churn. - Block / mutex profiles:
runtime.SetBlockProfileRate(1)andruntime.SetMutexProfileFraction(1). Off by default-turn on briefly when investigating. - Goroutine profile:
?debug=2dumps full stacks. Read it line by line when chasing leaks. - Execution trace:
runtime/trace.Start(w). Captures every G's lifecycle, GC events, syscall durations. Visualize withgo tool trace. The most expensive but most informative tool.
18.4 Mechanical Detail-OpenTelemetry¶
- SDK setup:
otel.SetTracerProvider(...)+otel.SetTextMapPropagator(propagation.TraceContext{}). Use the OTLP gRPC exporter to a local collector. - Span creation:
ctx, span := tracer.Start(ctx, "operation"); defer span.End(). Alwaysdefer span.End()immediately after creation. - Span attributes: low-cardinality. Never put PII in attributes.
- Span events: structured logs attached to a span. Use sparingly; spans-as-logs leads to trace cardinality explosions.
- Instrumentation libraries:
otelhttp,otelgrpc,otelsql. Auto-propagate through standard transports.
18.5 Lab-"Wire the URL Shortener"¶
Take week 17's URL shortener and add:
- slog JSON output with request-scoped logger via context.
- /metrics Prometheus endpoint exposing request count, latency histogram, and Go runtime metrics.
- OTLP traces exported to a local Jaeger via docker-compose.
- /debug/pprof/* on a separate admin port, gated by IP allowlist.
- A 30-second runtime/trace capture under load, committed as trace.out with a markdown analysis.
18.6 Idiomatic & golangci-lint Drill¶
forbidigo(forbidfmt.Printlnoutside tests,log.Print*afterslogadoption),loggercheck(uniformslogkey conventions).
18.7 Production Hardening Slice¶
- Add a sampling-based redaction layer at the
slog.Handlerlevel: any attribute matchingemail|password|tokenregex is redacted. Unit-test the redaction. This is a compliance prerequisite in regulated environments.