# Observability

## Why it matters
Every path's Appendix A covers production hardening; every one of those appendices is, at heart, an observability chapter. The thesis is constant across runtimes: always-on profiling beats reactive debugging, and the tooling exists to make it cheap. The differences are in which tool fits each runtime and what each one teaches you to look at.
## The lens, per path

### Go - pprof everywhere
Appendix A - Production Hardening. `net/http/pprof` exposes CPU, heap, goroutine, mutex, and block profiles over HTTP; `go tool pprof` consumes them. `runtime/trace` is the event-trace equivalent - scheduler latency, GC events, goroutine lifecycle.
The unique thing here: pprof is in the standard library. Every Go service ships with it for the cost of one import - no agent install, no separate binary. The expected diagnostic workflow: fetch `/debug/pprof/profile?seconds=30` from the running service and open the result with `go tool pprof -http=:8080`.
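A minimal capture session, assuming the service imports `net/http/pprof` and serves it on `localhost:6060` (the address is an assumption; use whatever your service binds):

```bash
# 30-second CPU profile, then the interactive web UI on :8080
curl -o cpu.prof 'http://localhost:6060/debug/pprof/profile?seconds=30'
go tool pprof -http=:8080 cpu.prof

# 5-second execution trace: scheduler latency, GC events, goroutine lifecycle
curl -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=5'
go tool trace trace.out
```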
Pair with: the `-race` detector (in development) and Go's structured logging package `log/slog` (since 1.21).
### Java - JFR + async-profiler + JMC
Appendix A - Production Hardening. Java Flight Recorder (free since JDK 11) for always-on profiling at ~1% overhead; async-profiler for CPU/alloc/lock flame graphs; Eclipse MAT for heap-dump triage; JMH for benchmarks; `jcmd` for everything else.
The unique thing here: the deepest production-profiling toolchain of any mainstream runtime, by a wide margin. JFR's continuous-recording pattern (rotate to a ring buffer, snapshot on alert) is the gold standard most other ecosystems are still chasing.
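A sketch of that continuous-recording pattern with `jcmd`, plus an async-profiler capture; `$PID` is a placeholder for the target JVM, and `asprof` assumes async-profiler 3.x:

```bash
# Always-on JFR recording with a bounded ring buffer
jcmd $PID JFR.start name=continuous maxage=1h maxsize=200m

# On an alert: snapshot the buffer to disk without stopping the recording
jcmd $PID JFR.dump name=continuous filename=/tmp/incident.jfr

# 30-second CPU flame graph with async-profiler
asprof -d 30 -f /tmp/flame.html $PID
```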
Pair with: Micrometer + OpenTelemetry for metrics and traces (Java Month 5).
### Python - py-spy + scalene + cProfile
Appendix A - Production Hardening. `py-spy` for sampling-based CPU profiling that attaches to a running process (no code change); `scalene` for line-level CPU + memory profiling; `cProfile` + `snakeviz` for deterministic profiles; `tracemalloc` for allocation tracking.
The unique thing here: the no-instrumentation story. `py-spy` attaches to a PID and produces flame graphs without restarting the process. For Python - where a restart costs minutes of warmup in any ML or web framework - this is decisive.
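A sketch of that attach-and-profile flow; `$PID` is a placeholder, and attaching to another user's process may require `sudo`:

```bash
# Record a 30-second flame graph from a live process - no restart, no code change
py-spy record --pid $PID --duration 30 --output profile.svg

# Or a live, top-style view of the hottest functions
py-spy top --pid $PID
```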
Pair with: OpenTelemetry's Python SDK (excellent) and `viztracer` for Chrome-trace-format event traces.
### Linux - perf + eBPF + bpftrace
Appendix A - Hardening & Tuning. `perf` for hardware counters (cache misses, branch mispredicts, LLC traffic); eBPF for safe, programmable in-kernel tracing; `bpftrace` for ad-hoc one-liners; `ftrace` for the legacy function-tracing path.
The unique thing here: observability across runtimes. perf and eBPF see everything - the kernel, user-space stacks, GPU drivers, container runtimes. When a Go service slows down and pprof shows clean CPU, perf will show the L3 cache-miss counter spike that pprof can't see.
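A few representative invocations; `./myservice` is a placeholder binary:

```bash
# Sample all CPUs at 99 Hz for 30 s with call stacks, then inspect
sudo perf record -F 99 -a -g -- sleep 30
sudo perf report

# Hardware counters for one command: the cache-miss spike pprof can't see
sudo perf stat -e cache-misses,branch-misses ./myservice

# bpftrace one-liner: kernel-wide latency histogram for vfs_read, in microseconds
sudo bpftrace -e 'kprobe:vfs_read { @t[tid] = nsecs; }
  kretprobe:vfs_read /@t[tid]/ { @us = hist((nsecs - @t[tid]) / 1000); delete(@t[tid]); }'
```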
Pair with: the bcc (BPF Compiler Collection) tools (`opensnoop`, `execsnoop`, `tcpconnect`, ...) for canned diagnostic scripts.
### Kubernetes - metrics-server, kube-state-metrics, k8s events
Appendix A - Hardening. The cluster-level observability stack: metrics-server for live resource use; kube-state-metrics for object state as Prometheus metrics; k8s events as a first-class debugging signal; the kubelet's `/metrics` endpoint for node-level data.
The unique thing here: observability across a fleet, not a process. You're aggregating across 1k pods, not profiling one.
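The day-one commands against that stack; `<node>` is a placeholder for a name from `kubectl get nodes`:

```bash
# Live resource use via metrics-server
kubectl top nodes
kubectl top pods -A --sort-by=memory

# Events as a debugging signal, most recent last
kubectl get events -A --sort-by=.lastTimestamp

# Node-level data from the kubelet's /metrics, via the API-server proxy
kubectl get --raw "/api/v1/nodes/<node>/proxy/metrics"
```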
Pair with: OpenTelemetry collector as the cluster-wide ingestion point; Prometheus + Grafana + Tempo + Loki as the canonical OSS stack.
### Other paths
- Rust (Appendix A) - `tracing` + `tracing-subscriber`, `tokio-console`, `criterion` for benchmarks, `flamegraph` (a thin wrapper over Linux perf).
- Containers (Appendix A) - container-runtime tracing (runc's debug output, containerd events), and cgroup metrics read directly from the filesystem.
- AI Systems (Appendix A) - NVIDIA Nsight Systems + Nsight Compute, DCGM for GPU metrics, the PyTorch profiler, and the distinction between GPU utilization and SM occupancy (not the same thing). Starter invocations for all three paths are sketched below.
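These are hedged sketches - `myservice` and `train.py` are placeholder names, and the cgroup path depends on your runtime's cgroup driver:

```bash
# Rust: one-command flame graph (cargo-flamegraph wraps Linux perf)
cargo flamegraph --bin myservice

# Containers: cgroup v2 metrics read straight from the filesystem
cat /sys/fs/cgroup/system.slice/docker-*.scope/memory.current

# AI systems: whole-program CPU + GPU timeline with Nsight Systems
nsys profile -o timeline python train.py
```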
## The unifying patterns
Across every path, the same three patterns recur:
- Always-on profiling, snapshot on alert. JFR ring buffer, continuous Go pprof, eBPF probes running in production. Cost: ~1-3% overhead. Value: when an incident fires, you already have the data.
- Three pillars are not three databases. Logs, metrics, traces should share a correlation ID (trace ID) so they join on query. Every modern stack lands here.
- The flame graph is the right default visualization. CPU, allocations, lock contention, off-CPU/wall-clock time - all of them are flame graphs. Brendan Gregg won; the canonical pipeline is sketched below.
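As a concrete instance of that last point, here is the classic path from a perf capture to an interactive SVG, assuming Brendan Gregg's FlameGraph scripts are cloned into ./FlameGraph and a perf.data file exists (e.g. from the `perf record` above):

```bash
# Fold the captured stacks, then render them as a flame graph
sudo perf script > out.stacks
./FlameGraph/stackcollapse-perf.pl out.stacks | ./FlameGraph/flamegraph.pl > cpu.svg
```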
## What to read first
- You operate a JVM fleet → Java Appendix A end-to-end, then JMC tutorials. Most directly applicable, deepest tooling.
- You write Go services → Go Appendix A, then bookmark Brendan Gregg's `perf-tools` for when pprof isn't enough.
- You write Python at scale → `py-spy` first (zero-setup wins), then OpenTelemetry instrumentation.
- You debug across runtimes → Linux Appendix A. perf and eBPF see what runtime-specific tools cannot.
- You operate Kubernetes → Kubernetes Appendix A, then the relevant per-runtime appendix for whatever workload is misbehaving.