Week 15 - Service Meshes: Istio, Linkerd, Cilium Service Mesh
15.1 Conceptual Core
- A service mesh adds: mTLS between Services, retries/timeouts/circuit-breaking, traffic shifting (canary, blue/green), observability (RED metrics + traces), policy enforcement.
- Two architectural patterns:
- Sidecar (Istio classic, Linkerd): an Envoy or linkerd-proxy instance runs in every Pod. Roughly 50 MB of memory per Pod, ~1 ms of added latency.
- Sidecar-less (Istio ambient, Cilium Service Mesh): eBPF plus a per-node proxy. Much lower per-Pod overhead.
- Decision matrix:
- Mature, full-featured, complex → Istio.
- Minimalist, Rust-based, fast to install → Linkerd.
- Already running Cilium, want sidecar-less → Cilium Service Mesh.
15.2 Mechanical Detail
- Envoy (used by Istio and others) is the data-plane proxy. The control plane pushes configuration to it over the xDS APIs (LDS, RDS, CDS, EDS).
- mTLS rotation: the mesh control plane issues short-lived certs (typically 24h) signed by an internal CA (or SPIFFE-compatible).
- Traffic management: Istio uses VirtualService + DestinationRule for routing rules. The Kubernetes Gateway API is the standards-track replacement, supported by all major meshes.
- Observability: every mesh emits RED metrics (Rate, Errors, Duration) per service. With OTel, traces propagate through the mesh.
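A minimal sketch of the VirtualService + DestinationRule pair for a 90/10 canary split. The namespace (`demo`), service name (`reviews`), and subset labels are hypothetical; the API group and weighted-route mechanics follow Istio's traffic-management model.

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: reviews                  # hypothetical service
  namespace: demo
spec:
  host: reviews.demo.svc.cluster.local
  subsets:
  - name: v1
    labels: {version: v1}        # matches Pods labeled version=v1
  - name: v2
    labels: {version: v2}
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: reviews
  namespace: demo
spec:
  hosts:
  - reviews.demo.svc.cluster.local
  http:
  - route:
    - destination:
        host: reviews.demo.svc.cluster.local
        subset: v1
      weight: 90                 # 90% of traffic stays on the stable subset
    - destination:
        host: reviews.demo.svc.cluster.local
        subset: v2
      weight: 10                 # 10% canary traffic
```

Shifting the canary forward is then a matter of editing the two weights; the equivalent Gateway API construct is an `HTTPRoute` with weighted `backendRefs`.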
15.3 Lab: "Three Meshes"
- Install Istio in ambient mode on a test cluster. Apply a VirtualService that does 90/10 canary routing. Verify with Hubble or Kiali.
- Repeat with Linkerd. Compare install footprint, configuration ergonomics, and observability quality.
- (If running Cilium) enable Cilium Service Mesh. Compare again.
- Document tradeoffs: install effort, per-Pod overhead, feature gaps.
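The two installs above can be sketched as follows; the `demo` namespace is hypothetical, and flags reflect recent Istio and Linkerd releases, so check your versions:

```shell
# Istio in ambient (sidecar-less) mode
istioctl install --set profile=ambient -y
# Enroll a namespace in the ambient mesh (no Pod restarts needed)
kubectl label namespace demo istio.io/dataplane-mode=ambient

# Linkerd: CRDs first, then the control plane, then verify
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
linkerd check
```

Note the contrast worth documenting: Linkerd meshes a workload by injecting a sidecar at Pod creation (namespaces annotated with `linkerd.io/inject: enabled` get proxies on restart), while ambient enrollment via the label takes effect without restarting anything.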
15.4 Hardening Drill
- Enable mTLS in STRICT mode. Define AuthorizationPolicy resources denying cross-namespace traffic by default; allow only the intended service pairs.
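A sketch of the Istio flavor of this drill. The `demo` namespace, workload labels, and service-account names are hypothetical; the deny-by-default behavior of an empty-spec AuthorizationPolicy and the SPIFFE-style principal format are Istio's:

```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system        # root namespace -> applies mesh-wide
spec:
  mtls:
    mode: STRICT                 # reject any plaintext traffic
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: demo                # hypothetical namespace
spec: {}                         # empty spec matches no requests -> default deny
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: demo
spec:
  selector:
    matchLabels:
      app: backend               # hypothetical workload label
  action: ALLOW
  rules:
  - from:
    - source:
        # caller identity, derived from its mTLS cert (SPIFFE format)
        principals: ["cluster.local/ns/demo/sa/frontend"]
```

Because principals come from workload certificates, this allow rule only works once STRICT mTLS is in force; that dependency is the point of the drill.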
15.5 Operations Slice
- Wire the mesh's RED metrics into your service-level dashboards. Define SLOs per service: latency p99, error rate, mTLS handshake success rate.
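Two of those SLO signals can be expressed as Prometheus queries over Istio's default metrics (`istio_requests_total`, `istio_request_duration_milliseconds`); Linkerd and Cilium expose equivalents under different metric names:

```promql
# p99 latency per destination service
histogram_quantile(0.99,
  sum(rate(istio_request_duration_milliseconds_bucket[5m]))
  by (le, destination_service))

# error rate: share of 5xx responses over all requests
sum(rate(istio_requests_total{response_code=~"5.."}[5m])) by (destination_service)
  /
sum(rate(istio_requests_total[5m])) by (destination_service)
```

These are the Errors and Duration legs of RED; the Rate leg is the bare `sum(rate(istio_requests_total[5m]))` denominator on its own.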