Week 16 - CSI at Scale: Snapshots, Backup, Cloning

16.1 Conceptual Core

  • Production storage in K8s requires:
      • Dynamic provisioning (week 8).
      • Volume Snapshots (point-in-time captures).
      • Backups (off-cluster, often app-consistent via operator hooks).
      • Cloning (PVC from snapshot, or PVC-from-PVC).
      • Resizing (online expansion).
  • Velero is the de facto cluster backup tool: it backs up resource manifests and PV snapshots to object storage, and can restore selectively.
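
The two cloning paths listed above differ only in the PVC's `dataSource`. A minimal sketch, assuming a CSI driver that supports both snapshots and cloning; the helper `pvc_from_source` and every object name in it are hypothetical. It emits JSON, which `kubectl apply -f -` accepts as well as YAML.

```python
import json


def pvc_from_source(name, size, storage_class, source_kind, source_name):
    """Build a PVC manifest pre-populated from a snapshot or from another PVC."""
    if source_kind == "VolumeSnapshot":
        data_source = {
            "apiGroup": "snapshot.storage.k8s.io",
            "kind": "VolumeSnapshot",
            "name": source_name,
        }
    elif source_kind == "PersistentVolumeClaim":
        # Core-group kinds carry no apiGroup in dataSource.
        data_source = {"kind": "PersistentVolumeClaim", "name": source_name}
    else:
        raise ValueError(f"unsupported source kind: {source_kind}")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            # The requested size must be at least the size of the source volume.
            "resources": {"requests": {"storage": size}},
            "dataSource": data_source,
        },
    }


if __name__ == "__main__":
    # Hypothetical names: a dev copy restored from a snapshot of a prod volume.
    manifest = pvc_from_source(
        "dev-db-data", "10Gi", "fast-csi", "VolumeSnapshot", "prod-db-snap"
    )
    print(json.dumps(manifest, indent=2))
```

For a PVC-from-PVC clone, the source claim has to live in the same namespace as the new one, so the manifest is applied with `-n` set to that namespace.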

16.2 Mechanical Detail

  • VolumeSnapshotClass / VolumeSnapshot / VolumeSnapshotContent. Mirrors the SC/PVC/PV trio.
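  • The external-snapshotter project ships two pieces: a cluster-wide snapshot controller that watches VolumeSnapshot objects and binds them to VolumeSnapshotContent, and the csi-snapshotter sidecar, which runs alongside the CSI controller and calls the driver to create or delete the actual snapshot.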
  • Volume populators (since 1.24): populate a new PVC from arbitrary sources (snapshots, other PVCs, S3, etc.). Modular framework.
  • Velero: install, configure a backup storage location (an S3-compatible bucket), and schedule backups via the Schedule resource. Cloud-provider plugins supply the object-store and volume-snapshot backends; the BackupStorageLocation resource abstracts where backups are written.
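
To make the snapshot objects concrete, here is a minimal sketch that builds a VolumeSnapshotClass and a VolumeSnapshot as plain dictionaries. The driver name, namespace, and object names are placeholders, not values from this course. The VolumeSnapshotContent is not written by hand in the dynamic case; the snapshot controller creates and binds it, much as a PV is created for a PVC.

```python
import json

SNAPSHOT_API = "snapshot.storage.k8s.io/v1"


def snapshot_class(name, driver, deletion_policy="Delete"):
    """VolumeSnapshotClass: the snapshot counterpart of a StorageClass."""
    if deletion_policy not in ("Delete", "Retain"):
        raise ValueError("deletionPolicy must be Delete or Retain")
    return {
        "apiVersion": SNAPSHOT_API,
        "kind": "VolumeSnapshotClass",
        "metadata": {"name": name},
        # driver and deletionPolicy are top-level fields, not under spec.
        "driver": driver,
        "deletionPolicy": deletion_policy,
    }


def volume_snapshot(name, namespace, pvc_name, class_name):
    """VolumeSnapshot: the namespaced request, the counterpart of a PVC."""
    return {
        "apiVersion": SNAPSHOT_API,
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": class_name,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


if __name__ == "__main__":
    # "csi.example.com" is a placeholder; use the name your CSI driver registers.
    docs = [
        snapshot_class("csi-snapclass", "csi.example.com"),
        volume_snapshot("db-snap-1", "databases", "db-data", "csi-snapclass"),
    ]
    for doc in docs:
        print(json.dumps(doc, indent=2))
```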

16.3 Lab - "Backup and Restore"

  1. Install Velero against a MinIO bucket.
  2. Schedule a daily backup of one namespace.
  3. Delete the namespace; restore from backup; verify Pods come back, PVs reattach, data intact.
  4. Create a stateful workload (Postgres via an operator); test snapshot + clone flow for fast dev/test environment provisioning.
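
For step 2, one possible shape of the Schedule resource is sketched below. It assumes Velero is installed in the `velero` namespace with a backup storage location named `default`; the schedule name, target namespace, backup hour, and retention period are illustrative choices.

```python
import json


def daily_schedule(name, target_namespace, hour_utc=2, retention_days=30):
    """Build a Velero Schedule that backs up one namespace once a day."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        # Assumption: Velero runs in the "velero" namespace.
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            # Standard five-field cron expression: minute hour dom month dow.
            "schedule": f"0 {hour_utc} * * *",
            "template": {
                "includedNamespaces": [target_namespace],
                "snapshotVolumes": True,
                "storageLocation": "default",
                # ttl is a duration string; backups expire after this long.
                "ttl": f"{retention_days * 24}h0m0s",
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(daily_schedule("daily-demo", "demo"), indent=2))
```

Piping the output to `kubectl apply -f -` creates the schedule; the `velero schedule create` CLI command is an alternative way to produce the same kind of object.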

16.4 Hardening Drill

  • Test restore into a different cluster. This is the actual disaster-recovery scenario, and the most commonly broken backup story.
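
One way to set up the cross-cluster drill is to point the second cluster's Velero at the same bucket in read-only mode, then create a Restore from a backup it discovers there. The sketch below assumes the AWS object-store plugin talking to MinIO; the bucket, URL, backup name, and namespace are placeholders.

```python
import json


def read_only_location(name, bucket, s3_url, region="minio"):
    """BackupStorageLocation for the DR cluster; ReadOnly keeps it from
    writing to or deleting from the bucket the source cluster still uses."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "BackupStorageLocation",
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "provider": "aws",
            "accessMode": "ReadOnly",
            "objectStorage": {"bucket": bucket},
            # Config values are strings; path-style addressing suits MinIO.
            "config": {
                "region": region,
                "s3ForcePathStyle": "true",
                "s3Url": s3_url,
            },
        },
    }


def restore(name, backup_name, namespaces):
    """Restore selected namespaces, including their PVs, from a named backup."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Restore",
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "backupName": backup_name,
            "includedNamespaces": list(namespaces),
            "restorePVs": True,
        },
    }


if __name__ == "__main__":
    # All names and the URL below are placeholders for the drill.
    docs = [
        read_only_location("default", "velero-backups", "http://minio.example:9000"),
        restore("dr-drill-1", "daily-demo-20240101020000", ["demo"]),
    ]
    for doc in docs:
        print(json.dumps(doc, indent=2))
```

Things that commonly differ between clusters, and are worth checking before the restore, include StorageClass names and whether the target cluster's CSI driver can read the snapshots taken in the source cluster.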

16.5 Operations Slice

  • Wire Velero metrics: backup success rate, backup duration, restore-test outcomes (run a synthetic restore weekly to validate).
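
The arithmetic behind two of these signals can be sketched as follows. It assumes the counters are scraped from Velero's Prometheus endpoint (names such as `velero_backup_success_total` and `velero_backup_attempt_total` are the assumed inputs) and that the weekly synthetic restore records the timestamp of its last success somewhere; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def success_rate(success_delta, attempt_delta):
    """Fraction of backup attempts that succeeded over a window.

    Both arguments are counter increases over the same window.
    Returns None when there were no attempts, which deserves its own alert.
    """
    if attempt_delta <= 0:
        return None
    return success_delta / attempt_delta


def restore_test_is_stale(last_ok, now, max_age=timedelta(days=7)):
    """True when the last successful synthetic restore is older than max_age."""
    return (now - last_ok) > max_age


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("backup success rate:", success_rate(27, 28))
    print("restore test stale:", restore_test_is_stale(now - timedelta(days=9), now))
```

Treating "no attempts" separately from "low success rate" matters because a schedule that silently stopped firing would otherwise look healthy.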

Month 4 Capstone Deliverable

A networking-and-storage/ workspace:

  1. cni-source-walkthrough.md (week 13).
  2. cilium-policies/ - L4 + L7 + identity-based examples (week 14).
  3. mesh-comparison/ - three meshes, RED dashboards (week 15).
  4. velero-DR/ - backup, restore, and cross-cluster-restore demos (week 16).
