Week 16 - CSI at Scale: Snapshots, Backup, Cloning
16.1 Conceptual Core
- Production storage in K8s requires:
    - Dynamic provisioning (week 8).
    - Volume Snapshots (point-in-time captures).
    - Backups (off-cluster, often app-consistent via operator hooks).
    - Cloning (PVC from snapshot, or PVC-from-PVC).
    - Resizing (online expansion).
- Velero is the de facto cluster backup tool: it writes resource manifests to object storage, captures PV data through volume snapshots, and supports selective restores (for example, a single namespace).
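The two cloning paths listed above can be sketched as PVC manifests. This is a minimal illustration, not a definitive setup: the names `pg-data`, `pg-data-snap`, and `csi-sc` are hypothetical, and the CSI driver behind the StorageClass is assumed to support both snapshots and cloning.

```python
import json


def pvc_from_source(name, size, storage_class, source_kind, source_name):
    """Build a PVC manifest that is pre-populated from an existing source."""
    data_source = {"kind": source_kind, "name": source_name}
    # VolumeSnapshot lives in the snapshot.storage.k8s.io API group.
    # A PVC source is in the core group, so apiGroup is left out.
    if source_kind == "VolumeSnapshot":
        data_source["apiGroup"] = "snapshot.storage.k8s.io"
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
            "dataSource": data_source,
        },
    }


# Hypothetical names, used only for illustration.
restore_pvc = pvc_from_source(
    "pg-data-restore", "10Gi", "csi-sc", "VolumeSnapshot", "pg-data-snap"
)
clone_pvc = pvc_from_source(
    "pg-data-clone", "10Gi", "csi-sc", "PersistentVolumeClaim", "pg-data"
)

if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML.
    bundle = {"apiVersion": "v1", "kind": "List", "items": [restore_pvc, clone_pvc]}
    print(json.dumps(bundle, indent=2))
```

The output can be piped to `kubectl apply -f -`. The requested size must be at least the size of the source volume.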
16.2 Mechanical Detail
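- `VolumeSnapshotClass` ↔ `VolumeSnapshot` ↔ `VolumeSnapshotContent`. Mirrors the SC/PVC/PV trio.
- The external-snapshotter project provides the machinery: a cluster-wide snapshot controller watches `VolumeSnapshot` objects and binds them to `VolumeSnapshotContent`, while the csi-snapshotter sidecar runs alongside the CSI controller and issues the driver's snapshot calls.
- Volume populators (since 1.24): populate a new PVC from arbitrary sources (snapshots, other PVCs, S3, etc.) through a modular framework.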
- Velero: install, configure a storage location (an S3-compatible bucket), and schedule backups via the `Schedule` resource. Plugins cover cloud providers and implement the `BackupStorageLocation` abstraction.
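To make the SC/PVC/PV parallel concrete, the sketch below builds the two snapshot objects a user normally writes. The third, `VolumeSnapshotContent`, is created by the snapshot controller in the dynamic case, much as a PV is created for a PVC. The driver name `csi.example.com` and the object names are assumptions for illustration.

```python
import json

SNAPSHOT_API = "snapshot.storage.k8s.io/v1"


def snapshot_class(name, driver, deletion_policy="Delete"):
    """Counterpart of a StorageClass: names the CSI driver and a deletion policy."""
    return {
        "apiVersion": SNAPSHOT_API,
        "kind": "VolumeSnapshotClass",
        "metadata": {"name": name},
        # driver and deletionPolicy are top-level fields, not under spec.
        "driver": driver,
        "deletionPolicy": deletion_policy,
    }


def volume_snapshot(name, namespace, pvc_name, class_name):
    """Counterpart of a PVC: a namespaced request for a snapshot of one PVC."""
    return {
        "apiVersion": SNAPSHOT_API,
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": class_name,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


# Hypothetical driver and object names.
snap_class = snapshot_class("csi-snapclass", "csi.example.com")
snapshot = volume_snapshot("pg-data-snap", "demo", "pg-data", "csi-snapclass")

if __name__ == "__main__":
    for manifest in (snap_class, snapshot):
        print(json.dumps(manifest, indent=2))
```

Once the snapshot is applied, `kubectl get volumesnapshot -n demo` should eventually report `READYTOUSE` as true, provided the driver supports snapshots.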
16.3 Lab - "Backup and Restore"
- Install Velero against a MinIO bucket.
- Schedule a daily backup of one namespace.
- Delete the namespace; restore from backup; verify that Pods come back, PVs reattach, and data is intact.
- Create a stateful workload (Postgres via an operator); test snapshot + clone flow for fast dev/test environment provisioning.
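The schedule and restore steps of the lab might look like the following sketch. It assumes Velero is installed in its default `velero` namespace; the schedule name `daily-demo`, the target namespace `demo`, the 02:00 cron time, and the 30-day TTL are all illustrative choices.

```python
import json

VELERO_NAMESPACE = "velero"  # assumed: Velero's default install namespace


def daily_schedule(name, target_namespace, cron="0 2 * * *", ttl="720h0m0s"):
    """Velero Schedule that backs up one namespace on a cron expression."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": VELERO_NAMESPACE},
        "spec": {
            "schedule": cron,
            # template is an ordinary Backup spec applied on every run.
            "template": {
                "includedNamespaces": [target_namespace],
                "snapshotVolumes": True,
                "ttl": ttl,
            },
        },
    }


def restore_from_schedule(name, schedule_name, target_namespace):
    """Velero Restore that uses the latest successful backup of a schedule."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Restore",
        "metadata": {"name": name, "namespace": VELERO_NAMESPACE},
        "spec": {
            "scheduleName": schedule_name,
            "includedNamespaces": [target_namespace],
            "restorePVs": True,
        },
    }


# Hypothetical names for the lab.
schedule = daily_schedule("daily-demo", "demo")
restore = restore_from_schedule("demo-restore-1", "daily-demo", "demo")

if __name__ == "__main__":
    print(json.dumps(schedule, indent=2))
    print(json.dumps(restore, indent=2))
```

Apply the schedule first, wait for at least one backup to complete, delete the namespace, and only then apply the restore.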
16.4 Hardening Drill
- Test restore into a different cluster. This is the actual disaster-recovery scenario, and the most commonly broken backup story.
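One way to approach the cross-cluster drill is to point the second cluster's Velero at the same bucket in read-only mode and restore from there. The sketch below assumes the AWS object-store plugin in front of MinIO. The bucket name, MinIO URL, backup name, and namespace mapping are hypothetical placeholders, and credentials are left to whatever secret the Velero install already uses.

```python
import json

VELERO_NAMESPACE = "velero"  # assumed: Velero's default install namespace


def read_only_location(name, bucket, s3_url):
    """BackupStorageLocation for the DR cluster; ReadOnly protects the source backups."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "BackupStorageLocation",
        "metadata": {"name": name, "namespace": VELERO_NAMESPACE},
        "spec": {
            "provider": "aws",
            "accessMode": "ReadOnly",
            "objectStorage": {"bucket": bucket},
            # Config values are strings; path-style addressing suits MinIO.
            "config": {
                "region": "minio",
                "s3ForcePathStyle": "true",
                "s3Url": s3_url,
            },
        },
    }


def cross_cluster_restore(name, backup_name, source_ns, target_ns):
    """Restore one namespace from a named backup, optionally renaming it."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Restore",
        "metadata": {"name": name, "namespace": VELERO_NAMESPACE},
        "spec": {
            "backupName": backup_name,
            "includedNamespaces": [source_ns],
            "namespaceMapping": {source_ns: target_ns},
            "restorePVs": True,
        },
    }


# Hypothetical values; substitute the real bucket, endpoint, and backup name.
location = read_only_location("primary-readonly", "velero-backups", "http://minio.example.internal:9000")
dr_restore = cross_cluster_restore("dr-drill-1", "EXAMPLE-BACKUP-NAME", "demo", "demo-dr")

if __name__ == "__main__":
    print(json.dumps(location, indent=2))
    print(json.dumps(dr_restore, indent=2))
```

Whether PV data restores in the second cluster depends on how it was captured: snapshots held by a storage backend that the second cluster cannot reach will not restore, which is one of the gaps this drill is meant to expose.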
16.5 Operations Slice
- Wire Velero metrics: backup success rate, backup duration, restore-test outcomes (run a synthetic restore weekly to validate).
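As a rough sketch of the success-rate signal, the snippet below parses Prometheus text-format output and divides successes by attempts using Velero's `velero_backup_*_total` and `velero_restore_*_total` counters. The sample payload is invented for illustration, and the parser is deliberately simple: it assumes no timestamps after the sample values. In practice the same ratio would more likely be a PromQL expression.

```python
def parse_counters(text):
    """Sum sample values per metric name from Prometheus text exposition format."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last whitespace-separated field (no timestamps assumed).
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals


def success_rate(totals, kind):
    """Return successes / attempts for kind 'backup' or 'restore', or None if no attempts."""
    attempts = totals.get(f"velero_{kind}_attempt_total", 0.0)
    if attempts == 0:
        return None
    return totals.get(f"velero_{kind}_success_total", 0.0) / attempts


# Invented sample, shaped like a scrape of Velero's metrics endpoint.
SAMPLE = """\
# TYPE velero_backup_attempt_total counter
velero_backup_attempt_total{schedule="daily-demo"} 10
velero_backup_success_total{schedule="daily-demo"} 9
velero_backup_failure_total{schedule="daily-demo"} 1
velero_restore_attempt_total{schedule="daily-demo"} 4
velero_restore_success_total{schedule="daily-demo"} 4
"""

totals = parse_counters(SAMPLE)
backup_rate = success_rate(totals, "backup")
restore_rate = success_rate(totals, "restore")

if __name__ == "__main__":
    print(f"backup success rate: {backup_rate:.2f}")
    print(f"restore success rate: {restore_rate:.2f}")
```

Backup duration is exposed separately as the `velero_backup_duration_seconds` histogram, and the weekly synthetic restore feeds the restore counters.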
Month 4 Capstone Deliverable
A `networking-and-storage/` workspace:
1. `cni-source-walkthrough.md` (week 13).
2. `cilium-policies/` - L4 + L7 + identity-based examples (week 14).
3. `mesh-comparison/` - three meshes, RED dashboards (week 15).
4. `velero-DR/` - backup, restore, and cross-cluster-restore demos (week 16).