08 - Volumes and Storage¶
What this session is¶
About 45 minutes. How pods persist data. PersistentVolumes (PV), PersistentVolumeClaims (PVC), StorageClasses - Kubernetes' way to abstract over the underlying storage (cloud disk, NFS, local disk).
Pod-local storage: ephemeral¶
Containers in a pod can share temporary storage via emptyDir:
spec:
  containers:
    - name: writer
      image: alpine
      command: ["sh", "-c", "echo hi > /shared/note && sleep 60"]
      volumeMounts:
        - name: shared
          mountPath: /shared
    - name: reader
      image: alpine
      command: ["sh", "-c", "sleep 10 && cat /shared/note && sleep 60"]
      volumeMounts:
        - name: shared
          mountPath: /shared
  volumes:
    - name: shared
      emptyDir: {}
emptyDir lives as long as the pod. It survives container restarts inside the pod, but once the pod is deleted or rescheduled → directory and contents are gone.
Useful for: scratch space, cache shared between containers, log file written by one container and shipped by another (sidecar).
Not useful for persistence once the pod is deleted or rescheduled.
Persistent storage: PV + PVC¶
For data that outlives pods, you need two resources:
- PersistentVolume (PV) - represents a real piece of storage (a cloud disk, an NFS export, a local-disk path). Cluster-scoped.
- PersistentVolumeClaim (PVC) - a request for storage of some size/access mode. Namespaced.
Kubernetes binds a PVC to an appropriate PV. The pod mounts the PVC.
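As a rough sketch of manual (static) binding, the manifests below define a PV backed by a node directory and a PVC that can bind to it. The names manual-pv and manual-claim, the manual class name, and the /mnt/data path are illustrative assumptions, not part of this session's setup; hostPath volumes are only suitable for single-node test clusters.

```yaml
# Hypothetical example: a manually created PV and a claim that matches it.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv            # assumed name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: manual   # assumed class name; no provisioner is involved
  hostPath:
    path: /mnt/data          # assumed directory on the node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: manual-claim         # assumed name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual   # must match the PV's class to bind
  resources:
    requests:
      storage: 1Gi           # must not exceed the PV's capacity
```

Kubernetes matches the claim to the volume by class, access mode, and size; once bound, kubectl get pv and kubectl get pvc should both report Bound.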
Easier: dynamic provisioning¶
In most modern clusters, you don't create PVs manually. A StorageClass describes how to dynamically provision PVs on demand. Cloud clusters (EKS, GKE, AKS) come with a default StorageClass that creates cloud disks. Local clusters (minikube, kind) have a hostPath-based StorageClass.
You just write a PVC; the cluster creates a PV to satisfy it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # storageClassName: standard # optional - uses default if omitted
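Apply the manifest with kubectl apply -f, then check it: kubectl get pvc shows status. If Bound, you got storage. If Pending, either no StorageClass exists, no PV satisfies the claim, or the StorageClass uses volumeBindingMode: WaitForFirstConsumer and the claim is waiting for a pod to use it.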
Use the PVC in a Pod¶
apiVersion: v1
kind: Pod
metadata:
  name: db
spec:
  containers:
    - name: postgres
      image: postgres:16
      env:
        - name: POSTGRES_PASSWORD
          value: secret
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data
Apply. The pod mounts the PVC at /var/lib/postgresql/data. Data persists across pod deletes/restarts.
Delete the pod, recreate - Postgres still has its data.
Access modes¶
The PVC's accessModes field constrains what kind of storage works:
- ReadWriteOnce (RWO) - one node can mount the volume read-write. Most cloud disks support this. The usual choice.
- ReadOnlyMany (ROX) - many nodes can mount it read-only.
- ReadWriteMany (RWX) - many nodes can read+write. NFS supports this; cloud block storage usually doesn't.
Most apps want RWO. RWX is needed only for "many pods write to the same shared filesystem" cases.
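For the shared-filesystem case, a claim might look like the sketch below. The class name nfs-shared is a hypothetical stand-in for whatever RWX-capable StorageClass your cluster offers (NFS, or a cloud file service); the claim stays Pending if no such class exists.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-files             # assumed name
spec:
  accessModes:
    - ReadWriteMany              # pods on different nodes may mount read-write
  storageClassName: nfs-shared   # hypothetical RWX-capable class
  resources:
    requests:
      storage: 5Gi
```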
StorageClass¶
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com # AWS EBS CSI driver; the older in-tree kubernetes.io/aws-ebs is deprecated
parameters:
  type: gp3
  iops: "3000"
reclaimPolicy: Delete # delete the volume when PVC is deleted
volumeBindingMode: WaitForFirstConsumer
The provisioner is cloud-specific. You usually don't write StorageClasses yourself - the cloud or admin sets them up.
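List what's available with kubectl get storageclass. The (default) marker next to a class name shows the one used when a PVC doesn't specify storageClassName; it is set by the storageclass.kubernetes.io/is-default-class annotation.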
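The default marker comes from an annotation on the StorageClass. A minimal sketch, assuming a minikube-style hostPath provisioner (the class name and provisioner here are illustrative; check kubectl get storageclass for what your cluster actually uses):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
  annotations:
    # This annotation is what kubectl renders as "(default)".
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: k8s.io/minikube-hostpath   # assumed; varies per cluster
```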
Reclaim policy¶
When a PVC is deleted, what happens to the underlying PV?
- Delete - PV (and the cloud disk) deleted. Data gone.
- Retain - PV and disk kept. Admin reclaims manually. Safer for production.
Delete is the default for dynamically provisioned PVs. For data you can't lose, configure Retain and back up.
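The policy is set on the StorageClass, so every PV it provisions inherits it. A sketch, assuming the AWS EBS CSI driver as provisioner and a made-up class name:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retained-ssd           # hypothetical name
provisioner: ebs.csi.aws.com   # assumed; use your cluster's provisioner
parameters:
  type: gp3
reclaimPolicy: Retain          # keep the PV and disk after the PVC is deleted
volumeBindingMode: WaitForFirstConsumer
```

With Retain, deleting the PVC leaves the PV in the Released state; it is not handed to a new claim automatically.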
Resizing¶
Some StorageClasses support resizing (allowVolumeExpansion: true). Increase a PVC's resources.requests.storage and re-apply - the disk and filesystem grow. Beyond beginner; recognize.
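A sketch of what resizing involves, assuming a class that permits expansion. The class name, claim name, and provisioner are placeholders; the claim is shown after its request was raised from 1Gi to 2Gi. Requests can only grow, not shrink.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable             # hypothetical name
provisioner: ebs.csi.aws.com   # assumed; must be a provisioner that supports expansion
allowVolumeExpansion: true     # without this, size increases on bound PVCs are rejected
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-resizable         # hypothetical claim created against the class above
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: expandable
  resources:
    requests:
      storage: 2Gi             # raised from 1Gi
```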
StatefulSets (briefly)¶
A StatefulSet is like a Deployment but each pod has a stable name and its own PVC. For databases, message queues, anything that needs identity + storage per replica.
We're not going to cover StatefulSets in depth here. Recognize: when reading manifests, kind: StatefulSet means "pods with stable identities + per-pod storage." When you need to run Postgres or Cassandra on K8s, this is the pattern (usually via a Helm chart that wraps a StatefulSet).
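To make the shape recognizable, here is a minimal sketch of a StatefulSet with per-pod storage. All names are illustrative, and the password handling mirrors the simplified pod example; a real deployment would normally come from a chart or operator.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: pg                   # headless Service giving pods stable DNS names
spec:
  clusterIP: None
  selector:
    app: pg
  ports:
    - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pg
spec:
  serviceName: pg            # ties the pods to the headless Service
  replicas: 2                # pods are named pg-0 and pg-1
  selector:
    matchLabels:
      app: pg
  template:
    metadata:
      labels:
        app: pg
    spec:
      containers:
        - name: postgres
          image: postgres:16
          env:
            - name: POSTGRES_PASSWORD
              value: secret  # placeholder, as in the pod example
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:      # one PVC per pod: data-pg-0, data-pg-1
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
```

The two replicas here are independent Postgres instances; replication between them is a separate concern that this sketch does not address.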
A common pattern: stateful + stateless¶
A typical app:
- Stateless front-ends (web servers, API gateways) → Deployment, no PVCs.
- Stateful databases (Postgres) → StatefulSet, one PVC per replica.
The stateless half scales freely. The stateful half is restricted by storage.
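The stateless half of that picture can be sketched as a plain Deployment with no volumes at all. The name and image below are illustrative assumptions.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                  # hypothetical front-end
spec:
  replicas: 3                # scale up or down freely; no storage to move
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27  # assumed image
          ports:
            - containerPort: 80
```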
Exercise¶
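- Create a PVC: apply the data claim from the dynamic-provisioning section with kubectl apply -f, then check it with kubectl get pvc.
- Use it in a Postgres pod: apply the db pod manifest from above and wait until kubectl get pod db shows Running.
- Delete the pod, recreate, verify data: write something first (for example, create a table through kubectl exec -it db -- psql -U postgres), run kubectl delete pod db, apply the pod manifest again, and confirm the table is still there.
- Cleanup: kubectl delete pod db, then kubectl delete pvc data.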
What you might wonder¶
"Where does my data actually live?"
On a cloud cluster: an EBS volume / persistent disk / Azure disk attached to the node running the pod. On local clusters: usually a hostPath under /var/lib/... on the node (your laptop).
"Can two pods share a PVC?" Across nodes, only if the PVC's access mode is ROX or RWX. RWO (the common case) limits the volume to one node at a time, so pods scheduled on that same node can still share it; the ReadWriteOncePod mode restricts it to a single pod.
"How do I back up Kubernetes-managed data?" Two layers: the actual underlying storage (cloud snapshots), and the K8s metadata (PVC, PV definitions). Velero is the popular tool. Beyond beginner; mentioned for awareness.
"What's an emptyDir with medium: Memory?"
A tmpfs-backed emptyDir - lives in RAM, doesn't touch disk. Useful for secrets that shouldn't be persisted.
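A sketch of the memory-backed variant, with an assumed pod name and size limit. Data written here counts against the container's memory usage, so a sizeLimit is a sensible guard.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tmpfs-demo           # hypothetical name
spec:
  containers:
    - name: app
      image: alpine
      # Print the mount so the tmpfs filesystem is visible in the pod logs.
      command: ["sh", "-c", "df -h /cache && sleep 3600"]
      volumeMounts:
        - name: cache
          mountPath: /cache
  volumes:
    - name: cache
      emptyDir:
        medium: Memory       # back the volume with tmpfs (RAM)
        sizeLimit: 64Mi      # assumed cap
```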
Done¶
- Use emptyDir for ephemeral pod-shared storage.
- Create a PVC for persistent storage.
- Mount a PVC in a pod.
- Recognize StorageClass as the dynamic-provisioning machinery.
- Distinguish RWO / ROX / RWX access modes.
- Know that StatefulSet is the pattern for per-replica storage.