
08 - Volumes and Storage

What this session is

About 45 minutes. How pods persist data. PersistentVolumes (PV), PersistentVolumeClaims (PVC), StorageClasses - Kubernetes' way to abstract over the underlying storage (cloud disk, NFS, local disk).

Pod-local storage: ephemeral

Containers in a pod can share temporary storage via emptyDir:

apiVersion: v1
kind: Pod
metadata:
  name: shared-scratch
spec:
  containers:
  - name: writer
    image: alpine
    command: ["sh", "-c", "echo hi > /shared/note && sleep 60"]
    volumeMounts:
    - name: shared
      mountPath: /shared
  - name: reader
    image: alpine
    command: ["sh", "-c", "sleep 10 && cat /shared/note && sleep 60"]
    volumeMounts:
    - name: shared
      mountPath: /shared
  volumes:
  - name: shared
    emptyDir: {}          # empty directory created when the pod starts, shared by both containers

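An emptyDir lives as long as the pod. It survives container restarts, but when the pod is deleted (or evicted from its node), the directory and its contents are gone.

Useful for: scratch space, a cache shared between containers, a log file written by one container and shipped by another (a sidecar).

Not useful for data that has to outlive the pod, such as a database's files.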

Persistent storage: PV + PVC

For data that outlives pods, you need two resources:

  • PersistentVolume (PV) - represents a real piece of storage (a cloud disk, an NFS export, a local-disk path). Cluster-scoped.
  • PersistentVolumeClaim (PVC) - a request for storage of some size/access mode. Namespaced.

Kubernetes binds a PVC to an appropriate PV. The pod mounts the PVC.
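To see binding without any provisioner involved, here is a minimal sketch of a manually created PV and a PVC that claims it. It assumes a single-node test cluster; the names manual-pv, manual-claim and manual, and the path /mnt/data, are hypothetical. Giving both objects the same storageClassName keeps the default StorageClass from provisioning a fresh volume instead.

```yaml
# Hypothetical static PV backed by a directory on the node (test clusters only).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv            # hypothetical name; PVs are cluster-scoped
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: manual   # hypothetical class name; must match the PVC's
  hostPath:
    path: /mnt/data          # hypothetical directory on the node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: manual-claim         # hypothetical name; PVCs are namespaced
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: manual   # bind to a PV of this class rather than provisioning one
  resources:
    requests:
      storage: 1Gi           # the PV's capacity must be at least this large
```

Binding is one-to-one: once manual-claim is bound to manual-pv, no other claim can use that PV.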

Easier: dynamic provisioning

In most modern clusters, you don't create PVs manually. A StorageClass describes how to provision PVs dynamically, on demand. Managed cloud clusters (EKS, GKE, AKS) usually come with a default StorageClass that creates cloud disks. Local clusters (minikube, kind) ship a default StorageClass that provisions directories on the node, hostPath-style.

You just write a PVC; the cluster creates a PV to satisfy it.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # storageClassName: standard      # optional - uses default if omitted

Apply:

kubectl apply -f pvc.yaml
kubectl get pvc

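kubectl get pvc shows the status. If Bound, you got storage. If Pending, there is either no (default) StorageClass and no PV that satisfies the claim, or the StorageClass uses volumeBindingMode: WaitForFirstConsumer, in which case the claim stays Pending until a pod that uses it is scheduled (kind's default class behaves this way). kubectl describe pvc data shows the reason in its events.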

Use the PVC in a Pod

apiVersion: v1
kind: Pod
metadata:
  name: db
spec:
  containers:
  - name: postgres
    image: postgres:16
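    env:
    - name: POSTGRES_PASSWORD
      value: secret
    - name: PGDATA              # subdirectory: fresh disks may contain lost+found, which initdb rejects
      value: /var/lib/postgresql/data/pgdata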
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data

Apply. The pod mounts the PVC at /var/lib/postgresql/data. Data persists across pod deletes/restarts.

Delete the pod, recreate - Postgres still has its data.

Access modes

The PVC's accessModes constrains what kind of storage works:

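  • ReadWriteOnce (RWO) - one node can read+write (pods on that node can share the volume). Most cloud disks support this. It's the common choice, but not a default: every PVC must list its access modes.
  • ReadOnlyMany (ROX) - many nodes can read.
  • ReadWriteMany (RWX) - many nodes can read+write. NFS supports this; cloud block storage usually doesn't.
  • ReadWriteOncePod (RWOP) - exactly one pod can read+write. Supported only for CSI volumes.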

Most apps want RWO. RWX is needed only for "many pods write to the same shared filesystem" cases.
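For the shared-filesystem case, the claim just asks for RWX. A sketch, assuming a StorageClass named nfs (hypothetical) whose provisioner supports ReadWriteMany; if the cluster has no such class, the claim stays Pending.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-uploads        # hypothetical name
spec:
  accessModes:
  - ReadWriteMany             # pods on different nodes may mount it read-write
  storageClassName: nfs       # hypothetical; must map to an RWX-capable provisioner
  resources:
    requests:
      storage: 5Gi
```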

StorageClass

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com   # AWS EBS CSI driver (the legacy in-tree name was kubernetes.io/aws-ebs)
parameters:
  type: gp3
  iops: "3000"
reclaimPolicy: Delete           # delete the volume when PVC is deleted
volumeBindingMode: WaitForFirstConsumer

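The provisioner is cloud-specific. You usually don't write StorageClasses yourself; the cloud provider or cluster admin sets them up. volumeBindingMode: WaitForFirstConsumer delays creating the volume until a pod that uses the PVC is scheduled, so the disk ends up in the same zone as that pod.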

List what's available:

kubectl get storageclass

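The class shown with (default) next to its name is the one used when a PVC doesn't specify storageClassName. That marker comes from the storageclass.kubernetes.io/is-default-class: "true" annotation on the StorageClass.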
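A sketch of the fast-ssd class above marked as the cluster default, assuming the AWS EBS CSI driver is installed. Normally only one class in a cluster carries this annotation.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
  annotations:
    # PVCs that omit storageClassName are assigned this class
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com   # assumes the AWS EBS CSI driver
parameters:
  type: gp3
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```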

Reclaim policy

When a PVC is deleted, what happens to the underlying PV?

  • Delete - PV (and the cloud disk) deleted. Data gone.
  • Retain - PV and disk kept. The PV moves to Released and isn't bound to a new PVC automatically; an admin reclaims it manually. Safer for production.

Delete is the default for dynamically provisioned PVs. For data you can't lose, configure Retain and back up.
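Dynamically provisioned PVs inherit the reclaim policy of their StorageClass, so the usual way to get Retain is a class that sets it. A sketch with a hypothetical class name, again assuming the AWS EBS CSI driver:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retained-ssd             # hypothetical name
provisioner: ebs.csi.aws.com     # assumes the AWS EBS CSI driver
parameters:
  type: gp3
reclaimPolicy: Retain            # keep the PV and the disk when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer
```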

Resizing

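Some StorageClasses support resizing (allowVolumeExpansion: true). Increase a PVC's resources.requests.storage and re-apply; the disk grows, and so does the filesystem (some drivers only resize the filesystem when a pod next mounts the volume). Sizes can only go up; shrinking a PVC isn't supported. Beyond beginner; recognize.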
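A sketch of the two pieces, assuming the AWS EBS CSI driver (which supports expansion) and hypothetical names: the class opts in, and growing a bound claim is just an edit to its request.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-ssd           # hypothetical name
provisioner: ebs.csi.aws.com     # assumes the AWS EBS CSI driver
allowVolumeExpansion: true       # lets bound PVCs of this class grow
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-big                 # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: expandable-ssd
  resources:
    requests:
      storage: 2Gi               # raised from 1Gi; re-apply to grow the volume
```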

StatefulSets (briefly)

A StatefulSet is like a Deployment but each pod has a stable name and its own PVC. For databases, message queues, anything that needs identity + storage per replica.

We're not going to cover StatefulSets in depth here. Recognize: when reading manifests, kind: StatefulSet means "pods with stable identities + per-pod storage." When you need to run Postgres or Cassandra on K8s, this is the pattern (usually via a Helm chart that wraps a StatefulSet).
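A minimal sketch of how a StatefulSet gets per-pod storage: volumeClaimTemplates creates one PVC per replica, named after the template and the pod (here data-pg-0 and data-pg-1). The names pg and data, the replica count and the plaintext password are illustrative placeholders; serviceName points at the headless Service that gives each pod a stable DNS name. The two replicas are two independent Postgres servers with separate disks; setting up replication between them is beyond this sketch.

```yaml
# Headless Service: stable DNS names like pg-0.pg, pg-1.pg
apiVersion: v1
kind: Service
metadata:
  name: pg
spec:
  clusterIP: None
  selector:
    app: pg
  ports:
  - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pg
spec:
  serviceName: pg
  replicas: 2
  selector:
    matchLabels:
      app: pg
  template:
    metadata:
      labels:
        app: pg
    spec:
      containers:
      - name: postgres
        image: postgres:16
        env:
        - name: POSTGRES_PASSWORD
          value: secret              # demo only
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:              # one PVC per pod: data-pg-0, data-pg-1
  - metadata:
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
```

Scaling down does not delete those PVCs by default, so a pod that comes back with the same ordinal reattaches to its old data.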

A common pattern: stateful + stateless

A typical app:

  • Stateless front-ends (web servers, API gateways) → Deployment, no PVCs.
  • Stateful databases (Postgres) → StatefulSet, one PVC per replica.

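The stateless half scales freely. The stateful half is harder to scale, because each replica is tied to its own volume (and, with RWO cloud disks, to a node in the zone where that disk can attach).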

Exercise

  1. Save the PVC manifest above as pvc.yaml and create it:

    kubectl apply -f pvc.yaml
    kubectl get pvc                # Bound, or Pending (no StorageClass, or WaitForFirstConsumer until a pod uses it)
    

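  2. Save the Pod manifest above as db-pod.yaml and use the PVC from Postgres:

    kubectl apply -f db-pod.yaml
    kubectl wait --for=condition=Ready pod/db --timeout=120s
    # if psql can't connect yet, Postgres is still initializing; retry after a few seconds
    kubectl exec -it db -- psql -U postgres -c "CREATE TABLE notes (text TEXT);"
    kubectl exec -it db -- psql -U postgres -c "INSERT INTO notes VALUES ('hello');"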
    

  3. Delete the pod, recreate, verify data:

    kubectl delete pod db
    kubectl apply -f db-pod.yaml
    kubectl wait --for=condition=Ready pod/db --timeout=120s
    kubectl exec -it db -- psql -U postgres -c "SELECT * FROM notes;"
    # 'hello' is still there
    

  4. Cleanup:

    kubectl delete pod db
    kubectl delete pvc data           # deletes the underlying PV too (with Delete reclaim policy)
    

What you might wonder

"Where does my data actually live?" On a cloud cluster: an EBS volume, GCE persistent disk or Azure disk attached to the node running the pod. On local clusters: usually a hostPath under /var/lib/... on the node (a VM or container running on your laptop).

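"Can two pods share a PVC?" Across nodes, only if the PVC's access mode is ROX or RWX. With RWO (the common case), pods on the same node can share it, but with cloud disks a pod on another node can't attach it and stays in ContainerCreating with a Multi-Attach error. To restrict a volume to exactly one pod, use ReadWriteOncePod.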

"How do I back up Kubernetes-managed data?" Two layers: the actual underlying storage (cloud snapshots), and the K8s metadata (PVC, PV definitions). Velero is a popular tool that handles both. Beyond beginner; mentioned for awareness.

"What's an emptyDir with medium: Memory?" A tmpfs-backed emptyDir: it lives in RAM and doesn't touch the node's disk. Files written to it count against the container's memory limit. Useful for secrets that shouldn't be persisted.
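A sketch of a memory-backed emptyDir; the pod name and mount path are hypothetical. sizeLimit caps how large the volume may grow, and the df output in the pod's logs should show a tmpfs filesystem.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tmpfs-demo              # hypothetical name
spec:
  containers:
  - name: app
    image: alpine
    command: ["sh", "-c", "df -h /scratch && sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch       # hypothetical path
  volumes:
  - name: scratch
    emptyDir:
      medium: Memory            # tmpfs in RAM instead of node disk
      sizeLimit: 64Mi           # upper bound on the volume's size
```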

Done

  • Use emptyDir for ephemeral pod-shared storage.
  • Create a PVC for persistent storage.
  • Mount a PVC in a pod.
  • Recognize StorageClass as the dynamic-provisioning machinery.
  • Distinguish RWO / ROX / RWX access modes.
  • Know that StatefulSet is the pattern for per-replica storage.

Next: Ingress →
