Week 5 - OverlayFS and Storage Drivers
5.1 Conceptual Core
- A container's rootfs is built by stacking image layers with a union filesystem. The dominant storage driver on Linux is OverlayFS (in the mainline kernel since 3.18). Image layers are read-only lower dirs; the container's writable space is the upper dir; the merged view is what appears at the mount target.
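- When a file that exists only in a lower layer is first opened for writing, OverlayFS copies the whole file up to the upper layer and applies the change there (copy-on-write). This is why layered images start fast (no file data is copied when the container starts) but writes to large lower-layer files are slow: even a one-byte change to a large file first pays for copying the entire file.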
- Other drivers: `aufs` (legacy), `btrfs` (snapshots), `zfs` (heavy), `devicemapper` (deprecated), `vfs` (no CoW; ultra-portable, ultra-slow).
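To make the lookup and copy-up rules above concrete, here is a toy, in-memory Python model. It is a simplified sketch for intuition, not how the kernel implements OverlayFS: layers are plain dicts mapping paths to bytes, and `ToyOverlay` and the `WHITEOUT` sentinel are hypothetical names standing in for real directories and the on-disk char 0,0 whiteout described in 5.2.

```python
from typing import Dict, List, Optional

# Hypothetical sentinel standing in for the char 0,0 device that the kernel
# writes into the upper dir when a lower-layer file is deleted.
WHITEOUT = object()


class ToyOverlay:
    """Toy, in-memory model of OverlayFS lookup, copy-up and whiteouts."""

    def __init__(self, lowers: List[Dict[str, bytes]]):
        # lowers[0] is the TOPMOST lower layer, mirroring lowerdir=A:B:C.
        self.lowers = lowers
        self.upper: Dict[str, object] = {}
        self.copy_ups = 0  # number of whole-file copy-ups performed

    def read(self, path: str) -> Optional[bytes]:
        # The upper layer wins; a whiteout hides every lower copy.
        if path in self.upper:
            entry = self.upper[path]
            return None if entry is WHITEOUT else entry
        for layer in self.lowers:
            if path in layer:
                return layer[path]
        return None

    def write(self, path: str, offset: int, data: bytes) -> None:
        entry = self.upper.get(path)
        if isinstance(entry, bytes):
            buf = bytearray(entry)  # already in the upper: modify in place
        else:
            # Not in the upper yet (or whited out): look in the lowers.
            lower = None if entry is WHITEOUT else self.read(path)
            if lower is not None:
                self.copy_ups += 1  # copy-up copies the WHOLE file, even for 1 byte
            buf = bytearray(lower or b"")
        if offset > len(buf):
            buf.extend(b"\0" * (offset - len(buf)))  # zero-fill any gap
        buf[offset:offset + len(data)] = data
        self.upper[path] = bytes(buf)

    def delete(self, path: str) -> None:
        if any(path in layer for layer in self.lowers):
            self.upper[path] = WHITEOUT  # must mask the lower copies
        else:
            self.upper.pop(path, None)  # upper-only file: just drop it


# Usage sketch
ov = ToyOverlay([{"/etc/motd": b"top"}, {"/etc/motd": b"bottom", "/bin/sh": b"ELF"}])
print(ov.read("/etc/motd"))  # b'top' (leftmost lower wins)
ov.write("/bin/sh", 0, b"X")  # triggers one copy-up
print(ov.upper["/bin/sh"], ov.copy_ups)  # b'XLF' 1
ov.delete("/etc/motd")
print(ov.read("/etc/motd"), ov.upper["/etc/motd"] is WHITEOUT)  # None True
```

The counter makes the cost model visible: only the first write to a lower-layer file increments `copy_ups`; later writes hit the upper copy directly.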
5.2 Mechanical Detail
- `mount -t overlay overlay -o lowerdir=A:B:C,upperdir=U,workdir=W /merged`. Lower dirs are stacked right to left, so `A` is the topmost lower layer.
- `workdir` is required for OverlayFS bookkeeping (files are prepared there before being moved atomically into the upper dir); it must be an empty directory on the same filesystem as `upperdir`.
- Whiteouts: a lower-layer file deleted through the merged view is represented in the upper by a character device with device number `0,0`. Listing/diff operations interpret these.
- Opaque directories: the `trusted.overlay.opaque="y"` xattr marks a dir whose lower contents should be hidden.
- The `containerd` snapshotter abstraction: each driver implements a `Snapshotter` interface; the snapshotter manages active (writable) and committed (read-only) snapshots, writable layers, etc.
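The mount line and the `workdir` rules can be checked before calling `mount`, and an upper dir can be inspected afterwards. The sketch below assumes Linux and Python 3.8+; `overlay_mount_argv` and `classify_upper_entry` are hypothetical helper names. It only builds the argv (running it needs root or a suitable user namespace), approximates "same filesystem" by comparing `st_dev`, and reads the opaque xattr with `os.getxattr`, which fails for the `trusted.*` namespace without CAP_SYS_ADMIN, in which case directories are reported as plain `dir`.

```python
import os
import stat
from typing import List


def overlay_mount_argv(lowers: List[str], upper: str, work: str, target: str) -> List[str]:
    """Build argv for `mount -t overlay ...` after checking the workdir rules.

    lowers[0] becomes the TOPMOST lower layer (leftmost entry in lowerdir=).
    Paths containing ':' or ',' would need escaping; that is not handled here.
    """
    if not lowers:
        raise ValueError("at least one lowerdir is required")
    # Equal st_dev is used as an approximation of "same filesystem".
    if os.stat(upper).st_dev != os.stat(work).st_dev:
        raise ValueError("workdir must be on the same filesystem as upperdir")
    if os.listdir(work):
        raise ValueError("workdir must be an empty directory")
    opts = f"lowerdir={':'.join(lowers)},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", opts, target]


def classify_upper_entry(path: str) -> str:
    """Label one entry found in an overlay upper dir."""
    st = os.lstat(path)
    if stat.S_ISCHR(st.st_mode) and st.st_rdev == 0:
        return "whiteout"  # char device 0,0 hides this path in the lower layers
    if stat.S_ISDIR(st.st_mode):
        try:
            value = os.getxattr(path, "trusted.overlay.opaque", follow_symlinks=False)
        except (OSError, AttributeError):
            # xattr absent, trusted.* unreadable without CAP_SYS_ADMIN,
            # or a platform without os.getxattr.
            return "dir"
        return "opaque-dir" if value == b"y" else "dir"
    # Regular files, symlinks, etc.: a regular file here is either a copy-up
    # or a newly created file; telling them apart needs the lower dirs.
    return "file"


if __name__ == "__main__":
    import sys

    upper_dir = sys.argv[1]
    for name in sorted(os.listdir(upper_dir)):
        print(classify_upper_entry(os.path.join(upper_dir, name)), name)
```

Pointing the `__main__` block at the upper dir (as root) after deleting a lower file through the merged view should list that file as `whiteout`.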
5.3 Lab - "OverlayFS By Hand"
- Create three lower dirs with different files. Mount as overlay. Verify merged view.
- Modify a file from a lower layer; observe the copy-up in the upper.
- Delete a lower file from the merged view; observe the whiteout in the upper.
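- Reproduce a "container layer": treat your container's tarball-extracted contents as a lower; create a fresh upper; mount; modify; tar up the upper to produce a new layer. OCI layer tarballs encode deletions as empty `.wh.<name>` files and opaque dirs as a `.wh..wh..opq` entry, not as char 0,0 devices or xattrs, so translate whiteouts when exporting.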
5.4 Hardening Drill
- Audit OverlayFS CVEs. The class of "container escape via crafted file in lower layer" has been exploited. Mitigations: rootless mode + user namespaces, or a sandbox layer (gVisor, Kata).
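One quick sanity check for the rootless + user-namespace mitigation is whether a container's processes actually run in a non-initial user namespace. The sketch below (hypothetical function names) parses `/proc/self/uid_map`: in the initial namespace it holds the single identity mapping `0 0 4294967295`, whereas a rootless or remapped container usually shows a shifted or smaller range. This is a heuristic, not proof of isolation, and it says nothing about whether a gVisor or Kata sandbox is in use.

```python
from typing import Optional


def in_initial_user_namespace(uid_map_text: str) -> bool:
    """Heuristic: the initial user namespace maps all 4294967295 UIDs 1:1.

    uid_map lines are "<uid-inside> <uid-outside> <count>". A rootless or
    userns-remapped container normally shows a shifted or smaller range.
    """
    rows = [line.split() for line in uid_map_text.splitlines() if line.strip()]
    return rows == [["0", "0", "4294967295"]]


def check_self(proc_path: str = "/proc/self/uid_map") -> Optional[bool]:
    """Return True if this process looks un-namespaced, None if unknown."""
    try:
        with open(proc_path) as f:
            return in_initial_user_namespace(f.read())
    except OSError:
        return None  # not Linux, or /proc unavailable


if __name__ == "__main__":
    verdict = check_self()
    print({True: "initial user namespace (no userns mitigation)",
           False: "inside a user namespace",
           None: "unknown"}[verdict])
```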
5.5 Production Readiness Slice
- Compare OverlayFS, fuse-overlayfs (rootless default), and the kernel's native rootless overlay (since 5.13) for a representative workload. Measure layer-creation, file-write, and read-many performance.
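A starting harness for the file-write and read-many measurements might look like the following. It is a sketch under stated assumptions: `bench_dir` and `time_first_append` are hypothetical names, `root` should be the merged mount of whichever driver is under test, and layer-creation time is left out because it depends on how each tool (plain `mount`, fuse-overlayfs, or a container runtime) builds the mount. Repeated reads are mostly served from the page cache, so drop caches or use larger data sets if cold-read behavior matters.

```python
import os
import time
from typing import Dict


def bench_dir(root: str, n_files: int = 200, size: int = 64 * 1024,
              read_passes: int = 5) -> Dict[str, float]:
    """Time file-write and read-many inside `root` (e.g. a merged overlay mount)."""
    payload = os.urandom(size)
    paths = [os.path.join(root, f"bench-{i:05d}.bin") for i in range(n_files)]

    t0 = time.perf_counter()
    for p in paths:
        with open(p, "wb") as f:
            f.write(payload)
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # include the cost of reaching the backing fs
    write_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in range(read_passes):
        for p in paths:
            with open(p, "rb") as f:
                f.read()  # warm passes mostly measure the page cache
    read_s = time.perf_counter() - t0

    return {"write_s": write_s, "read_many_s": read_s}


def time_first_append(path: str) -> float:
    """Time opening `path` for append and writing one byte.

    If `path` exists only in a lower layer of an overlay mount, this open
    triggers a full copy-up. Note that it modifies the file.
    """
    t0 = time.perf_counter()
    with open(path, "ab") as f:
        f.write(b"\0")
    return time.perf_counter() - t0


if __name__ == "__main__":
    import sys

    print(bench_dir(sys.argv[1]))
```

Running `time_first_append` on a large file that lives only in a lower layer, then again on the same file, should expose the one-time copy-up cost separately from steady-state write speed.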