Week 3 - The Virtual File System (VFS)
3.1 Conceptual Core
- The VFS is the kernel's abstraction over filesystem implementations. Userspace sees one consistent API (`open`, `read`, `stat`, `mmap`); the kernel dispatches to ext4, btrfs, xfs, tmpfs, procfs, sysfs, or fuse via per-FS operation tables.
- Four core VFS objects:
  - inode: a file's metadata (owner, perms, size, pointers to data blocks).
  - dentry: a directory entry; the cached mapping from a name to an inode.
  - file: an open file description (one per `open()` call); holds offset, flags, ref count.
  - superblock: a mounted filesystem instance.
- The dentry cache (`dcache`) and inode cache (`icache`) are why repeated `stat()`s are fast.
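The distinction between inode, dentry, and open file description can be observed from userspace. The following is a minimal sketch, not kernel code: the function name `demo_vfs_objects` and the file names are hypothetical, and it assumes a filesystem that supports hard links. A hard link gives one inode two names, `dup()` shares one open file description (and so one offset), and a second `open()` gets its own.

```python
import os
import tempfile


def demo_vfs_objects(directory):
    """Return observations about inodes, names, and open file descriptions."""
    original = os.path.join(directory, "data.txt")
    alias = os.path.join(directory, "alias.txt")
    with open(original, "wb") as f:
        f.write(b"hello, vfs")

    # A hard link adds a second name (directory entry) for the same inode.
    os.link(original, alias)
    st_a, st_b = os.stat(original), os.stat(alias)

    fd = os.open(original, os.O_RDONLY)
    fd_dup = os.dup(fd)                         # same open file description
    fd_other = os.open(original, os.O_RDONLY)   # new, independent description
    try:
        os.read(fd, 5)  # advances the offset of the description behind fd
        shared_offset = os.lseek(fd_dup, 0, os.SEEK_CUR)
        independent_offset = os.lseek(fd_other, 0, os.SEEK_CUR)
    finally:
        for descriptor in (fd, fd_dup, fd_other):
            os.close(descriptor)

    return {
        "same_inode": (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino),
        "link_count": st_a.st_nlink,
        "shared_offset": shared_offset,
        "independent_offset": independent_offset,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_vfs_objects(tmp))
```

The duplicated descriptor reports the offset moved by the read on the original, while the separately opened descriptor still sits at offset 0.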
3.2 Mechanical Detail
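- Read `fs/open.c::do_sys_openat2`, the common implementation behind `open(2)`, `openat(2)`, and `openat2(2)`. `path_openat` (in `fs/namei.c`) resolves the path through the dcache, allocating new dentries on a miss.
- Each FS implements a `struct file_operations` and a `struct inode_operations`. ext4's tables for regular files are in `fs/ext4/file.c` and its directory inode operations are in `fs/ext4/namei.c`; many of the handlers they point to are implemented in `fs/ext4/inode.c`.
- Mount namespaces (preview): each mount namespace has its own mount tree. Containers exploit this.
- Pseudo-filesystems:
  - `procfs` (`/proc`): kernel introspection: `/proc/<pid>/`, `/proc/cpuinfo`, `/proc/meminfo`, `/proc/sys/`.
  - `sysfs` (`/sys`): device and driver introspection, with many tunables and attributes exposed under `/sys/kernel/`, `/sys/class/`, `/sys/block/`, and `/sys/fs/cgroup/` (the usual cgroupfs mount point).
  - Others: `cgroupfs`, `devtmpfs`, `tmpfs`, `bpf`, `tracefs`, `debugfs`, `securityfs`.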
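To see which of these pseudo-filesystems are mounted on a given machine, one option is to parse `/proc/mounts`. The sketch below is illustrative only: the helper names `parse_mounts` and `pseudo_mounts` are hypothetical, and the `PSEUDO_FS_TYPES` set is an assumption about which type names count as "pseudo" here, not an authoritative list.

```python
import os

# Assumption: filesystem type names treated as pseudo-filesystems in this sketch.
PSEUDO_FS_TYPES = {
    "proc", "sysfs", "cgroup", "cgroup2", "devtmpfs",
    "tmpfs", "bpf", "tracefs", "debugfs", "securityfs",
}


def parse_mounts(text):
    """Parse /proc/mounts-style text into (source, mount_point, fs_type, options)."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        source, mount_point, fs_type, options = fields[:4]
        mounts.append((source, mount_point, fs_type, options.split(",")))
    return mounts


def pseudo_mounts(text):
    """Keep only the mounts whose type is in PSEUDO_FS_TYPES."""
    return [m for m in parse_mounts(text) if m[2] in PSEUDO_FS_TYPES]


if __name__ == "__main__":
    # /proc/mounts exists only on Linux; do nothing elsewhere.
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as handle:
            for source, mount_point, fs_type, _ in pseudo_mounts(handle.read()):
                print(f"{fs_type:12} {mount_point}")
```

Run inside a container and on the host, the output differs, which gives a first look at per-namespace mount trees.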
3.3 Lab - "VFS Forensics"
- Catalogue every entry in `/proc/<pid>/` for one of your processes. Document what each gives.
- Read `/proc/<pid>/maps` and explain every region (text, heap, stack, vdso, vvar, shared libs).
- Use eBPF's `vfs_open` kprobe (via `bpftrace`) to log every `open` system-wide for 5 seconds. Triage the noise.
- Mount `tmpfs` at a custom path, fill it, and observe the allocator behavior in `/proc/meminfo` (`Shmem`).
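For the `/proc/<pid>/maps` exercise, a small parser can give a first-pass label for each region before you explain it by hand. This is a rough sketch under stated assumptions: `parse_maps_line` and `classify` are hypothetical helpers, and the ".so in the file name" test for shared libraries is a heuristic, not a rule the kernel follows.

```python
import os


def parse_maps_line(line):
    """Split one maps line: address range, perms, offset, dev, inode, optional path."""
    fields = line.split(None, 5)
    address, perms, _offset, _dev, inode = fields[:5]
    path = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(part, 16) for part in address.split("-"))
    return {"start": start, "end": end, "perms": perms,
            "inode": int(inode), "path": path}


def classify(region, exe_path=""):
    """Give a coarse label for a parsed region (heuristic)."""
    path, perms = region["path"], region["perms"]
    if path.startswith("[") and path.endswith("]"):
        return path[1:-1]  # kernel-named regions: heap, stack, vdso, vvar, ...
    if not path:
        return "anonymous"
    if path == exe_path:
        return "text" if "x" in perms else "executable data"
    if ".so" in os.path.basename(path):
        return "shared library"  # heuristic based on the file name only
    return "file mapping"


if __name__ == "__main__":
    # Inspect the current process when /proc is available (Linux only).
    if os.path.exists("/proc/self/maps"):
        exe = os.readlink("/proc/self/exe")
        with open("/proc/self/maps") as handle:
            for raw in handle:
                region = parse_maps_line(raw)
                size_kib = (region["end"] - region["start"]) // 1024
                print(f"{classify(region, exe):16} {size_kib:8} KiB "
                      f"{region['perms']} {region['path']}")
```

The labels are a starting point for the write-up; regions tagged "anonymous" or "file mapping" are the ones that usually need the most explanation.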
3.4 Hardening Drill
- Lock down `/proc` with `hidepid=2` (mount option). Verify a non-root user can no longer see other users' processes.
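One way to run the verification step is to list the numeric directories under `/proc` and group them by owner. The sketch below assumes you run it as the non-root user; `visible_pids_by_owner` and `foreign_pids` are hypothetical helper names, and the `proc_root` parameter exists only so the logic can be exercised against a directory other than `/proc`.

```python
import os


def visible_pids_by_owner(proc_root="/proc"):
    """Map owning uid -> list of PIDs whose directory is visible under proc_root."""
    owners = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as cpuinfo or self
        try:
            uid = os.stat(os.path.join(proc_root, name)).st_uid
        except OSError:
            continue  # the process exited between listdir() and stat()
        owners.setdefault(uid, []).append(int(name))
    return owners


def foreign_pids(proc_root="/proc", my_uid=None):
    """Return sorted PIDs visible under proc_root that belong to other users."""
    if my_uid is None:
        my_uid = os.getuid()
    owners = visible_pids_by_owner(proc_root)
    return sorted(pid for uid, pids in owners.items() if uid != my_uid
                  for pid in pids)


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc is mounted.
    if os.path.isdir("/proc/1"):
        others = foreign_pids()
        print(f"PIDs owned by other users that are visible: {len(others)}")
```

Before the remount, an unprivileged user normally sees many foreign PIDs; with `hidepid=2` in effect the expectation is that the count drops to zero for that user, while root still sees everything.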
3.5 Performance Tuning Slice
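- Use `perf top` (or `perf record -a -g` followed by `perf report`) to find the hottest VFS function on your workload; `perf trace` reports syscalls and page faults rather than kernel function hotspots. If the hottest function is `__d_lookup`, your dcache is likely being thrashed; if it is `__find_get_block`, your buffer cache is. Document the inference.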
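To back up a dcache inference with a second data point, the counters in `/proc/sys/fs/dentry-state` can be sampled before and after the workload. This is a sketch only: `parse_dentry_state` is a hypothetical helper, and it assumes the six-field layout documented for recent kernels (`nr_dentry`, `nr_unused`, `age_limit`, `want_pages`, `nr_negative`, `dummy`).

```python
import os
from collections import namedtuple

# Assumption: six whitespace-separated integers in this order.
DentryState = namedtuple(
    "DentryState",
    "nr_dentry nr_unused age_limit want_pages nr_negative dummy",
)


def parse_dentry_state(text):
    """Parse the contents of /proc/sys/fs/dentry-state into named fields."""
    values = [int(value) for value in text.split()]
    if len(values) != len(DentryState._fields):
        raise ValueError(f"expected 6 fields, got {len(values)}")
    return DentryState(*values)


if __name__ == "__main__":
    path = "/proc/sys/fs/dentry-state"
    if os.path.exists(path):  # Linux only
        with open(path) as handle:
            state = parse_dentry_state(handle.read())
        in_use = state.nr_dentry - state.nr_unused
        print(f"dentries: {state.nr_dentry} total, {in_use} in use, "
              f"{state.nr_negative} unused negative")
```

A large swing in `nr_dentry` or `nr_negative` across the run is consistent with heavy dcache churn, though it does not prove it on its own; treat it as supporting evidence next to the profile.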