# Week 8 - Disk I/O Scheduling, Filesystems Beyond ext4

## 8.1 Conceptual Core
- The I/O stack: filesystem → block layer (with merging, sorting) → I/O scheduler → device driver → hardware.
- I/O schedulers (settable per-device via `/sys/block/<dev>/queue/scheduler`): `none` (preferred for NVMe), `mq-deadline`, `kyber`, `bfq`. Choose `none` for fast SSDs, `mq-deadline` for mixed workloads, `bfq` for desktop fairness.
- Filesystems:
  - `ext4` - the default; well-understood; journaled.
  - `xfs` - high throughput, parallel metadata; the RHEL default.
  - `btrfs` - copy-on-write, snapshots, multi-device. Use cautiously for high-throughput workloads.
  - `zfs` - out-of-tree (CDDL); mature, with snapshots and integrity checking. Heavy memory footprint.
  - `tmpfs` - RAM-backed.
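The scheduler file mentioned above can be inspected and switched at runtime. A minimal sketch, assuming a device name like `sda`; the optional sysfs-root argument exists only so the read helper can be exercised against a fixture tree, and the write requires root:

```shell
# Helpers for the per-device scheduler file.
# current_scheduler DEV [SYSFS_ROOT] - print the active scheduler; the kernel
#   marks the active one in brackets, e.g. "none [mq-deadline] kyber bfq".
# set_scheduler DEV NAME - activate scheduler NAME for DEV (needs root).
current_scheduler() {
  sed 's/.*\[\(.*\)\].*/\1/' "${2:-/sys}/block/$1/queue/scheduler"
}
set_scheduler() {
  echo "$2" > "/sys/block/$1/queue/scheduler"
}
```

For example, `current_scheduler nvme0n1` on a typical NVMe box prints `none`, and `set_scheduler sda mq-deadline` switches `sda` over without a reboot.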
## 8.2 Mechanical Detail
- `/sys/block/<dev>/queue/{nr_requests,read_ahead_kb,scheduler,rotational}` - tunable per device.
- `iostat -xz 1` for per-device I/O stats. Watch `%util`, `await`, `svctm`, `r/s`, `w/s`.
- `blktrace` + `btt` for fine-grained I/O timing. Modern alternative: `bpftrace`'s `biolatency`/`biosnoop` recipes.
- Mount options for performance:
  - `noatime` - don't update access times. Always set on busy filesystems.
  - `discard` vs. periodic `fstrim` - for SSDs; periodic is usually better.
  - `commit=N` - ext4 journal commit interval.
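A small helper that snapshots the queue tunables listed above for every block device at once can serve as the "before" record when tuning. A sketch; the optional sysfs-root argument is only there so the function can be tested against a fixture tree:

```shell
# Print scheduler, rotational flag, queue depth, and readahead for each
# device under <root>/block (default root: /sys).
queue_report() {
  root="${1:-/sys}"
  for dev in "$root"/block/*; do
    [ -r "$dev/queue/scheduler" ] || continue
    printf '%s scheduler=%s rotational=%s nr_requests=%s read_ahead_kb=%s\n' \
      "${dev##*/}" \
      "$(cat "$dev/queue/scheduler")" \
      "$(cat "$dev/queue/rotational")" \
      "$(cat "$dev/queue/nr_requests")" \
      "$(cat "$dev/queue/read_ahead_kb")"
  done
}
```

Run `queue_report` before and after a change and diff the two outputs.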
## 8.3 Lab: "I/O Forensics"
- Run `fio` with a representative workload. Measure a baseline.
- Toggle the I/O scheduler. Re-run. Compare.
- Use `bpftrace -e 'tracepoint:block:block_rq_issue { @[args->comm] = count() }'` to see who's hitting the disk.
- Mount with vs. without `noatime` and measure the difference in metadata-write traffic.
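For the `fio` baseline step, a mixed random read/write job is a reasonable starting point. A sketch only, with illustrative values (not recommendations); run it from a directory on the disk under test, and adjust `size`, `iodepth`, and `runtime` to your hardware:

```ini
; sketch fio job - illustrative values, not tuned recommendations
[global]
ioengine=libaio
direct=1
time_based
runtime=30
group_reporting

[randrw-4k]
rw=randrw
rwmixread=70
bs=4k
iodepth=32
numjobs=4
size=1g
filename=fio-testfile
```

Save as `randrw.fio`, run `fio randrw.fio`, record the results, switch the scheduler, and re-run the identical job so the comparison is apples-to-apples.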
## 8.4 Hardening Drill
- Set `nodev,nosuid,noexec` on the `/tmp`, `/home`, and `/var/tmp` mounts. Document why each option matters.
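The drill above might end up as `/etc/fstab` entries like these. Device paths are placeholders; note that `noexec` on `/home` can break user-run scripts and some package tooling, which is exactly the kind of trade-off the writeup should document:

```
# nodev  - device nodes on this mount are ignored (blocks device-file spoofing)
# nosuid - setuid/setgid bits have no effect (blocks privilege-escalation droppers)
# noexec - binaries can't be executed from this mount (blocks download-and-run attacks)
tmpfs      /tmp      tmpfs  nodev,nosuid,noexec,size=2G   0 0
/dev/sdb1  /home     ext4   defaults,nodev,nosuid,noexec  0 2
/dev/sdc1  /var/tmp  ext4   defaults,nodev,nosuid,noexec  0 2
```

Verify with `mount | grep -E '/tmp|/home'` after a `mount -o remount` of each filesystem.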
## 8.5 Performance Tuning Slice
- `bpftrace -e 'tracepoint:block:block_rq_issue /args->dev == 0x800/ { @start[args->sector] = nsecs } tracepoint:block:block_rq_complete /@start[args->sector]/ { @usecs = hist((nsecs - @start[args->sector]) / 1000); delete(@start[args->sector]) }'` to histogram I/O latencies in microseconds for device 8:0 (`0x800`, typically `sda`). The issue probe stamps each request's start time; the completion probe computes the delta and deletes the entry.
# Month 2 Capstone Deliverable

A `memory-and-scheduling/` directory containing:
1. `meminfo-decoder/` - a script that reads `/proc/meminfo` and outputs a human-readable health report.
2. `psi-watcher/` - a daemon that alerts when memory pressure (`/proc/pressure/memory`) exceeds a threshold.
3. `sched-bench/` - comparing nice-weighted, cgroup-weighted, and pinned workloads.
4. `io-tuner/` - a `fio` harness sweeping I/O-scheduler options on the local disk.
Each comes with a markdown writeup of measurements and tuning conclusions.
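The `meminfo-decoder` piece can start as small as one `awk` call. A sketch that reports `MemAvailable` as a percentage of `MemTotal`; the function name is hypothetical, and the file argument exists so it can be pointed at `/proc/meminfo` on a live box or at a saved fixture during testing:

```shell
# Print MemAvailable as a whole-number percentage of MemTotal,
# reading any file in /proc/meminfo format ("Key:   <value> kB").
# Usage: meminfo_pct_available /proc/meminfo
meminfo_pct_available() {
  awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2} END {printf "%.0f\n", a*100/t}' "$1"
}
```

From there the health report grows by decoding further fields (`Dirty`, `Slab`, `SwapFree`) the same way.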