Week 7 - The CPU Scheduler (CFS, EEVDF)
7.1 Conceptual Core
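- The Completely Fair Scheduler (CFS) is a virtual-time, weighted-fair-queueing scheduler. Each runnable task accumulates `vruntime` (CPU time scaled by 1024 / weight), and the scheduler picks the task with the smallest `vruntime`. The weight comes from the task's `nice` value: nice 0 maps to 1024, and each nice step scales the weight by roughly 1.25x.
- Since Linux 6.6, EEVDF (Earliest Eligible Virtual Deadline First) has replaced CFS's pick rule. It considers only eligible tasks, meaning those not ahead of their fair share (non-negative lag), and among them it runs the one with the earliest virtual deadline. This gives better latency guarantees while preserving fairness. The userspace API and most of the conceptual model are unchanged.
- Scheduling classes, in priority order (highest first): `dl` (deadline), `rt` (real-time, FIFO/RR), `fair` (CFS/EEVDF), `idle`. Almost everything userspace runs is in `fair`.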
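The two pick rules can be made concrete with a toy single-CPU simulation. This is a simplified sketch under stated assumptions, not kernel code:

- every task is always runnable (no sleep or wakeup);
- time advances in fixed 1 ms ticks;
- every task requests a fixed 3 ms slice;
- the helper names (`Task`, `pick_cfs`, `pick_eevdf`, `simulate`) are hypothetical.

The weights 1024 (nice 0) and 15 (nice 19) are the kernel's values. Eligibility is tested as "vruntime not past the load-weighted average", using integer cross-multiplication so the comparison is exact.

```python
from dataclasses import dataclass

NICE_0_WEIGHT = 1024  # kernel weight of a nice-0 task

@dataclass
class Task:
    name: str
    weight: int
    vruntime: int = 0  # weighted virtual time, in ns
    deadline: int = 0  # virtual deadline (used by EEVDF only)
    ran_ns: int = 0    # real CPU time received

def vslice(task, slice_ns):
    # A request of slice_ns real ns costs slice_ns * 1024 / weight virtual ns.
    return slice_ns * NICE_0_WEIGHT // task.weight

def charge(task, delta_ns):
    # vruntime advances inversely to weight, so heavy tasks "age" slowly.
    task.vruntime += delta_ns * NICE_0_WEIGHT // task.weight
    task.ran_ns += delta_ns

def pick_cfs(tasks):
    # CFS: smallest vruntime wins (ties go to the earlier task in the list).
    return min(tasks, key=lambda t: t.vruntime)

def pick_eevdf(tasks):
    # Eligible = vruntime not past the load-weighted average (lag >= 0).
    # Compare v * W <= sum(v_i * w_i) to stay in exact integer arithmetic;
    # the task with the smallest vruntime always passes, so the list is non-empty.
    total_w = sum(t.weight for t in tasks)
    weighted = sum(t.vruntime * t.weight for t in tasks)
    eligible = [t for t in tasks if t.vruntime * total_w <= weighted]
    # Among eligible tasks, the earliest virtual deadline wins.
    return min(eligible, key=lambda t: t.deadline)

def simulate(picker, weights, slice_ns=3_000_000, tick_ns=1_000_000,
             total_ns=1_000_000_000):
    """Run always-runnable tasks on one CPU; return each task's share of CPU time."""
    tasks = [Task(name, w) for name, w in weights.items()]
    for t in tasks:
        t.deadline = t.vruntime + vslice(t, slice_ns)
    for _ in range(total_ns // tick_ns):
        cur = picker(tasks)
        charge(cur, tick_ns)
        if cur.vruntime >= cur.deadline:
            # Request used up: issue a new one with a fresh virtual deadline.
            cur.deadline = cur.vruntime + vslice(cur, slice_ns)
    return {t.name: t.ran_ns / total_ns for t in tasks}

if __name__ == "__main__":
    hogs = {"nice0": 1024, "nice19": 15}
    print("CFS  :", simulate(pick_cfs, hogs))
    print("EEVDF:", simulate(pick_eevdf, hogs))
```

With only two tasks, eligibility reduces to "my vruntime is not ahead of the other task's". Both pickers therefore land close to the weight-proportional split 1024/1039 (about 98.6% vs 1.4%). The rules diverge with three or more tasks, where several can be eligible at once and the virtual deadline decides among them.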
7.2 Mechanical Detail
- Read `kernel/sched/fair.c`. Since 6.6 the EEVDF pick logic lives in the same file; there is no separate `eevdf.c`.
- Per-CPU runqueues; the load balancer migrates tasks between CPUs.
- Policy selection: `sched_setscheduler(2)` and the `chrt` userspace tool.
- CPU affinity: `sched_setaffinity(2)`, `taskset`, systemd `CPUAffinity=`.
- Cgroup v2 `cpu` controller: `cpu.weight` (proportional share), `cpu.max` (bandwidth limit, written as `$MAX $PERIOD`).
- `/proc/<pid>/sched` - per-task scheduler stats.
- `/proc/sched_debug` - system-wide scheduler state (moved to `/sys/kernel/debug/sched/debug` in 5.13).
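Most of these interfaces are reachable from Python's `os` module, which wraps the same syscalls. A minimal sketch, assuming Linux and Python 3.3+:

- `describe` reports a process's policy, nice value and allowed CPUs;
- `pin_to` does for a running process what `taskset -c` does;
- `cpu_max_to_cpus` converts a cgroup v2 `cpu.max` line into a CPU count.

All three are hypothetical helper names, not part of any library.

```python
import os

# Linux scheduling policies exposed by Python's os module.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",  # normal tasks, fair class (CFS/EEVDF)
    os.SCHED_BATCH: "SCHED_BATCH",  # fair class, assumed CPU-bound
    os.SCHED_IDLE: "SCHED_IDLE",    # fair class, very low weight
    os.SCHED_FIFO: "SCHED_FIFO",    # rt class
    os.SCHED_RR: "SCHED_RR",        # rt class, round-robin
}

def describe(pid=0):
    """Return (policy name, nice value, allowed CPU set) for pid (0 = caller)."""
    policy = os.sched_getscheduler(pid)
    nice = os.getpriority(os.PRIO_PROCESS, pid)
    cpus = os.sched_getaffinity(pid)
    return POLICY_NAMES.get(policy, str(policy)), nice, cpus

def pin_to(cpus, pid=0):
    """Restrict pid to the given CPU set (like `taskset -c`); return the old mask."""
    old = os.sched_getaffinity(pid)
    os.sched_setaffinity(pid, cpus)
    return old

def cpu_max_to_cpus(text):
    """Parse a cgroup v2 cpu.max line ("$MAX $PERIOD") into CPUs; None = unlimited."""
    quota, period = text.split()
    return None if quota == "max" else int(quota) / int(period)

if __name__ == "__main__":
    print(describe())
    print(cpu_max_to_cpus("50000 100000"))  # half of one CPU per period
```

Pinning via `pin_to({0})` and later restoring the returned mask mirrors the `taskset -c` experiment in the lab.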
7.3 Lab - "Scheduler Forensics"
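- Pin two CPU hogs to the same CPU (e.g. `taskset -c 0`) at `nice 0` and observe a roughly even split. Renice one to `nice 19` and verify a split of about 98.5/1.5 (weights 1024 vs 15).
- Use `bpftrace -e 'tracepoint:sched:sched_switch { @[comm] = count() }'` to count context switches per command. At this tracepoint `comm` is the task being switched out. Divide by the trace duration to get a rate.
- Pin a workload to specific CPUs with `taskset -c 0,1` and compare its cache-miss rate against an unpinned run with `perf stat -e cache-misses,cache-references`.
- Place two services in cgroups with `cpu.weight=100` and `cpu.weight=1000`. While both saturate the same CPUs, verify a 1:10 split in favor of the heavier group.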
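The expected splits in this lab follow directly from the weights. Below is a small calculator, assuming every entity is CPU-bound and competes for the same CPU(s):

- `PRIO_TO_WEIGHT` copies the kernel's `sched_prio_to_weight` table, indexed by nice + 20.
- Cgroup v2 `cpu.weight` values are used as-is, because only their ratio matters between siblings.
- `nice_to_weight` and `expected_shares` are hypothetical helper names.

```python
# Kernel's sched_prio_to_weight table, indexed by nice + 20 (nice -20 .. 19).
PRIO_TO_WEIGHT = [
    88761, 71755, 56483, 46273, 36291,  # nice -20 .. -16
    29154, 23254, 18705, 14949, 11916,  # nice -15 .. -11
     9548,  7620,  6100,  4904,  3906,  # nice -10 .. -6
     3121,  2501,  1991,  1586,  1277,  # nice  -5 .. -1
     1024,   820,   655,   526,   423,  # nice   0 .. 4
      335,   272,   215,   172,   137,  # nice   5 .. 9
      110,    87,    70,    56,    45,  # nice  10 .. 14
       36,    29,    23,    18,    15,  # nice  15 .. 19
]

def nice_to_weight(nice):
    """Map a nice value to its scheduler weight."""
    if not -20 <= nice <= 19:
        raise ValueError("nice must be in [-20, 19]")
    return PRIO_TO_WEIGHT[nice + 20]

def expected_shares(weights):
    """CPU share each contending entity should get: its weight over the total."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

if __name__ == "__main__":
    print(expected_shares({"hog_a": nice_to_weight(0), "hog_b": nice_to_weight(19)}))
    print(expected_shares({"svc_a": 100, "svc_b": 1000}))  # cpu.weight values
```

For nice 0 vs nice 19 this gives 1024/1039, about 98.6% vs 1.4%. For `cpu.weight` 100 vs 1000 it gives about 9.1% vs 90.9%. If the hogs are not pinned to a shared CPU, each can get a whole core and both will read close to 100%.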
7.4 Hardening Drill
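- Unprivileged processes cannot use SCHED_FIFO/SCHED_RR unless they hold `CAP_SYS_NICE` or `RLIMIT_RTPRIO` allows it. That limit defaults to 0 and is raised with `ulimit -r` or `limits.conf`. Keep it at 0 for untrusted users.
- `kernel.sched_rt_runtime_us` does not forbid RT scheduling. It caps total RT bandwidth (default 950000 µs out of every 1000000 µs `sched_rt_period_us`), so a runaway RT task cannot take the whole CPU.
- In systemd units, `RestrictRealtime=yes` refuses realtime scheduling, which could otherwise be used to monopolize CPUs and starve the system. `LimitRTPRIO=0` sets the rlimit. `RestrictSUIDSGID=yes` is useful hardening too, but it blocks creating setuid/setgid files and is unrelated to scheduling.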
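Two controls decide whether an unprivileged process may use realtime policies: `RLIMIT_RTPRIO` and `CAP_SYS_NICE`. Separately, `kernel.sched_rt_runtime_us` caps RT bandwidth. All three can be checked without root.

A minimal Python sketch, assuming Linux with procfs mounted:

- `rt_bandwidth` reads `sched_rt_runtime_us` and `sched_rt_period_us`; a runtime of -1 means no cap.
- `can_use_sched_fifo` attempts SCHED_FIFO priority 1 in a forked child, so the caller's own policy never changes.

Both helper names are hypothetical.

```python
import os
import resource

def rt_bandwidth():
    """Fraction of each RT period that RT tasks may use, or None if uncapped (-1)."""
    with open("/proc/sys/kernel/sched_rt_runtime_us") as f:
        runtime = int(f.read())
    with open("/proc/sys/kernel/sched_rt_period_us") as f:
        period = int(f.read())
    return None if runtime < 0 else runtime / period

def can_use_sched_fifo():
    """Try SCHED_FIFO priority 1 in a throwaway child; the caller is untouched."""
    pid = os.fork()
    if pid == 0:
        try:
            os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(1))
            os._exit(0)  # allowed
        except OSError:
            os._exit(1)  # refused, typically EPERM
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0

if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_RTPRIO)
    print("RLIMIT_RTPRIO soft/hard:", soft, hard)
    print("RT bandwidth fraction:", rt_bandwidth())  # 0.95 with the defaults
    print("SCHED_FIFO allowed:", can_use_sched_fifo())
```

Run as a normal user with the default limit, the probe should report that SCHED_FIFO is refused. Running it inside a unit with `RestrictRealtime=yes` is one way to confirm the sandbox setting takes effect.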
7.5 Performance Tuning Slice
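`perf sched record sleep 10; perf sched latency` - identifies wakeup-latency outliers by reporting each task's average and maximum scheduling delay. Tunables: `sched_wakeup_granularity_ns` and `sched_min_granularity_ns` were sysctls on older kernels and moved to debugfs (`/sys/kernel/debug/sched/`) in 5.13. EEVDF (6.6+) drops the wakeup granularity and renames the minimum granularity to `base_slice_ns`, leaving little else to tune.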
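Where these knobs live depends on the kernel version. As we understand the upstream layout:

- before 5.13: sysctls under `/proc/sys/kernel/`;
- 5.13 to 6.5: debugfs under `/sys/kernel/debug/sched/`;
- 6.6 onward: `base_slice_ns` in place of the minimum granularity.

The sketch below probes those candidate paths in that order. It assumes debugfs is mounted at `/sys/kernel/debug`, which usually requires root to read. `read_tunables` is a hypothetical helper; it returns `None` for any knob it cannot find or read.

```python
# Hypothetical helper: probe where the fair-class tunables live on this kernel.
# Assumption: debugfs is mounted at /sys/kernel/debug (reading it usually needs root).
SCHED_TUNABLE_PATHS = {
    # EEVDF kernels (6.6+): the default request size.
    "base_slice_ns": ["/sys/kernel/debug/sched/base_slice_ns"],
    # CFS kernels: debugfs from 5.13, sysctl before that.
    "min_granularity_ns": [
        "/sys/kernel/debug/sched/min_granularity_ns",
        "/proc/sys/kernel/sched_min_granularity_ns",
    ],
    "wakeup_granularity_ns": [
        "/sys/kernel/debug/sched/wakeup_granularity_ns",
        "/proc/sys/kernel/sched_wakeup_granularity_ns",
    ],
}

def read_tunables():
    """Return {knob: (path, value_ns)} for readable knobs, or {knob: None}."""
    found = {}
    for knob, paths in SCHED_TUNABLE_PATHS.items():
        found[knob] = None
        for path in paths:
            try:
                with open(path) as f:
                    found[knob] = (path, int(f.read()))
                break  # first readable location wins
            except (OSError, ValueError):
                continue  # missing, unreadable (e.g. non-root), or unexpected content
    return found

if __name__ == "__main__":
    for knob, hit in read_tunables().items():
        print(f"{knob}: {hit if hit else 'not found or not readable'}")
```

On a 6.6+ kernel run as root, only `base_slice_ns` would be expected to show up. On 5.13 to 6.5, the two granularity knobs should appear in debugfs instead.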