Week 7 - The CPU Scheduler (CFS, EEVDF)

7.1 Conceptual Core

  • The Completely Fair Scheduler (CFS) is a virtual-time, weighted-fair-queueing scheduler: each runnable task accumulates vruntime, and the scheduler picks the task with the smallest vruntime. A task's weight is derived from its nice value, and vruntime advances more slowly for heavier tasks.
  • Since Linux 6.6, CFS has been replaced with EEVDF (Earliest Eligible Virtual Deadline First), which provides better latency guarantees while preserving fairness. The userspace API and most of the conceptual model are unchanged.
  • Scheduling classes (priority order, top to bottom): dl (deadline), rt (real-time, FIFO/RR), fair (CFS/EEVDF), idle. Almost everything userspace runs is in fair.
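
The pick rules above can be illustrated with a toy model. This is a minimal sketch, not kernel code: `Task`, `pick_cfs`, `pick_eevdf`, and `simulate` are names invented here, the weight table is a small subset of the kernel's nice-to-weight mapping, and the fixed tick and slice lengths are assumptions chosen for readability.

```python
from dataclasses import dataclass

NICE_0_WEIGHT = 1024
# Subset of the kernel's nice-to-weight table; other nice values omitted.
NICE_TO_WEIGHT = {-20: 88761, -10: 9548, 0: 1024, 10: 110, 19: 15}


@dataclass
class Task:
    name: str
    nice: int
    vruntime: int = 0   # virtual nanoseconds
    deadline: int = 0   # virtual deadline (used by the EEVDF-style pick)
    ran: int = 0        # ticks of real CPU time received

    @property
    def weight(self) -> int:
        return NICE_TO_WEIGHT[self.nice]

    def vdelta(self, real_ns: int) -> int:
        """Virtual time charged for real_ns of CPU; heavier tasks pay less."""
        return real_ns * NICE_0_WEIGHT // self.weight


def pick_cfs(tasks):
    """CFS-style pick: smallest vruntime wins."""
    return min(tasks, key=lambda t: t.vruntime)


def pick_eevdf(tasks):
    """EEVDF-style pick: among eligible tasks, earliest virtual deadline wins."""
    total_w = sum(t.weight for t in tasks)
    weighted_sum = sum(t.weight * t.vruntime for t in tasks)
    # Eligible means vruntime <= weighted average vruntime (no division needed).
    eligible = [t for t in tasks if t.vruntime * total_w <= weighted_sum]
    return min(eligible, key=lambda t: t.deadline)


def simulate(tasks, ticks, pick, tick_ns=1_000_000, slice_ns=3_000_000):
    """Run `ticks` scheduling decisions; return each task's share of CPU."""
    for t in tasks:
        t.deadline = t.vruntime + t.vdelta(slice_ns)
    for _ in range(ticks):
        t = pick(tasks)
        t.vruntime += t.vdelta(tick_ns)
        t.ran += 1
        if t.vruntime >= t.deadline:  # slice used up: hand out a new deadline
            t.deadline = t.vruntime + t.vdelta(slice_ns)
    return {t.name: t.ran / ticks for t in tasks}


if __name__ == "__main__":
    for pick in (pick_cfs, pick_eevdf):
        shares = simulate([Task("hog-a", 0), Task("hog-b", 19)], 20_000, pick)
        print(pick.__name__, shares)
```

In this model both policies converge on roughly 1024/1039 of the CPU for the nice-0 task against a nice-19 task; they differ in the order in which tasks run, not in the long-run share.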

7.2 Mechanical Detail

  • Read kernel/sched/fair.c. EEVDF did not get a separate source file; since Linux 6.6 it is implemented in fair.c as well.
  • Per-CPU runqueues; load-balancer migrates tasks between CPUs.
  • sched_setscheduler(2) and the chrt userspace tool set a task's scheduling policy and priority.
  • CPU affinity: sched_setaffinity(2), taskset, systemd CPUAffinity=.
  • Cgroup cpu controller v2: cpu.weight (proportional), cpu.max (bandwidth).
  • /proc/<pid>/sched - per-task scheduler stats.
  • /proc/sched_debug - system-wide scheduler state (moved to /sys/kernel/debug/sched/debug in Linux 5.13).
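
Several of the interfaces listed above are reachable from Python's `os` module, which makes for a quick inspection script. The following is a sketch under assumptions: `parse_proc_sched` and `describe` are hypothetical helper names, the parser only assumes the "key : value" layout of the per-task sched file, and that file is only present when the kernel exposes scheduler debug information.

```python
import os

# Map policy numbers to names for whichever constants this platform defines.
POLICY_NAMES = {
    getattr(os, name): name
    for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR", "SCHED_BATCH", "SCHED_IDLE")
    if hasattr(os, name)
}


def parse_proc_sched(text: str) -> dict:
    """Collect numeric 'key : value' lines; skip headers and separators."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.rpartition(":")
        if not sep:
            continue
        try:
            stats[key.strip()] = float(value.strip())
        except ValueError:
            continue  # e.g. the first line with the command name
    return stats


def describe(pid: int = 0) -> dict:
    """Summarize policy, CPU affinity, and per-task stats (pid 0 = caller)."""
    info = {
        "policy": POLICY_NAMES.get(os.sched_getscheduler(pid), "unknown"),
        "affinity": sorted(os.sched_getaffinity(pid)),
    }
    try:
        with open(f"/proc/{pid or 'self'}/sched") as f:
            info["stats"] = parse_proc_sched(f.read())
    except OSError:
        info["stats"] = {}  # file missing or unreadable on this kernel
    return info


if __name__ == "__main__":
    me = describe()
    print(me["policy"], me["affinity"])
    print({k: v for k, v in me["stats"].items() if "switches" in k})
```

To pin the calling process instead of just inspecting it, `os.sched_setaffinity(0, {0, 1})` is the Python counterpart of `taskset -c 0,1`.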

7.3 Lab - "Scheduler Forensics"

  1. Run two CPU hogs at nice 0, pinned to the same CPU so they actually compete. Observe the even split. Renice one to 19 and verify roughly a 98.5/1.5 split (weights 1024 vs 15).
  2. Use bpftrace -e 'tracepoint:sched:sched_switch { @[comm] = count() }' to see context-switch rates.
  3. Pin a workload to specific CPUs with taskset -c 0,1. Compare cache-miss rate vs unpinned with perf stat.
  4. Place two services in cgroups with cpu.weight=100 and cpu.weight=1000. Verify the 10:1 split under contention.
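
Before running step 4 it helps to know what numbers to expect. The sketch below computes the proportional share implied by a set of cpu.weight values and interprets the contents of a cpu.max file. `expected_shares` and `parse_cpu_max` are hypothetical helper names, and the service names in the usage lines are placeholders.

```python
from typing import Dict, Optional


def expected_shares(weights: Dict[str, int]) -> Dict[str, float]:
    """Share of contended CPU each sibling cgroup should get from cpu.weight."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def parse_cpu_max(text: str) -> Optional[float]:
    """Turn cpu.max contents ('<quota> <period>' or 'max <period>') into a
    CPU count limit, or None when the cgroup is unthrottled."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


if __name__ == "__main__":
    # Placeholder service names; the weights are those from step 4.
    print(expected_shares({"svc-low": 100, "svc-high": 1000}))
    print(parse_cpu_max("50000 100000"))  # half a CPU
    print(parse_cpu_max("max 100000"))    # no bandwidth limit
```

The shares only materialize when both cgroups are runnable on the same CPUs at the same time; an idle sibling gives up its share, which is why the lab says "under contention".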

7.4 Hardening Drill

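  • Unprivileged use of SCHED_FIFO/SCHED_RR is gated by RLIMIT_RTPRIO and CAP_SYS_NICE, so keep that limit at 0 for untrusted users. kernel.sched_rt_runtime_us does not forbid RT scheduling; it caps the CPU time RT tasks may consume per period, so a runaway RT task cannot starve everything else. In systemd units, set RestrictRealtime=yes to block realtime policies for the service (RestrictSUIDSGID=yes is a sensible companion setting, though unrelated to scheduling). Together these prevent abuse of RT scheduling, chiefly CPU starvation.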
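
A drill is easier to trust when it can be verified from inside the service. The following sketch, with hypothetical helper names `rt_budget_fraction` and `can_use_realtime`, computes the RT CPU budget from the two sysctl values and probes whether the current process is allowed to switch to SCHED_FIFO. It assumes a Linux host and restores the original policy if the probe succeeds.

```python
import os


def rt_budget_fraction(runtime_us: int, period_us: int):
    """Fraction of each period RT tasks may use; None if unlimited (-1)."""
    if runtime_us < 0:
        return None
    return runtime_us / period_us


def read_kernel_sysctl(name: str) -> int:
    """Read an integer from /proc/sys/kernel/<name>."""
    with open(f"/proc/sys/kernel/{name}") as f:
        return int(f.read())


def can_use_realtime() -> bool:
    """Try to enter SCHED_FIFO at the lowest RT priority, then switch back."""
    old_policy = os.sched_getscheduler(0)
    old_param = os.sched_getparam(0)
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(1))
    except PermissionError:
        return False  # blocked by rlimit, missing capability, or a filter
    os.sched_setscheduler(0, old_policy, old_param)
    return True


if __name__ == "__main__":
    runtime = read_kernel_sysctl("sched_rt_runtime_us")
    period = read_kernel_sysctl("sched_rt_period_us")
    print("RT budget:", rt_budget_fraction(runtime, period))
    print("realtime allowed for this process:", can_use_realtime())
```

Run inside a unit with RestrictRealtime=yes, the probe is expected to report that realtime is not allowed; run as root without restrictions, it normally succeeds.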

7.5 Performance Tuning Slice

  • perf sched record sleep 10; perf sched latency - identifies wakeup-latency outliers. Tunables: sched_wakeup_granularity_ns, sched_min_granularity_ns (older kernels); EEVDF largely auto-tunes.
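
When perf is unavailable, a rough userspace proxy for wakeup latency is to measure how far a short sleep overshoots its deadline. This is only a sketch: `sleep_overshoot_ns` and `summarize` are names invented here, and the figure includes timer slack and interpreter overhead, so it is not the same quantity perf sched latency reports.

```python
import time


def sleep_overshoot_ns(samples: int = 200, interval_s: float = 0.001):
    """Sleep repeatedly and record how late each wakeup was, in nanoseconds."""
    requested_ns = int(interval_s * 1e9)
    overshoot = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        time.sleep(interval_s)
        overshoot.append(time.perf_counter_ns() - start - requested_ns)
    return overshoot


def summarize(values):
    """Min, median, approximate 99th percentile, and max of the samples."""
    ordered = sorted(values)
    n = len(ordered)
    return {
        "min": ordered[0],
        "median": ordered[n // 2],
        "p99": ordered[min(n - 1, int(n * 0.99))],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    print(summarize(sleep_overshoot_ns()))
```

Comparing the p99 and max figures on an idle machine against the same run next to a few CPU hogs gives a first impression of where the outliers come from, before reaching for perf.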
