08 - Processes
What this session is
About 45 minutes. You'll learn what a process is, how to see what's running, how to kill misbehaving programs, and how to manage background jobs.
What's a process
Every running program is a process with:

- A PID (process ID) - a unique number.
- A user - the user who started it.
- Resources - open files, memory, network sockets.
- A state - running, sleeping, waiting, zombie.
When you run a command, the shell creates a child process to execute it. When the command finishes, the process exits.
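To make the parent/child relationship concrete, here is a minimal sketch. It assumes a POSIX shell; `tmpfile` is a scratch file used only to capture the child's output for this illustration.

```shell
# $$ expands to the PID of the current shell.
echo "this shell's PID: $$"

# Run a child shell and have it report its own PID and its parent's PID.
tmpfile=$(mktemp)
sh -c 'echo "$$ $PPID"' > "$tmpfile"
read -r child_pid child_ppid < "$tmpfile"
rm -f "$tmpfile"

echo "child PID: $child_pid, child's parent PID: $child_ppid"
```

Typed at a prompt, the child's parent PID should match the shell's own PID, and the child gets a fresh PID of its own.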
See what's running: ps
ps # processes in your current terminal
ps aux # ALL processes, with detailed info
ps aux | grep python # only Python processes
ps aux is the most-used form. Columns:
| Column | Meaning |
|---|---|
| USER | who started it |
| PID | process ID |
| %CPU | CPU usage |
| %MEM | memory usage |
| VSZ / RSS | virtual / resident memory (KB) |
| TTY | terminal it's attached to (? if none) |
| STAT | state (R running, S sleeping, Z zombie, T stopped) |
| START / TIME | when started / cumulative CPU time |
| COMMAND | what's running |
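
When the full `ps aux` listing is more than you need, `ps` can print just the columns you ask for. A minimal sketch, assuming a `ps` that accepts `-o` and `-p` (the usual Linux and macOS versions do); it inspects the current shell so there is always a valid PID to look at:

```shell
# Show selected columns for one process: here, the current shell ($$).
ps -o pid,ppid,user,stat,comm -p "$$"

# A trailing "=" after a column name suppresses its header, which is handy in scripts.
shell_state=$(ps -o stat= -p "$$")
echo "state of this shell: $shell_state"
```

The column names (`pid`, `ppid`, `user`, `stat`, `comm`) correspond to the PID, parent PID, USER, STAT, and COMMAND ideas in the table above.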
Live view: top and htop
`top` is the classic interactive process viewer. Run `top` with no arguments to start it.
Press q to quit, M to sort by memory, P to sort by CPU, and 1 to show a per-CPU breakdown.
`htop` is the modern, prettier alternative. Install:

- `sudo apt install htop` (Debian/Ubuntu)
- `brew install htop` (macOS with Homebrew)
F10 or q to quit. Use arrows to scroll; F9 to kill (with menu). Much easier than top.
btop (newer) is similar - colorful, more visual. Try it: sudo apt install btop / brew install btop.
Killing a process
kill PID # send the default signal (TERM - polite request to exit)
kill -9 PID # send SIGKILL - force kill (process can't ignore)
killall name # kill all processes named "name"
pkill name # kill processes whose name matches the pattern "name"
The two signals to know:

- SIGTERM (15) - the default. "Please exit cleanly." The process can save state, close files, then exit.
- SIGKILL (9) - "Die now." The kernel terminates the process immediately. No cleanup. Use only when SIGTERM doesn't work.
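The difference is easiest to see with a program that actually cleans up. A minimal sketch, assuming a POSIX shell; the child shell with a `trap` is a stand-in for a real program with a SIGTERM handler:

```shell
# A stand-in "program": a child shell that cleans up when it receives SIGTERM.
sh -c 'trap "echo cleaning up; exit 0" TERM; while :; do sleep 1; done' &
pid=$!

sleep 1                 # give the child a moment to install its handler
kill "$pid"             # SIGTERM: the handler runs and the child exits on its own terms
status=0
wait "$pid" || status=$?
echo "exit status after SIGTERM: $status"
```

Had the signal been `kill -9`, the trap would never run: no "cleaning up" message, and no chance to close files.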
Running things in the background
Add `&` to the end of a command (`command &`) to run it in the background, freeing your terminal.
Your prompt returns immediately while the process keeps running.
To see your background jobs in this shell, run `jobs`.
To bring a background job to the foreground, run `fg` (or `fg %N` for job number N).
To send a foreground job to the background:

- Press Ctrl-Z to suspend (pause) it.
- Type `bg` to continue it in the background.

To kill a job, pass its job number to `kill` with a leading `%`, for example `kill %1`.
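The background-job steps above can be sketched as a short script. This is an illustration only: `sleep 300` is a stand-in for a real long-running command, and `fg`/`bg` are left out because they need an interactive shell with job control.

```shell
sleep 300 &              # stand-in for a long-running command
pid=$!                   # $! is the PID of the most recent background job
jobs                     # list this shell's background jobs

kill "$pid"              # SIGTERM by PID; at a prompt, kill %N targets job number N
status=0
wait "$pid" 2>/dev/null || status=$?
echo "exit status: $status"           # 128 + signal number when killed by a signal
echo "signal: $(kill -l "$status")"   # translate the status back to a signal name
```

A process killed by SIGTERM (signal 15) reports exit status 143, which `kill -l` maps back to `TERM`.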
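Long-running processes that survive logout
Jobs started with `&` typically die when you log out or close the terminal, because they receive SIGHUP. For things that should survive, prefix the command with `nohup` and background it: `nohup command &`.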
Output goes to nohup.out by default.
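`nohup` only falls back to `nohup.out` when its output would otherwise go to the terminal, so redirecting lets you choose the log file. A minimal sketch: the `sh -c` command stands in for a real long-running job, and the temporary file is a hypothetical log location.

```shell
log=$(mktemp)            # hypothetical log location for this illustration

# Run the job immune to SIGHUP, sending stdout and stderr to the log.
nohup sh -c 'echo started; sleep 1; echo finished' > "$log" 2>&1 &
pid=$!

wait "$pid"              # only so this demo can show the result; normally you would just log out
cat "$log"
```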
The modern alternative: tmux or screen - terminal multiplexers. Start a session, run things in it, detach, log out, come back hours later, reattach. Out of scope here; install and learn one - tmux is the more popular.
For production services: use systemd units (/etc/systemd/system/myservice.service). Beyond beginner; mentioned for recognition.
Process tree: pstree
`pstree` shows parent-child relationships as a tree (`pstree -p` adds PIDs). Useful for understanding what spawned what.
Why a process won't die
Sometimes `kill PID` doesn't work:

1. The process is in uninterruptible sleep (waiting on disk or kernel - state D in ps). Wait it out; reboot if persistent.
2. The process is already dead: it is a zombie (state Z) whose parent has not reaped it. Signal or kill the parent.
3. You don't own the process. `sudo kill PID` if you must.

Last resort: `sudo kill -9 PID`, though it cannot help in the first two cases.
Going deeper
The commands above handle most day-to-day process work. This section is the depth that turns "I can run ps" into "I can diagnose what's wrong with a process" - the failure modes you'll actually hit, with what you'll see.
Process states - and the one that means trouble
The STAT column in ps aux is a status code most people ignore. Each letter is a real diagnosis:
$ ps aux | awk '{print $8, $11}' | head
STAT COMMAND
S /usr/bin/bash # S = interruptible sleep (waiting for something, normal - most procs)
R python train.py # R = running or runnable (actually using CPU right now)
D cp bigfile /mnt # D = UNINTERRUPTIBLE sleep (stuck in a kernel/IO operation)
Z [defunct] # Z = zombie (finished, but parent hasn't collected it)
T vim # T = stopped (you hit Ctrl-Z, or it got SIGSTOP)
The two that signal trouble:
- `D` (uninterruptible sleep) - the process is stuck inside a kernel operation, almost always waiting on slow or hung I/O (a dead NFS mount, a failing disk). Here's the kicker: a `D`-state process cannot be killed, not even with `kill -9` - it's not ignoring your signal, it's not running user code at all, it's blocked in the kernel. If you see a process stuck in `D`, the problem is the I/O it's waiting on (check `dmesg` for disk errors, check the mount). The process will come back when the I/O completes or errors out; if it never does, only a reboot clears it. Seeing `D` and knowing "that's stuck I/O, not a killable process" saves you from fruitlessly hammering `kill -9`.
- `Z` (zombie) - covered next.
Zombies - what they are and why kill -9 won't clear them
A zombie is a process that has already exited but whose parent hasn't called wait() to read its exit status. It's not running - it's a corpse holding a slot in the process table (just a PID and exit code, no memory). In `ps` output a zombie has state `Z` and `<defunct>` after its command name.
The trap everyone falls into: you cannot kill a zombie. kill -9 4823 does nothing - you can't kill something that's already dead. The fix is counterintuitive: the zombie is the parent's fault (the parent failed to reap it), so you signal or kill the parent. Find the parent:
$ ps -o ppid= -p 4823 # get the zombie's parent PID
3401
$ ps -p 3401 # who's the negligent parent?
PID TTY CMD
3401 ? my_buggy_server
$ kill 3401 # once the parent exits, init adopts and reaps the zombie
When the parent exits, the zombie is adopted by init (PID 1), which reaps it, and it vanishes. If the parent instead handles SIGCHLD and calls wait(), the parent reaps the zombie itself. A few transient zombies are normal; thousands accumulating means a buggy parent program that never reaps children - a real bug to report or fix. This is the kind of thing that looks like a mystery ("why won't this dead process go away?") until you know zombies are reaped through the parent.
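The parent lookup can be reproduced safely with a short-lived zombie. A minimal sketch, assuming a `ps` that accepts `-A` and repeated `-o` options; the `sleep` commands are stand-ins for a buggy server and its child:

```shell
# Parent: "sleep 3" (via exec), which never calls wait().
# Child:  "sleep 0", which exits immediately and becomes a zombie.
sh -c 'sleep 0 & exec sleep 3' &
parent=$!

sleep 1                  # let the child exit

# List processes whose parent is $parent and whose state starts with Z.
zombie=$(ps -A -o pid= -o ppid= -o stat= -o comm= |
    awk -v p="$parent" '$2 == p && $3 ~ /^Z/')
echo "zombie entry: $zombie"

wait "$parent"           # once the parent exits, the zombie is adopted and reaped
```

Because `exec` replaces the child shell with `sleep 3`, the PID in `$parent` is the zombie's parent, which is why filtering on the PPID column finds it.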
What you'll actually see: a runaway process
The most common real incident: something pins a CPU core at 100%. Here's the diagnosis, with output:
$ top -o %CPU # sort by CPU, the runaway floats to the top
PID USER %CPU %MEM COMMAND
5012 user 99.7 2.1 node <- one core fully pegged
99.7% means one core saturated (on an 8-core box, top can show up to 800% for a process using all cores). Before you kill it, see what it's doing - is it a real workload or a stuck loop? Attach strace (the syscall investigation, if you've seen the senior path) or check /proc:
$ cat /proc/5012/wchan # what kernel function it's waiting in (0 = actively running, not waiting)
$ ls -l /proc/5012/fd # what files/sockets it has open - hints at what it's doing
A process at 100% CPU whose wchan reads 0 is genuinely computing (maybe a real job, maybe an infinite loop). If wchan instead names a kernel function, the process was sleeping in the kernel at that instant, which is unusual for something pinned at 100%. This is the difference between "kill it blindly" and "understand it, then decide."
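The same checks can be rehearsed on a harmless runaway of your own. A minimal sketch: the busy-looping child shell is a stand-in for the real offender, and the `/proc` check is guarded because it only exists on Linux.

```shell
# Stand-in runaway: a shell stuck in a busy loop.
sh -c 'while :; do :; done' &
pid=$!
sleep 1

# The state should start with R (running); pcpu is the CPU share that ps reports.
state=$(ps -o stat= -p "$pid")
ps -o pid,stat,pcpu,comm -p "$pid"

# On Linux, wchan names the kernel function the process is sleeping in.
if [ -r "/proc/$pid/wchan" ]; then
    echo "wchan: $(cat "/proc/$pid/wchan")"
fi

kill "$pid"              # a polite SIGTERM is enough for a busy loop
wait "$pid" 2>/dev/null || true
```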
Signals are a vocabulary, not just kill -9
kill sends signals, and the signal you send matters:
kill -TERM PID # (= plain `kill`) SIGTERM: "please shut down" - the process can clean up
kill -INT PID # SIGINT: same as Ctrl-C - interrupt
kill -HUP PID # SIGHUP: many servers reload their config on this (no restart!)
kill -KILL PID # (= kill -9) SIGKILL: cannot be caught or ignored, instant death, NO cleanup
kill -STOP PID # pause the process (resume with -CONT)
kill -CONT PID # resume a stopped process
The discipline: always try SIGTERM (plain `kill`) first. It lets the process flush buffers, close files, finish a transaction, and exit cleanly. `kill -9` (SIGKILL) is the sledgehammer - it can't be caught, so the process dies immediately with no chance to clean up, which can corrupt files or leave locks held. Reach for `-9` only when SIGTERM is ignored. Bonus: `kill -HUP` reloads many daemons' config without downtime. For nginx, send it to the master process: `kill -HUP $(pgrep -o nginx)` picks the oldest nginx process, which is normally the master, and makes it re-read its config live.
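Pausing and resuming is the easiest part of this vocabulary to watch in action. A minimal sketch, with `sleep 300` as a stand-in process and `ps -o stat=` used to read its state:

```shell
sleep 300 &              # stand-in process
pid=$!

kill -STOP "$pid"        # pause it
sleep 1
stopped=$(ps -o stat= -p "$pid")   # expected to start with T (stopped)

kill -CONT "$pid"        # resume it
sleep 1
resumed=$(ps -o stat= -p "$pid")   # expected to start with S (sleeping again)

echo "after STOP: $stopped, after CONT: $resumed"

kill "$pid"              # clean up with SIGTERM
wait "$pid" 2>/dev/null || true
```

SIGSTOP, like SIGKILL, cannot be caught or ignored, so the pause works on any process you own.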
Try it (with what you'll see)

- Make a zombie: run `bash -c 'sleep 1 & exec sleep 5'` - this briefly creates a child whose parent is busy. From a second terminal, inspect with `ps aux | grep defunct`. The zombie only exists between second 1 and second 5, so raise the 5 if you need more time. (Or write a tiny program that forks and never `wait()`s.) Confirm `kill -9` on the zombie does nothing, and killing the parent clears it.
- Find a `D`-state process if you can (run `dd if=/dev/sda of=/dev/null` as root briefly - it may flash `D`). Note you can't reliably `kill -9` it mid-I/O.
- Spin a runaway: `python3 -c 'while True: pass' &`, find it in `top -o %CPU` at ~100%, check `/proc/<pid>/wchan` (`0` = actively running), then `kill` it (SIGTERM) and confirm it dies cleanly.
- Send `kill -STOP` then `kill -CONT` to a process and watch its STAT flip to `T` then back to `S`/`R`.
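Exercise

- List your processes, for example with `ps -u "$USER"`.
- Count Python processes on your system with `ps aux | grep python | wc -l`. (Note: this counts the grep itself too. Use `ps aux | grep python | grep -v grep | wc -l` to exclude it.)
- Start a sleep in the background, for example `sleep 300 &`. Note the PID. Bring it back to the foreground with `fg`. Press Ctrl-Z to suspend it, then `bg` to continue it in the background, then kill it with `kill %1` (or `kill` followed by the PID).
- Launch `htop` (or `top`). Sort by memory (in htop, F6 → MEM%; in top, press M). Find the biggest process. Quit.
- Bonus: find what process is using a given file, for example with `lsof /path/to/file`. Or what's listening on a port, for example with `lsof -i :8080` (8080 is just an example port).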
What you might wonder
"What's a zombie process?" A process that has finished but whose exit status hasn't been reaped by its parent. It still holds a PID but uses no memory. Mostly harmless; a buildup points to a buggy parent. Zombies disappear when the parent reaps them or exits, so a reboot is a last resort.
"Why does kill need a number? What are all the signals?" `kill -l` shows them all. The common ones:

- 1 SIGHUP - terminal hangup; often triggers reload in daemons.
- 2 SIGINT - what Ctrl-C sends.
- 9 SIGKILL - uncatchable.
- 15 SIGTERM - polite termination.
- 19 SIGSTOP / 18 SIGCONT - pause / resume (Linux numbering; other systems such as macOS use different numbers for these two, so prefer the names).
"How do I run a job on a schedule?" cron (Linux/macOS) or systemd timers. crontab -e to edit your scheduled jobs. Out of scope here; useful to know it exists.
"What's the relationship between processes and threads?" A process can have multiple threads (lighter-weight units of execution within the process, sharing memory). For most beginner tasks, this distinction doesn't matter. ps -L shows threads if you need to see them.
Done

- Inspect processes with `ps aux`, `top`, `htop`.
- Kill processes with `kill`, `killall`, `pkill`.
- Distinguish SIGTERM (15) from SIGKILL (9).
- Run jobs in the background (`&`, `jobs`, `fg`, `bg`).
- Keep jobs alive past logout (`nohup`, `tmux`).
Next: Shell scripting basics →