07 - Pipes and Redirection¶

What this session is¶

About 45 minutes. You'll learn pipes (|) and redirection (>, >>, <) - arguably the most powerful feature of the Unix shell. Once you internalize them, you can compose small commands into solutions that no individual command could provide.

The Unix philosophy¶

Write programs that do one thing well. Write programs to work together. Write programs to handle text streams.

Every command has three default channels:

- stdin (standard input) - where the command reads from.
- stdout (standard output) - where it writes results.
- stderr (standard error) - where it writes errors.

By default, stdin is your keyboard; stdout and stderr are your terminal. Pipes and redirection let you wire them differently.

Pipe: `|`¶

The pipe sends one command's stdout into another's stdin:

ls | wc -l

ls produces a list of files, one per line.
wc -l counts lines.
Result: the number of entries (files and directories, not counting hidden ones) in the current directory.

You can chain as many pipes as you want:

ps aux | grep python | wc -l

ps aux - list all running processes.
grep python - keep only lines containing "python".
wc -l - count them.
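Result: how many Python processes are running, usually plus one: grep's own command line contains "python", so it tends to match itself. Writing the pattern as grep '[p]ython' avoids counting grep.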

Read pipelines left to right. Each | is "and then send through."
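
To see a pipeline produce a predictable number, it helps to run it against a directory you control. The sketch below is only an illustration: it builds a throwaway directory with mktemp -d (the file names are made up), counts its entries, then adds a grep stage to narrow the count.

```shell
# Create a scratch directory with a known set of files (names are illustrative).
scratch=$(mktemp -d)
touch "$scratch/a.py" "$scratch/b.py" "$scratch/notes.txt"

# ls writes one name per line into the pipe; wc -l counts those lines.
total=$(ls "$scratch" | wc -l)

# Add a filter stage: keep only names ending in .py, then count.
py_count=$(ls "$scratch" | grep '\.py$' | wc -l)

echo "entries: $total, python files: $py_count"
rm -r "$scratch"
```

Here total should be 3 and py_count should be 2. Each stage only sees the lines the previous stage let through.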

Redirecting output: `>` and `>>`¶

Send stdout to a file:

ls > files.txt              # write list to file (overwrites existing)
ls >> files.txt             # append to file (creates if needed)
echo "hello" > greeting.txt
date >> log.txt             # append today's date to log.txt

> overwrites - if files.txt existed, it's now replaced. >> appends - keeps existing content, adds to the end.
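
The difference between > and >> is easy to check in a scratch directory. A minimal sketch, assuming mktemp is available; demo.txt is a made-up file name:

```shell
scratch=$(mktemp -d)
log="$scratch/demo.txt"        # illustrative file name

echo "first"  > "$log"         # creates the file with one line
echo "second" >> "$log"        # appends: the file now has two lines
lines_after_append=$(cat "$log" | wc -l)

echo "third"  > "$log"         # overwrites: the earlier lines are gone
lines_after_overwrite=$(cat "$log" | wc -l)

echo "after append: $lines_after_append, after overwrite: $lines_after_overwrite"
cat "$log"
```

If accidental overwrites worry you, set -o noclobber makes > refuse to replace an existing file.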

For commands that produce a lot of output you want to save:

find / -name "*.log" > all-logs.txt 2>/dev/null

2>/dev/null redirects stderr to "nothing" - suppresses error messages from directories you can't read.

Redirecting input: `<`¶

Send a file's contents to a command's stdin:

sort < unsorted.txt > sorted.txt

sort reads from stdin (here, unsorted.txt) and writes sorted lines to stdout (here, sorted.txt).

In practice you rarely use < because most commands accept a filename argument too: sort unsorted.txt > sorted.txt works the same. But < is useful when a command only reads stdin.
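
tr is one command that takes no filename argument, so it is a natural place to try <. The sketch below uses made-up sample data in a scratch directory:

```shell
scratch=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$scratch/unsorted.txt"

# tr cannot open files itself, so the file has to arrive on stdin.
upper=$(tr 'a-z' 'A-Z' < "$scratch/unsorted.txt")

# Input and output redirection on the same command, as in the sort example.
sort < "$scratch/unsorted.txt" > "$scratch/sorted.txt"

echo "$upper"
cat "$scratch/sorted.txt"
```

One trap to be aware of: never read and write the same file in one command (sort < f > f). The shell truncates the output file before the command starts reading, so the data is lost.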

`tee`: split output¶

tee writes to a file AND to stdout, so the pipeline continues:

ls -la | tee files.txt | wc -l

ls -la lists files.
tee files.txt writes the list to a file AND passes it on.
wc -l counts.
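Result: the line count of the listing (slightly more than the file count, since ls -la also prints a "total" line and the . and .. entries), and files.txt also has the listing.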

Useful when you want to save intermediate output without breaking the pipeline.
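
A small way to confirm that tee really does both jobs, using fixed input instead of a directory listing (saved.txt is an illustrative name):

```shell
scratch=$(mktemp -d)

# Three lines flow through tee: a copy lands in saved.txt, and the same
# lines continue down the pipe to wc -l.
count=$(printf 'one\ntwo\nthree\n' | tee "$scratch/saved.txt" | wc -l)

echo "lines counted: $count"
cat "$scratch/saved.txt"
```

By default tee overwrites its file, like >. Use tee -a to append instead.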

Combining: real examples¶

Count how many .py files are in a directory tree:

find . -name "*.py" | wc -l

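Find the 5 largest items (files or directories) under your home:

du -ah ~ | sort -h | tail -n 5

du -ah ~ - disk usage for every file and directory under home (human-readable).
sort -h - sort by human-readable size (a GNU coreutils option; not every sort has it).
tail -n 5 - keep the last 5 (largest). A directory's size includes everything inside it, so the final line is the total for ~ itself.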

Find the most-used commands in your shell history:

history | awk '{print $2}' | sort | uniq -c | sort -rn | head -n 10

history - your command history.
awk '{print $2}' - second field of each line (the command name; the first field is the history number).
sort - sort alphabetically.
uniq -c - collapse duplicates and count.
sort -rn - sort numerically, reversed (biggest first).
head -n 10 - top 10.

You don't need to understand every piece yet. Notice: a complex task is solved by piping simple tools together. That's the Unix way.
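
The sort | uniq -c | sort -rn core of that pipeline can be tried on fixed input, which makes each stage easier to follow. The sample list below is made up and simply stands in for real history output:

```shell
# Sample "commands", one per line (made-up data).
commands='ls
cd
ls
git
ls
git'

# sort groups identical lines together, uniq -c counts each group,
# sort -rn puts the biggest count first, head keeps the top entry.
top=$(printf '%s\n' "$commands" | sort | uniq -c | sort -rn | head -n 1)

echo "$top"
```

The top line pairs a count of 3 with ls. The first sort matters: uniq only collapses adjacent duplicates, so unsorted input would give several separate counts for the same command.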

Save all warnings from a log file to a separate file:

grep "WARNING" /var/log/syslog > warnings.txt

Watch log files for errors as they happen:

tail -f /var/log/syslog | grep -i error

stderr vs stdout¶

Some commands write errors separately. Compare:

ls /nonexistent

Prints something like "ls: cannot access '/nonexistent': No such file or directory" (the exact wording varies by system). This message goes to stderr.

ls /nonexistent > out.txt

You still see the error in your terminal! That's because > only redirects stdout. The file is empty.

To redirect stderr:

ls /nonexistent 2> err.txt

2> is "redirect file descriptor 2 (stderr)." 1> is stdout (same as >).

To redirect both to the same place:

ls /nonexistent > out.txt 2>&1
# or, more readable:
ls /nonexistent &> out.txt

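2>&1 is "send stderr to where stdout is going right now," so it must come after > out.txt. &> is a bash and zsh shorthand for both; it is not part of POSIX sh, so prefer > out.txt 2>&1 in portable scripts.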
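
Redirections are applied left to right, so where 2>&1 sits changes the result. The sketch below demonstrates this with a hypothetical function, both, that writes one line to each channel:

```shell
scratch=$(mktemp -d)

# Hypothetical command that writes one line to each channel.
both() {
    echo "result line"          # stdout (fd 1)
    echo "error line" >&2       # stderr (fd 2)
}

# Usual order: stdout goes to the file first, then stderr follows it there.
both > "$scratch/right.txt" 2>&1

# Reversed order: stderr is pointed at the current stdout (captured here by
# the command substitution), and only then is stdout moved to the file.
leaked=$(both 2>&1 > "$scratch/wrong.txt")

right_lines=$(cat "$scratch/right.txt" | wc -l)
wrong_lines=$(cat "$scratch/wrong.txt" | wc -l)
echo "right.txt: $right_lines lines, wrong.txt: $wrong_lines line"
echo "escaped the file: $leaked"
```

right.txt ends up with both lines, while wrong.txt holds only the stdout line and the error escapes to wherever stdout pointed before.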

To discard one or both:

command 2>/dev/null            # discard errors only
command > /dev/null 2>&1       # discard both
command &>/dev/null            # same, shorter

/dev/null is the "nothing" file - anything written there is discarded.
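
Discarding output is most useful when only a command's success or failure matters. A minimal sketch, assuming a POSIX shell; it checks whether a program exists without printing anything from the check itself:

```shell
# `command -v` prints the path of a program if it exists. Here the output
# is thrown away and only the exit status is used.
if command -v sort > /dev/null 2>&1; then
    have_sort=yes
else
    have_sort=no
fi

echo "sort available: $have_sort"
```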

Useful text-processing pipes¶

A small zoo you'll see constantly:

sort                # sort alphabetically
sort -n             # sort numerically
sort -r             # reverse
sort -u             # sort + unique
uniq                # collapse adjacent duplicates (often paired with sort)
uniq -c             # count occurrences
cut -d, -f2         # field 2, comma-separated
cut -c1-10          # characters 1-10
awk '{print $1}'    # print first whitespace-separated field
sed 's/foo/bar/g'   # substitute foo with bar (everywhere)
tr 'a-z' 'A-Z'      # translate (here, lowercase to upper)
head / tail         # first/last N lines (page 04)

You don't need to memorize them all. Recognize them when you see them; look up specifics when you have a task.
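
To get a feel for a few of these, here is a short sketch that runs them over a made-up two-column CSV (name,department). The data and variable names are purely illustrative:

```shell
# Made-up CSV data: name,department
data='ana,eng
bob,ops
cy,eng
dee,eng
eli,ops'

# cut pulls out column 2; sort -u sorts and drops duplicates.
distinct=$(printf '%s\n' "$data" | cut -d, -f2 | sort -u)

# tr and sed transform text as it streams past.
shouted=$(printf '%s\n' "$data" | head -n 1 | tr 'a-z' 'A-Z' | sed 's/,/ -> /')

echo "$distinct"
echo "$shouted"
```

distinct holds the two department names, and shouted turns the first row into ANA -> ENG.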

Exercise¶

Count files in your home:
```
ls ~ | wc -l
```
List the 5 largest directories under your home:
```
du -h ~/* | sort -h | tail -n 5
```
How many lines in your bash history are unique?
```
history | awk '{$1=""; print $0}' | sort -u | wc -l
```
(Strips the history number, sorts unique, counts.)

Save the output of ls -la /etc to a file:

ls -la /etc > etc-listing.txt
wc -l etc-listing.txt

Append the date to a log file:

echo "Started: $(date)" >> mylog.txt
echo "Did stuff" >> mylog.txt
echo "Ended: $(date)" >> mylog.txt
cat mylog.txt

Discard errors from a find of /:

find / -name "*.log" 2>/dev/null | head

Bonus: print the top 5 most-used commands from your history:

history | awk '{print $2}' | sort | uniq -c | sort -rn | head -n 5

What you might wonder¶

"How do I read stdin in a script?" With read (one line at a time) or by reading the whole thing: input=$(cat). Useful when writing filter scripts.

"Why use awk if I have cut?" cut is simpler for fixed delimiters. awk is a small programming language - handles complex columns, conditional output, math. Learn awk '{print $N}' for "print the Nth field"; that covers 80% of awk use.

"What about variables?" Coming in page 09 (shell scripting).

"What's sed for?" sed (stream editor) is for find-and-replace in streams: sed 's/foo/bar/g' file.txt replaces every foo with bar. Powerful for log processing, file editing in scripts.

Done¶

Pipe with |.
Redirect stdout with > (overwrite) and >> (append).
Redirect stderr with 2>, both with &> (bash/zsh) or > file 2>&1.
Discard output with /dev/null.
Use tee to split output.
Recognize the standard text-processing pipes.

Next: Processes →