
07 - Pipes and Redirection

What this session is

About 45 minutes. You'll learn pipes (|) and redirection (>, >>, <) - arguably the most powerful features of the Unix shell. Once you internalize them, you can compose small commands into solutions that no individual command could provide.

The Unix philosophy

Write programs that do one thing well. Write programs to work together. Write programs to handle text streams.

Every command has three default channels:

  • stdin (standard input) - where the command reads from.
  • stdout (standard output) - where it writes results.
  • stderr (standard error) - where it writes errors.

By default, stdin is your keyboard; stdout and stderr are your terminal. Pipes and redirection let you wire them differently.

Pipe: |

The pipe sends one command's stdout into another's stdin:

ls | wc -l
  • ls produces a list of files, one per line.
  • wc -l counts lines.
  • Result: the number of entries (files and directories, not counting hidden ones) in the current directory.

You can chain as many pipes as you want:

ps aux | grep python | wc -l
  • ps aux - list all running processes.
  • grep python - keep only lines containing "python".
  • wc -l - count them.
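  • Result: roughly how many Python processes are running. The count can be one too high, because the grep python process itself often shows up in the ps output.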

Read pipelines left to right. Each | is "and then send through."
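Here is a minimal sketch of the same idea with input you control, so the result is predictable. printf stands in for a command that produces lines, and the fruit names are made up for illustration:

```shell
# printf emits four lines, grep keeps the ones containing "an",
# and wc -l counts what is left (banana and mango).
count=$(printf '%s\n' apple banana cherry mango | grep an | wc -l)
echo "$count"
```

Try removing the last stage (| wc -l) to see what grep passed along before it was counted.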

Redirecting output: > and >>

Send stdout to a file:

ls > files.txt              # write list to file (overwrites existing)
ls >> files.txt             # append to file (creates if needed)
echo "hello" > greeting.txt
date >> log.txt             # append today's date to log.txt

> overwrites - if files.txt existed, it's now replaced. >> appends - keeps existing content, adds to the end.
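To see the difference without touching real files, here is a small sketch that works in a throwaway directory. The file name demo.txt is just a placeholder:

```shell
# Work in a temporary directory so nothing important gets overwritten.
tmpdir=$(mktemp -d)
demo="$tmpdir/demo.txt"        # hypothetical file name

echo "first"  >  "$demo"       # creates the file
echo "second" >  "$demo"       # overwrites it: "first" is gone
echo "third"  >> "$demo"       # appends after "second"

contents=$(cat "$demo")
echo "$contents"               # two lines: second, third

rm -rf "$tmpdir"               # clean up
```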

For commands that produce a lot of output you want to save:

find / -name "*.log" > all-logs.txt 2>/dev/null

2>/dev/null redirects stderr to "nothing" - suppresses error messages from directories you can't read.

Redirecting input: <

Send a file's contents to a command's stdin:

sort < unsorted.txt > sorted.txt

sort reads from stdin (here, unsorted.txt) and writes sorted lines to stdout (here, sorted.txt).

In practice you rarely use < because most commands accept a filename argument too: sort unsorted.txt > sorted.txt works the same. But < is useful when a command only reads stdin.
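tr is one common example of a command that only reads stdin: it takes no filename argument, so < (or a pipe) is how a file reaches it. A minimal sketch, with a made-up input file:

```shell
tmpdir=$(mktemp -d)
printf 'hello\nworld\n' > "$tmpdir/lower.txt"   # hypothetical input file

# tr has no filename argument, so the file must arrive on stdin.
upper=$(tr 'a-z' 'A-Z' < "$tmpdir/lower.txt")
echo "$upper"

rm -rf "$tmpdir"
```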

tee: split output

tee writes to a file AND to stdout, so the pipeline continues:

ls -la | tee files.txt | wc -l
  • ls -la lists files.
  • tee files.txt writes the list to a file AND passes it on.
  • wc -l counts.
  • Result: file count, and files.txt also has the listing.

Useful when you want to save intermediate output without breaking the pipeline.
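A small sketch of that, using made-up input so you can compare what tee saved with what came out of the end of the pipeline:

```shell
tmpdir=$(mktemp -d)

# tee saves the unsorted lines to raw.txt while sort still receives them.
sorted=$(printf '%s\n' pear apple fig | tee "$tmpdir/raw.txt" | sort)
raw=$(cat "$tmpdir/raw.txt")

echo "saved by tee:";     echo "$raw"      # original order
echo "end of pipeline:";  echo "$sorted"   # sorted order

rm -rf "$tmpdir"
```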

Combining: real examples

Count how many .py files are in a directory tree:

find . -name "*.py" | wc -l

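Find the 5 largest files and directories under your home:

du -ah ~ | sort -h | tail -n 5
  • du -ah ~ - disk usage for every file and directory under home (human-readable). Directories are listed with their totals, so the biggest entries are usually directories, ending with home itself.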
  • sort -h - sort by human-readable size.
  • tail -n 5 - keep the last 5 (largest).

Find the most-used commands in your shell history:

history | awk '{print $2}' | sort | uniq -c | sort -rn | head -n 10
  • history - your command history.
  • awk '{print $2}' - second word of each line (the command itself).
  • sort - sort alphabetically.
  • uniq -c - collapse duplicates and count.
  • sort -rn - sort numerically, reversed (biggest first).
  • head -n 10 - top 10.

You don't need to understand every piece yet. Notice: a complex task is solved by piping simple tools together. That's the Unix way.

Save all warnings from a log file to a separate file:

grep "WARNING" /var/log/syslog > warnings.txt

Watch log files for errors as they happen:

tail -f /var/log/syslog | grep -i error

stderr vs stdout

Some commands write errors separately. Compare:

ls /nonexistent

Prints something like "ls: cannot access '/nonexistent': No such file or directory" (the exact wording depends on your ls). This is stderr.

ls /nonexistent > out.txt

You still see the error in your terminal! That's because > only redirects stdout. The file is empty.

To redirect stderr:

ls /nonexistent 2> err.txt

2> is "redirect file descriptor 2 (stderr)." 1> is stdout (same as >).

To redirect both to the same place:

ls /nonexistent > out.txt 2>&1
# or, more readable:
ls /nonexistent &> out.txt

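2>&1 is "send stderr to where stdout is going." &> is a bash shorthand for both; plain POSIX sh does not support it, so prefer > file 2>&1 in scripts that must run under sh.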

To discard one or both:

command 2>/dev/null            # discard errors only
command > /dev/null 2>&1       # discard both
command &>/dev/null            # same, shorter

/dev/null is the "nothing" file - anything written there is discarded.

Useful text-processing pipes

A small zoo you'll see constantly:

sort                # sort alphabetically
sort -n             # sort numerically
sort -r             # reverse
sort -u             # sort + unique
uniq                # collapse adjacent duplicates (often paired with sort)
uniq -c             # count occurrences
cut -d, -f2         # field 2, comma-separated
cut -c1-10          # characters 1-10
awk '{print $1}'    # print first whitespace-separated field
sed 's/foo/bar/g'   # substitute foo with bar (everywhere)
tr 'a-z' 'A-Z'      # translate (here, lowercase to upper)
head / tail         # first/last N lines (page 04)

You don't need to memorize them all. Recognize them when you see them; look up specifics when you have a task.
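To give a feel for how these combine, here is a sketch that counts rows per department in a tiny CSV. The data and column layout are invented for the example:

```shell
# Made-up CSV: name,department
data='ana,eng
bo,ops
cy,eng
di,eng
ed,ops'

# field 2 -> uppercase -> group identical lines -> count -> biggest first.
# The final awk only tidies uniq -c's padded output into "NAME COUNT".
summary=$(printf '%s\n' "$data" |
  cut -d, -f2 |
  tr 'a-z' 'A-Z' |
  sort |
  uniq -c |
  sort -rn |
  awk '{print $2, $1}')

echo "$summary"
```

The sort before uniq -c matters: uniq only collapses adjacent duplicates, so unsorted input would be counted in fragments.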

Going deeper

You can build pipelines now. This is the depth that turns "my pipe gave weird results" into a diagnosis - the failure modes that confuse everyone, with what you'll see and why.

The #1 pipe gotcha: where does my output go?

You pipe a command and nothing happens - or error messages still flood your screen despite redirecting. The cause is almost always stdout vs stderr confusion. Every program has two output streams:

  • stdout (file descriptor 1) - normal output. | and > redirect only this.
  • stderr (file descriptor 2) - errors and diagnostics. Pipes and > ignore it by default - it goes straight to your terminal.

Watch the trap:

$ find / -name '*.conf' > results.txt        # you redirect output...
find: '/root': Permission denied               # ...but errors STILL print to screen!
find: '/proc/1/...': Permission denied         # because they're on stderr, not stdout

The errors aren't going into results.txt - they're on stderr, which > didn't touch. To capture or discard each stream:

find / -name '*.conf' > results.txt 2>/dev/null    # stdout to file, errors discarded
find / -name '*.conf' > results.txt 2>errors.txt   # each stream to its own file
find / -name '*.conf' > all.txt 2>&1               # BOTH streams to one file (2>&1 = "stderr to where stdout goes")
find / -name '*.conf' 2>&1 | grep -i denied        # pipe BOTH streams (note: 2>&1 needed to pipe stderr)

The 2>&1 idiom ("send stderr to the same place as stdout") is the one to memorize - it's how you make a pipe or redirect capture errors too. "Why are errors still showing / why is my error log empty?" is almost always a stdout-vs-stderr issue, and 2>&1 is usually the answer.

Order matters, because the shell applies redirections left to right: > file 2>&1 works (stdout moves to the file, then stderr is pointed at stdout's new target); 2>&1 > file does NOT (stderr copies stdout's old target - the terminal - before stdout moves). A classic subtle bug.
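The ordering rule is easier to believe once you see it. In this sketch, both is a hypothetical stand-in function that writes one line to each stream. Running the wrong order inside $(...) makes the leak visible, because there stdout's "old target" is the capture rather than the terminal:

```shell
tmpdir=$(mktemp -d)

# Hypothetical stand-in command: one line to stdout, one to stderr.
both() { echo "to stdout"; echo "to stderr" >&2; }

# Right order: stdout moves to the file, then stderr follows it there.
both > "$tmpdir/right.txt" 2>&1
right=$(cat "$tmpdir/right.txt")

# Wrong order: stderr is pointed at stdout's CURRENT target (the capture),
# and only afterwards does stdout move to the file.
leaked=$(both 2>&1 > "$tmpdir/wrong.txt")
wrong=$(cat "$tmpdir/wrong.txt")

echo "right.txt holds: $right"
echo "wrong.txt holds: $wrong"
echo "escaped the file: $leaked"

rm -rf "$tmpdir"
```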

Exit codes - the silent half of every command

Every command returns an exit code: 0 = success, non-zero = failure. You don't see it, but it's how scripts make decisions and how pipe failures hide. Check it with $?:

$ grep "needle" haystack.txt
$ echo $?
0                    # found it (success)
$ grep "nothing" haystack.txt
$ echo $?
1                    # not found (grep returns 1 when no match - not an "error", a result)

This matters because grep returning 1 on no-match will fail a script under set -e even though nothing went wrong - a real scripting gotcha. The deeper trap is in pipes: by default, a pipeline's exit code is only the last command's:

$ cat nonexistent.txt | sort | head
cat: nonexistent.txt: No such file or directory
$ echo $?
0                    # 0?! The `cat` FAILED, but `head` succeeded, so the pipeline "succeeded"

The cat failed, but because head (the last command) succeeded, the whole pipeline reports success - a silent failure that hides bugs in scripts. The fix is set -o pipefail (the pipeline fails if any stage fails):

$ set -o pipefail
$ cat nonexistent.txt | sort | head
cat: nonexistent.txt: No such file or directory
$ echo $?
1                    # now the failure propagates

set -euo pipefail at the top of a script (the "unofficial bash strict mode") catches this whole class of silent pipe failures - a habit that separates robust scripts from fragile ones.
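The following sketch (assuming bash) puts the two exit-code behaviours side by side, using false | true as the smallest possible failing pipeline. It also shows one common way to tell the shell that a grep miss is acceptable; the variable names are made up for illustration:

```shell
# Default: only the last stage's status counts, so the failure is hidden.
false | true
default_status=$?

# With pipefail, the rightmost non-zero status becomes the pipeline's status.
set -o pipefail
false | true
pipefail_status=$?

# grep's "no match" status (1) is a result, not a crash. Under set -e,
# `|| true` is one way to say that finding nothing is fine here.
matches=$(printf 'alpha\nbeta\n' | grep gamma || true)

echo "default=$default_status pipefail=$pipefail_status matches='$matches'"
```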

What you'll see: buffering surprises

You pipe a live-updating command through more than one stage and the output freezes or arrives in big chunks instead of line by line:

$ tail -f log.txt | grep ERROR | cut -c1-80        # may show NOTHING for a long time, then a burst

The cause is buffering: when a program's stdout is a pipe (not a terminal), it typically switches from line-buffered to block-buffered (~4-8 KB) for efficiency. Here grep's stdout is a pipe, so cut doesn't see any matches until grep's buffer fills. On a live log this looks like a hang. (With grep as the last stage, its stdout is the terminal and matches appear promptly.) The fix is to force line buffering in the stage that writes into a pipe:

$ tail -f log.txt | grep --line-buffered ERROR | cut -c1-80     # grep flushes per line
$ stdbuf -oL somecommand | grep ...                             # force line-buffering on most commands (GNU coreutils)

"My piped live output is laggy/chunky" is usually buffering, and --line-buffered / stdbuf -oL is the usual fix. Without knowing this, the pipeline looks broken when it's just buffering.

Process substitution - the trick beyond simple pipes

Sometimes you need to feed a command's output where a filename is expected (not stdin). <(...) makes a command's output look like a file:

$ diff <(sort file1.txt) <(sort file2.txt)     # diff two sorted files without temp files
$ comm -3 <(sort a) <(sort b)                   # compare sets on the fly

<(command) runs the command and hands its output to the outer command as a file path (typically something like /dev/fd/63 rather than a real file on disk). It's a bash/zsh/ksh feature, not plain POSIX sh. It's how you compare, join, or combine multiple command outputs when a single | (which only connects one stdout to one stdin) isn't enough. Recognizing <(...) in scripts - and reaching for it when you'd otherwise make temp files - is a mark of fluency.
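As a small sketch (assuming bash), here is comm finding the lines two unsorted lists have in common. The file names and contents are invented for the example:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' cherry apple banana > "$tmpdir/a.txt"   # hypothetical lists
printf '%s\n' banana date apple   > "$tmpdir/b.txt"

# comm -12 prints only the lines present in both inputs, and it needs
# sorted input. Each <(...) expands to a path comm can open like a file.
common=$(comm -12 <(sort "$tmpdir/a.txt") <(sort "$tmpdir/b.txt"))
echo "$common"

rm -rf "$tmpdir"
```

Without process substitution you would have to write two sorted temp files first, then clean them up.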

Try it (with what you'll see)

  1. Run find /etc -name '*.conf' > out.txt and watch any permission errors still hit your screen. Add 2>/dev/null and watch them vanish. Then drop both redirections and run find /etc -name '*.conf' 2>&1 | grep -i denied to capture the errors in the pipe instead. Feel the two streams.
  2. grep "zzz" /etc/hostname; echo $? -> see exit code 1 on no-match. Then false | true; echo $? -> 0 (last command). Add set -o pipefail, repeat -> 1. Watch the silent failure become visible.
  3. tail -f /var/log/syslog | grep something | cat (or any growing file) and notice the chunkiness; add --line-buffered to grep and watch it flow per line. The trailing cat makes grep's stdout a pipe, which is what triggers block buffering.
  4. diff <(ls dir1) <(ls dir2) to compare two directory listings without temp files. See process substitution work.

Exercise

  1. Count files in your home:

    ls ~ | wc -l
    

  2. List the 5 largest directories under your home:

    du -h ~/* | sort -h | tail -n 5
    

  3. How many lines in your bash history are unique?

    history | awk '{$1=""; print $0}' | sort -u | wc -l
    
    (Strips the history number, sorts unique, counts.)

  4. Save the output of ls -la /etc to a file:

    ls -la /etc > etc-listing.txt
    wc -l etc-listing.txt
    

  5. Append the date to a log file:

    echo "Started: $(date)" >> mylog.txt
    echo "Did stuff" >> mylog.txt
    echo "Ended: $(date)" >> mylog.txt
    cat mylog.txt
    

  6. Discard errors from a find of /:

    find / -name "*.log" 2>/dev/null | head
    

  7. Bonus: print the top 5 most-used commands from your history:

    history | awk '{print $2}' | sort | uniq -c | sort -rn | head -n 5
    

What you might wonder

"How do I read stdin in a script?" With read (one line at a time) or by reading the whole thing: input=$(cat). Useful when writing filter scripts.
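Both approaches can be sketched as small filter functions. The names number_lines and count_chars are hypothetical, chosen only for this example:

```shell
# number_lines: read stdin one line at a time and prefix each line
# with its line number.
number_lines() {
  n=0
  # IFS= and -r keep leading spaces and backslashes intact.
  while IFS= read -r line; do
    n=$((n + 1))
    printf '%d: %s\n' "$n" "$line"
  done
}

# count_chars: slurp all of stdin into one variable, then work on it.
# Note that $(...) drops trailing newlines.
count_chars() {
  input=$(cat)
  printf '%s\n' "${#input}"
}

numbered=$(printf 'red\ngreen\n' | number_lines)
chars=$(printf 'red\ngreen\n' | count_chars)

echo "$numbered"
echo "characters: $chars"
```

The line-at-a-time loop works on input of any size; slurping is simpler but holds everything in memory.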

"Why use awk if I have cut?" cut is simpler for fixed delimiters. awk is a small programming language - handles complex columns, conditional output, math. Learn awk '{print $N}' for "print the Nth field"; that alone covers a large share of everyday awk use.
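A short sketch of what awk adds beyond cut, using a made-up price list whose columns are separated by uneven runs of spaces (which cut -d cannot split cleanly):

```shell
# Made-up data: item and price, padded with a varying number of spaces.
prices='tea      3
coffee   5
juice    4'

# Conditional output: print the item name only when the price is at least 4.
expensive=$(printf '%s\n' "$prices" | awk '$2 >= 4 {print $1}')

# Math: add up the second column and print the sum at the end.
total=$(printf '%s\n' "$prices" | awk '{sum += $2} END {print sum}')

echo "$expensive"
echo "total: $total"
```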

"What about variables?" Coming in page 09 (shell scripting).

"What's sed for?" sed (stream editor) is for find-and-replace in streams: sed 's/foo/bar/g' file.txt prints file.txt with every foo replaced by bar; the file itself is left unchanged. Powerful for log processing and for editing files in scripts.

Done

  • Pipe with |.
  • Redirect stdout with > (overwrite) and >> (append).
  • Redirect stderr with 2>, both with &>.
  • Discard output with /dev/null.
  • Use tee to split output.
  • Recognize the standard text-processing pipes.

Next: Processes →
