Lab: Text processing

Hands-on quiz challenges covering regex flavours, sed address ranges, awk field splitting, and pipeline composition.

Lab · optional · Bash · Intermediate · 15 min
By the end of this lesson you will be able to:
  • Identify correct grep regex flavours and flags
  • Reason about sed address ranges and in-place editing portability
  • Predict awk field-splitting and NR/NF behaviour
  • Compose multi-step pipelines for common analysis tasks

This lab consolidates the Text processing module. Work through the questions, then practise the pipeline challenges in a real terminal.

grep and regular expressions

  1. Which grep command finds lines that contain exactly one digit followed by a letter, using ERE?
  2. True or false: grep -v "ERROR" app.log prints only the lines that do NOT contain "ERROR".
  3. You run grep -r "secret" . in a large repo. It is slow and prints many binary file matches. What is the best fix?
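To experiment with these questions, the following is a minimal sketch that runs on throwaway sample data. The file names (app.log, notes.txt, blob.bin) and their contents are made up for illustration, and the -r and -I options are assumed to be available, as they are in GNU and BSD grep (POSIX does not define them).

```shell
# Build sample files in a temporary directory (all names are illustrative).
tmp=$(mktemp -d)
printf '%s\n' 'a1b' 'ab12' '7x' 'ERROR 42' 'ok' > "$tmp/app.log"
printf 'secret token\n' > "$tmp/notes.txt"
printf 'secret\0data\n' > "$tmp/blob.bin"   # the NUL byte makes grep treat this file as binary

# -E selects extended regular expressions (ERE): a digit immediately followed by a letter.
digit_letter=$(grep -E '[0-9][a-zA-Z]' "$tmp/app.log")
echo "$digit_letter"

# -v inverts the match: keep only the lines that do NOT contain "ERROR".
no_error=$(grep -v 'ERROR' "$tmp/app.log")
echo "$no_error"

# -c counts matching lines instead of printing them.
error_count=$(grep -c 'ERROR' "$tmp/app.log")
echo "$error_count"

# -r recurses, -I skips binary files, -l lists file names only.
text_hits=$(cd "$tmp" && grep -rIl 'secret' .)
echo "$text_hits"

rm -rf "$tmp"
```

Skipping binary files with -I is one common way to quieten a recursive search. Narrowing the search path, or using --exclude-dir where your grep supports it, is another.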

sed address ranges and in-place editing

  1. What does the command sed "/START/,/END/d" file.txt do?
  2. Which sed -i invocation works on BOTH GNU/Linux and macOS?
  3. True or false: sed -n "10,20p" file.txt prints lines 10 through 20 and also prints all other lines.
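The behaviours these questions probe can be checked on a five-line sample file. This is a sketch under stated assumptions: the file contents are invented, and the attached-suffix form -i.bak is used because both GNU sed and the BSD sed shipped with macOS accept it.

```shell
# Sample file: one line per word (contents are illustrative).
tmp=$(mktemp -d)
printf '%s\n' one START two END three > "$tmp/file.txt"

# Range address: delete from the first line matching START
# through the next line matching END, inclusive.
range_deleted=$(sed '/START/,/END/d' "$tmp/file.txt")
echo "$range_deleted"

# -n suppresses automatic printing, so only the lines selected by p appear.
lines_2_to_3=$(sed -n '2,3p' "$tmp/file.txt")
echo "$lines_2_to_3"

# In-place edit with a backup suffix attached directly to -i.
sed -i.bak 's/three/3/' "$tmp/file.txt"
edited_last=$(tail -n 1 "$tmp/file.txt")
backup_last=$(tail -n 1 "$tmp/file.txt.bak")
echo "$edited_last $backup_last"

rm -rf "$tmp"
```

The backup file keeps the original content, so delete it afterwards if you do not need it.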

awk field splitting and logic

  1. Given awk -F, '{print $2}' data.csv, if a field value is "John, Jr.", how many fields does awk split that value into?
  2. What does awk 'END { print NR }' file.txt print?
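A quick way to test your predictions is to feed awk a single line on standard input. The sample line below is an assumption made for illustration, not taken from any real data.csv. Note the single quotes around each awk program: inside double quotes the shell would expand $2 before awk ever saw it.

```shell
# A CSV-style line whose second value contains a comma inside quotes (illustrative data).
line='1,"John, Jr.",NYC'

# -F, splits on every comma; awk has no notion of CSV quoting.
field_count=$(printf '%s\n' "$line" | awk -F, '{ print NF }')
second_field=$(printf '%s\n' "$line" | awk -F, '{ print $2 }')
echo "$field_count"
echo "$second_field"

# NR is the number of records read so far; in the END block it holds the total.
total=$(printf '%s\n' a b c | awk 'END { print NR }')
echo "$total"
```

For CSV data with quoted commas, a plain -F, split is not enough. A CSV-aware tool is usually the safer choice.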

Pipeline composition

  1. You want to find the 3 most common values in the first column of a space-delimited log file. Which pipeline is correct?
  2. True or false: sort -u is equivalent to sort | uniq in all cases.
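Both questions can be explored with a small generated log. The sketch below is one possible pipeline, not the only correct one. The log contents and the name access.log are invented for the example.

```shell
# Ten log lines: alice appears 4 times, bob 3, carol 2, dave 1 (illustrative data).
tmp=$(mktemp -d)
printf '%s GET /\n' alice bob alice carol bob alice dave carol bob alice > "$tmp/access.log"

# Extract column 1, group identical values, count them, rank by count, keep the top 3.
top3=$(awk '{ print $1 }' "$tmp/access.log" | sort | uniq -c | sort -rn | head -n 3 | awk '{ print $2 }')
echo "$top3"

# With whole-line comparison, sort -u and sort | uniq produce the same output.
plain_u=$(printf '%s\n' b a b | sort -u)
plain_uniq=$(printf '%s\n' b a b | sort | uniq)

# With a sort key, sort -u deduplicates on the key only, while uniq compares whole lines.
keyed_u=$(printf '%s\n' 'a 1' 'a 2' | sort -u -k1,1 | awk 'END { print NR }')
keyed_uniq=$(printf '%s\n' 'a 1' 'a 2' | sort -k1,1 | uniq | awk 'END { print NR }')
echo "$keyed_u $keyed_uniq"

rm -rf "$tmp"
```

uniq only collapses adjacent duplicates, which is why the first sort has to come before uniq -c.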

Do it yourself

Work through these pipeline challenges in your terminal:

# Challenge 1: top 5 shells used in /etc/passwd
cut -d: -f7 /etc/passwd | sort | uniq -c | sort -rn | head -5

# Challenge 2: find all unique words starting with a capital letter in /etc/hosts
grep -oE '[A-Z][a-zA-Z]+' /etc/hosts | sort -u

# Challenge 3: list the 3 largest directories under /usr (by number of files)
find /usr -maxdepth 2 -type d 2>/dev/null | \
  while IFS= read -r d; do
    count=$(find "$d" -maxdepth 1 -type f 2>/dev/null | wc -l)
    echo "$count $d"
  done | sort -rn | head -3

# Challenge 4: extract the port number from each entry in /etc/services (field 2, then take digits before /)
# Filter comments first, skip lines without a second field, and limit the output last
grep -v "^#" /etc/services | awk 'NF >= 2 {print $2}' | cut -d/ -f1 | sort -nu | head -20

Where to go next

You've mastered the core Unix text-processing toolkit. The Advanced tier is next: arrays, parameter expansion, heredocs, traps, and automation topics like cron, Makefiles, CI scripts, and debugging techniques.
