Awk Basics
Use awk to split fields, apply per-record logic, run setup and summary code in BEGIN/END blocks, use NR/NF, accumulate sums, and build small text programs.
- Print specific fields with $1, $2, $NF
- Set FS and OFS to handle delimited data
- Write BEGIN and END blocks for setup and summary logic
- Filter records with pattern conditions
- Accumulate sums and counts across records
grep filters lines. sed transforms them. awk is the tool when you need to compute — extract structured columns from delimited data, sum a column of numbers, reformat a CSV, or print a report. An awk program is a series of pattern { action } rules; for each input record (line), every matching pattern's action runs.
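To make the rule model concrete, here is a minimal sketch with two rules run over made-up input (the numbers and messages are purely illustrative). A record that matches both patterns triggers both actions, in program order.

```shell
# Hypothetical input: one number per line.
input='5
15'

# Two pattern { action } rules. Every rule is tested against every record.
out=$(printf '%s\n' "$input" | awk '
$1 > 3  { print $1, "is over 3" }
$1 > 10 { print $1, "is over 10" }
')

printf '%s\n' "$out"
# 5 is over 3
# 15 is over 3
# 15 is over 10
```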
The basic model: fields and records
By default, awk splits each line on whitespace into numbered fields: $1 is the first field, $2 the second, and so on. $0 is the entire line. $NF is the last field regardless of how many there are:
echo "Alice 30 engineer" | awk '{ print $1, $3 }'
# Alice engineer
ls -la | awk '{ print $NF }' # print filename (last column)
ls -la | awk '{ print $5, $NF }' # print size and filename
NR is the current record (line) number; NF is the number of fields on the current line.
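As a small illustration of NR, NF, and $NF together, the sketch below prints all three for each record of some made-up input whose lines have different field counts.

```shell
# Hypothetical sample input: three records with 2, 1, and 3 fields.
sample='alpha beta
gamma
delta epsilon zeta'

# For each record: its number, its field count, and its last field.
result=$(printf '%s\n' "$sample" | awk '{ print NR, NF, $NF }')

printf '%s\n' "$result"
# 1 2 beta
# 2 1 gamma
# 3 3 zeta
```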
FS and OFS: custom delimiters
Set FS (Field Separator) to split on something other than whitespace. OFS (Output Field Separator) controls how fields are joined when awk rebuilds $0 after a field assignment, and how print separates its comma-delimited arguments:
# Parse /etc/passwd (colon-delimited)
awk -F: '{ print $1, $7 }' /etc/passwd # username and shell
awk 'BEGIN { FS=":" } { print $1 }' /etc/passwd
# CSV with comma separator
awk -F, '{ print $2 }' data.csv # second column
# Reformat: change delimiter from comma to pipe
awk -F, 'BEGIN { OFS="|" } { $1=$1; print }' data.csv
The $1=$1 trick (assigning a field to itself) forces awk to rebuild $0 using OFS, which is how you change the output delimiter.
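The effect of the self-assignment is easiest to see side by side. This sketch runs three variants over a single made-up line; the variable names are only for illustration.

```shell
# Hypothetical one-line CSV record.
line='a,b,c'

# No field is assigned, so $0 is printed untouched and the commas remain.
unchanged=$(printf '%s\n' "$line" | awk -F, 'BEGIN { OFS="|" } { print }')

# Assigning any field makes awk rebuild $0, joining the fields with OFS.
rebuilt=$(printf '%s\n' "$line" | awk -F, 'BEGIN { OFS="|" } { $1=$1; print }')

# print with commas always separates its arguments with OFS.
picked=$(printf '%s\n' "$line" | awk -F, 'BEGIN { OFS="|" } { print $1, $3 }')

printf '%s\n' "$unchanged" "$rebuilt" "$picked"
# a,b,c
# a|b|c
# a|c
```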
Pattern conditions
A pattern before the action block filters which records it runs on:
# Lines where field 3 is greater than 100
awk '$3 > 100 { print $1, $3 }' data.txt
# Lines matching a regex
awk '/ERROR/ { print }' app.log
# A range of lines (start to end pattern, inclusive)
awk '/BEGIN_SECTION/,/END_SECTION/ { print }' report.txt
# Combining conditions
awk '$1 == "Alice" && $3 > 50 { print }' data.txt
Patterns without an action block default to { print }, so awk '/ERROR/' is equivalent to grep ERROR.
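A short sketch of the default action and the range pattern, using an invented six-line log so the output can be checked by eye:

```shell
# Hypothetical log content.
log='INFO start
ERROR disk full
BEGIN_SECTION
detail one
END_SECTION
INFO done'

# Pattern with no action: matching records are printed, as with grep.
errors=$(printf '%s\n' "$log" | awk '/ERROR/')

# Range pattern: from the first matching line through the closing match, inclusive.
section=$(printf '%s\n' "$log" | awk '/BEGIN_SECTION/,/END_SECTION/')

printf '%s\n' "$errors"
# ERROR disk full
printf '%s\n' "$section"
# BEGIN_SECTION
# detail one
# END_SECTION
```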
BEGIN and END blocks
BEGIN runs once before any input is read. END runs once after all input has been processed. Use them for setup and summaries:
awk -F: 'BEGIN { print "Username\tShell" }
{ print $1 "\t" $7 }
END { print "Total:", NR, "users" }' /etc/passwd
awk -F: 'END { print NR, "lines in /etc/passwd" }' /etc/passwd
Accumulating sums
The most common awk idiom: accumulate values across all records, print the total in END:
# Sum the 5th column (file sizes from ls -la)
ls -la | awk 'NR > 1 { sum += $5 } END { print "Total bytes:", sum }'
# Count occurrences of each value in column 1
awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' data.txt
# Average of column 2
awk '{ sum += $2; n++ } END { if (n > 0) print "Average:", sum/n }' data.txt
awk arrays are associative (like hash maps). The count[$1]++ idiom creates and increments a counter keyed on the first field without any initialisation needed. This pattern, grouping and counting by key, covers a large fraction of practical log-analysis tasks.
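The same idiom extends from counting to summing per key. The sketch below assumes a made-up two-column input (a name and a byte count). Because the iteration order of for (k in ...) is unspecified, the output is piped through sort to make it stable.

```shell
# Hypothetical input: user name and bytes transferred.
data='alice 100
bob 50
alice 25
bob 10
carol 5'

# total[$1] += $2 groups by the first field and sums the second.
totals=$(printf '%s\n' "$data" |
  awk '{ total[$1] += $2 } END { for (k in total) print k, total[k] }' |
  sort)

printf '%s\n' "$totals"
# alice 125
# bob 60
# carol 5
```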
Field separator precedence: the -F option on the command line sets FS before any input is read, so it applies to every record. Setting FS inside a BEGIN block has the same effect; if both are used, the BEGIN assignment runs later and wins. Setting FS in a plain action block (not BEGIN) takes effect only from the next record, not the current one, which is a common gotcha.
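The gotcha can be reproduced with two lines of invented input. This sketch assumes a POSIX-conforming awk; the variable names late and early are illustrative only.

```shell
# Hypothetical colon-delimited input.
data='a:b
c:d'

# FS is assigned while record 1 is being processed. Record 1 was already
# split on whitespace, so its $1 is the whole line; record 2 is split on ":".
late=$(printf '%s\n' "$data" | awk '{ FS=":" } { print $1 }')

# FS is assigned in BEGIN, before any record is read, so both records split on ":".
early=$(printf '%s\n' "$data" | awk 'BEGIN { FS=":" } { print $1 }')

printf '%s\n' "$late"
# a:b
# c
printf '%s\n' "$early"
# a
# c
```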
Check your understanding
- 1. In an awk program, $NF refers to which field?
- 2. You want to print a header line before awk processes any input. Where should you put the print statement?
- 3. True or false: awk associative arrays must be declared with a type before use.
Do it yourself
# Parse /etc/passwd: print username and shell for the first 9 users, count total
# (the row limit lives inside awk so the END summary is not cut off by head)
awk -F: 'BEGIN { print "User\tShell" }
NR <= 9 { print $1 "\t" $7 }
END { print "---\nTotal:", NR }' /etc/passwd
# Sum file sizes in /usr/bin (column 5 of ls -la)
ls -la /usr/bin | awk 'NR > 1 && $5+0 > 0 { sum += $5 } END { print "Total:", sum, "bytes" }'
# Count lines per unique first word
printf 'apple 1\nbanana 2\napple 3\ncherry 1\n' | \
awk '{ count[$1]++ } END { for (k in count) print k, count[k] }'
Where to go next
You can now extract, filter, and aggregate structured text with awk. The final lesson in this module — cut, sort, and uniq — covers three focused tools that compose naturally into compact pipelines for counting, ranking, and deduplicating data.