awk Command Cheatsheet
The awk command is a powerful scripting language for text processing. It is an ideal tool for pattern scanning and data processing, especially when data is organized into columns (like CSV files).
Basic Syntax
awk processes its input in a loop: for each input line, it checks whether the line matches a **pattern (PATTERN)** and, if so, executes the associated **action (ACTION)**.
awk 'BEGIN { ACTION } PATTERN { ACTION } END { ACTION }' FILE
- BEGIN: A block executed once, before the file is processed (e.g., to initialize variables).
- PATTERN: A logical condition or a regular expression (regex) that must be met by the line.
- ACTION: An action executed for each line that meets the `PATTERN`. By default, this is `print $0`.
- END: A block executed once, after the entire file has been processed (e.g., for summaries).
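The three kinds of blocks can be seen working together in a small sketch. The file contents and the `total` variable below are hypothetical and chosen only for illustration.

```shell
# Hypothetical sample data: one "item price" pair per line.
prices=$(mktemp)
printf 'apple 3\npear 5\nplum 4\n' > "$prices"

# BEGIN runs once before any input is read.
# The middle rule runs only for lines whose second field is greater than 3.
# END runs once after the last line has been processed.
result=$(awk 'BEGIN { total = 0 }
              $2 > 3 { total += $2 }
              END { print "Total:", total }' "$prices")

echo "$result"   # Total: 9
rm -f "$prices"
```

Only the `pear` and `plum` lines satisfy the pattern, so the `apple` price is not added to the sum.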
The most basic and common usage of awk is:
awk 'PATTERN { ACTION }' FILE
Variables and Data Types
awk automatically splits each line into fields, which can be referenced using special variables. By default, fields are separated by any run of spaces or tabs, and leading and trailing whitespace is ignored. awk treats values as **strings** by default but automatically converts them to **numbers** when they are used in an arithmetic context. A string that does not start with a number evaluates to 0 in a numerical operation; if it starts with a number, only that leading numeric part is used (for example, "12abc" becomes 12).
| Variable | Description |
|---|---|
| $0 | The entire line. |
| $1, $2, ... | The first, second, etc., field (column) of the line. |
| NF | The number of fields (columns) in the current line. |
| NR | The number of the current line (record number). |
| FS | Field separator. The default is whitespace, but it can be changed. For example, use FS="," for CSV files. |
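The variables in the table can be tried out on a couple of lines of input. This is a minimal sketch: the sample text is made up, `$NF` (the last field) is standard awk shorthand that the table does not list, and `-F` is the command-line equivalent of setting FS.

```shell
# Hypothetical input: the second line has a non-numeric third field.
input='alice 30 12.5
bob 41 n/a'

# NR is the line number, NF the field count, $NF the last field.
# Adding 0 forces a numeric context: "12.5" becomes 12.5, "n/a" becomes 0.
fields=$(printf '%s\n' "$input" | awk '{ print NR, NF, $1, $NF + 0 }')
echo "$fields"
# 1 3 alice 12.5
# 2 3 bob 0

# -F',' sets the field separator to a comma for this invocation.
second=$(printf 'a,b,c\n' | awk -F',' '{ print $2 }')
echo "$second"   # b
```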
Patterns and Conditions
A pattern can be a regular expression, a logical condition, or a combination of both. You can combine multiple conditions using logical operators like && (AND) and || (OR).
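As a minimal sketch of the "combination of both" case, a regular expression and a numeric comparison can be joined in a single pattern. The log format here (level, service, duration in milliseconds) is an assumption made up for the example.

```shell
# Hypothetical log format: LEVEL SERVICE DURATION_MS
log='ERROR db 720
ERROR web 90
INFO db 900'

# The rule fires only when the line matches /ERROR/ AND field 3 exceeds 500.
slow_errors=$(printf '%s\n' "$log" | awk '/ERROR/ && $3 > 500 { print $2, $3 }')
echo "$slow_errors"   # db 720
```

The second line fails the numeric test and the third fails the regex, so only the first line is printed.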
# Regex pattern: display lines containing "ERROR"
# { print } is a shorthand for { print $0 }
awk '/ERROR/ { print }' log.txt
# Logical condition: print the first two fields of lines whose third field is greater than 500
awk '$3 > 500 { print $1, $2 }' file.txt
# Combined conditions: display lines whose first field is exactly "John" AND whose second field is exactly "Doe"
awk '$1 == "John" && $2 == "Doe" { print }' file.txt
# Combined conditions: display lines that contain "ERROR" OR "WARNING"
awk '/ERROR/ || /WARNING/ { print }' log.txt
# Multi-pattern script: Replace a line if a pattern is matched, otherwise print the original line
awk '/pattern_to_match/ { print "This is the new line" } !/pattern_to_match/ { print }' file.txt
Built-in Functions
awk provides a rich set of built-in functions for string manipulation and mathematical operations.
| Function | Description |
|---|---|
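| sub(r, s, t) | Substitutes the first occurrence of regular expression `r` with string `s` in string `t`. If `t` is omitted, `$0` is used. Returns the number of substitutions made (0 or 1). |
| gsub(r, s, t) | Globally substitutes all occurrences of regular expression `r` with string `s` in string `t`. If `t` is omitted, `$0` is used. Returns the number of substitutions made. |
| length(s) | Returns the length of string `s`. If `s` is omitted, the length of `$0` is returned. |
| split(s, a, fs) | Splits string `s` into an array `a` using field separator `fs` and returns the number of elements. If `fs` is omitted, `FS` is used. |
| sqrt(x) | Returns the square root of `x`. |
| log(x) | Returns the natural logarithm of `x`. |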
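The string functions above can be combined in one action. The following is a sketch on a made-up input line; the names `n`, `count`, and `parts` are illustrative only.

```shell
# Hypothetical input line.
line='2024-01-15 error: disk full; error: retry'

summary=$(printf '%s\n' "$line" | awk '{
    n = gsub(/error/, "ERROR")       # no third argument, so it edits $0
    count = split($1, parts, "-")    # parts[1]="2024", parts[2]="01", parts[3]="15"
    print n, count, parts[1], length($1)
    print $0
}')
echo "$summary"
# 2 3 2024 10
# 2024-01-15 ERROR: disk full; ERROR: retry
```

Note that `sub` and `gsub` modify their target in place and return a count, so the changed text is read back from `$0` rather than from the return value.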
Actions: Loops and Statements
You can perform multiple actions inside the action block by separating them with semicolons or newlines. This includes loops and conditional statements.
# If-Else statement: classify a value in the third column
awk '{
    if ($3 >= 90) {
        print $1, "A"
    } else if ($3 >= 80) {
        print $1, "B"
    } else {
        print $1, "C"
    }
}' grades.txt
# For loop: iterate through all fields in a line
awk '{
    for (i = 1; i <= NF; i++) {
        print "Field", i, "is:", $i
    }
}' file.txt
# While loop: alternative way to iterate through fields
awk '{
    i = 1
    while (i <= NF) {
        print "Field", i, "is:", $i
        i++
    }
}' file.txt
Saving Results
By default, awk prints results to standard output, so you can redirect them to a file with the shell's > operator.
# Process a CSV file and save the result to a new file
# In the BEGIN block, i.e. before any input is read, we set FS (the field separator) to ","
# NR > 1 skips the first line (typically the header row)
awk 'BEGIN { FS="," } NR > 1 { print $1, $3 }' data.csv > output_data.txt
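The command above reads comma-separated input but writes space-separated output, because `print` joins its arguments with the output field separator OFS, which defaults to a single space. If the saved file should also be CSV, one option is to set OFS as well. This is a sketch with made-up data and hypothetical file names; like the command above, it does not handle quoted fields that contain commas.

```shell
# Hypothetical CSV with a header row, written to a temporary directory.
workdir=$(mktemp -d)
printf 'name,dept,salary\nann,eng,120\nbob,ops,95\n' > "$workdir/data.csv"

# FS splits input on commas; OFS makes print join output fields with commas.
# NR > 1 skips the header row.
awk 'BEGIN { FS = ","; OFS = "," } NR > 1 { print $1, $3 }' \
    "$workdir/data.csv" > "$workdir/output_data.csv"

saved=$(cat "$workdir/output_data.csv")
echo "$saved"
# ann,120
# bob,95
rm -rf "$workdir"
```

The shell's `>` overwrites the target file on each run, while `>>` appends to it.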