
awk Command in Linux

The awk command is one of the most useful tools in Linux for working with text. It's especially helpful for DevOps beginners who want to filter, extract, and format data from files or command output. Whether you're reading log files, checking server stats, or writing simple scripts, awk makes it easy to process information in an automated way.


awk Tutorial for DevOps: A Powerful Text Processing Tool

When you need to slice and analyze text files or command output on the Linux command line, awk is the go-to tool. awk is a lightweight programming language designed for text processing and data extraction.


What Is the awk Command?

awk reads input line by line, splits it into fields, and allows you to perform operations based on patterns and actions. It’s perfect for parsing structured text like CSVs, logs, or config files.

Basic Syntax of the awk Command

awk 'pattern { action }' file
  • pattern: the condition that must be satisfied for awk to apply the action.
  • action: the operation or set of operations to be executed when the pattern matches.

If no pattern is specified, awk performs the action for every line in the input.
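For instance, the pattern/action split can be seen with a few made-up lines piped into awk (the sample text below is invented for illustration):

```shell
# Pattern /error/ selects lines; the action upper-cases them.
printf 'ok service started\nerror disk full\nok backup done\n' |
  awk '/error/ { print toupper($0) }'
# Prints: ERROR DISK FULL

# With no pattern, the action runs on every line.
printf 'a\nb\n' | awk '{ print NR }'
# Prints: 1
#         2
```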


Every line of input is automatically split into fields, using whitespace as the default separator. Keep these points in mind:

  • $0 = entire line
  • $1 = first field
  • $2 = second field
  • NF = number of fields
  • NR = current record number (i.e., line number)
  • OFS = Output Field Separator

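A quick sketch of these built-in variables in action, using two invented input lines:

```shell
# NR = line number, NF = field count, $1 = first field, $NF = last field.
printf 'alice 30\nbob 25\n' | awk '{ print NR, NF, $1, $NF }'
# Prints: 1 2 alice 30
#         2 2 bob 25

# OFS controls how fields are joined on output; the assignment $1=$1
# forces awk to rebuild $0 with the new separator.
printf 'alice 30\n' | awk 'BEGIN { OFS="," } { $1=$1; print $0 }'
# Prints: alice,30
```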

Key Features of the awk Command

1. Pattern Matching

awk can search for patterns in text, similar to how grep works. You can use more advanced search options like regular expressions to find specific words or patterns in each line.
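As a small illustration (sample lines are made up), a regular expression can be matched against a single field with the `~` operator, not just the whole line:

```shell
# Print the first field of lines whose second field starts with "ERR".
printf 'a ERR1\nb OK\nc ERROR\n' | awk '$2 ~ /^ERR/ { print $1 }'
# Prints: a
#         c
```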

2. Working with Fields

When you run awk, it automatically splits each line into parts (called fields), usually using spaces or tabs.

  • $1 means the first word (field)
  • $2 is the second word
  • $NF means the last word in the line
  • $0 represents the whole line

This makes it easy to pick out and work with specific parts of a line.

3. Using Conditions

You can use if, else, while, and for in awk to perform actions only when certain conditions are true - just like in regular programming.
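A minimal if/else sketch, labeling invented sample values against a threshold:

```shell
# Classify each value as "high" or "low" inside the action block.
printf '5\n150\n80\n' |
  awk '{ if ($1 > 100) print $1, "high"; else print $1, "low" }'
# Prints: 5 low
#         150 high
#         80 low
```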

4. Built-in Functions

awk comes with handy built-in functions:

  • Math functions such as sqrt(), int(), and rand()
  • Text functions such as length() to count characters or substr() to extract part of a string
    You can even define your own functions for more advanced tasks.
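A short sketch combining the built-ins with a user-defined function (half() is a hypothetical helper defined here for illustration):

```shell
# length() and substr() are built in; half() is user-defined.
echo 'devops' | awk '
function half(s) { return substr(s, 1, length(s) / 2) }
{ print length($1), substr($1, 1, 3), half($1) }'
# Prints: 6 dev dev
```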

5. BEGIN and END Blocks

  • BEGIN: Runs before awk starts processing the input
  • END: Runs after all the input has been processed
    These are useful for setting up or printing summaries.
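For example, BEGIN can print a header before any input is read, and END can print a summary afterward (input values invented):

```shell
# BEGIN runs once before input; END runs once after all lines are read.
printf '10\n20\n30\n' |
  awk 'BEGIN { print "start" } { sum += $1 } END { print "sum =", sum }'
# Prints: start
#         sum = 60
```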

6. Formatting and Reporting

awk can rearrange text, format it neatly, or even generate simple reports. This is very useful when you're cleaning up logs or displaying data in a readable way.
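One common way to do this is awk's printf, which works much like C's printf for column-aligned reports (the names and scores below are made up):

```shell
# %-10s left-pads the name to 10 chars; %5d right-aligns the number.
printf 'alice 92\nbob 7\n' | awk '{ printf "%-10s %5d\n", $1, $2 }'
```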


Examples of the awk Command

1. Print Specific Columns

awk '{ print $1 }' file.txt

🔹 Prints the first column of each line.

awk '{ print $1, $3 }' file.txt

🔹 Prints the first and third columns.

2. Filter by a Condition

awk '$3 > 100' data.txt

🔹 Prints lines where the third field is greater than 100.

awk '$1 == "John"' employees.txt

🔹 Prints lines where the first field is “John”.

3. Use Custom Field Separator

awk -F ":" '{ print $1 }' /etc/passwd

🔹 Uses : as the field separator to print usernames from /etc/passwd.

4. Print Line Numbers

awk '{ print NR, $0 }' file.txt

🔹 Prefixes each line with its line number.

5. Calculate Column Totals

awk '{ sum += $2 } END { print "Total:", sum }' sales.txt

🔹 Sums up the values in column 2 and prints the total.

6. Find the Average

awk '{ total += $2; count++ } END { print total/count }' data.txt

🔹 Computes the average of values in column 2.


Practical Use Cases of the awk Command

1. View top memory-consuming processes :

ps aux | awk '$4 > 1 { print $1, $4, $11 }'

🔹 Prints user, memory usage, and command for processes using >1% memory.

2. Parsing Log Files to Extract Useful Data

Let’s assume we have an NGINX access log (/var/log/nginx/access.log). The log file records details of every HTTP request made to the web server, such as the IP address, request method, status code, and the number of bytes sent in the response.

Case a : Extracting IP Addresses and Response Status Codes

Here’s what the log might look like:


192.168.1.1 - - [01/Jul/2023:12:30:45 +0000] "GET /index.html HTTP/1.1" 200 1024
192.168.1.2 - - [01/Jul/2023:12:31:01 +0000] "POST /login HTTP/1.1" 404 512
192.168.1.3 - - [01/Jul/2023:12:32:20 +0000] "GET /about HTTP/1.1" 200 2048


Now, let’s say we want to extract IP addresses and the HTTP status codes.

awk '{print $1, $9}' /var/log/nginx/access.log

  • $1: IP address (first field).
  • $9: Status code (ninth field).

Output:

192.168.1.1 200
192.168.1.2 404
192.168.1.3 200

This command extracts the IP address (field $1) and status code (field $9) for each log entry.

Case b : Counting 404 Errors

Let's say we want to count how many 404 (Not Found) errors occurred on your web server. You can use awk to filter for lines where the status code is 404 (a commonly asked interview question).

awk '$9 == 404 {count++} END {print count}' /var/log/nginx/access.log

  • $9 == 404: Only lines where the status code is 404 are processed.
  • count++: Counts the matching lines.
  • END {print count}: After processing all lines, print the total count.

Output:

1

This tells us that 1 request resulted in a 404 error.
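Extending this idea, an associative array can tally every status code in a single pass; this is a sketch using the same invented sample lines shown above rather than a real log file:

```shell
# count[] is an associative array keyed by status code ($9).
printf '%s\n' \
  '192.168.1.1 - - [01/Jul/2023:12:30:45 +0000] "GET /index.html HTTP/1.1" 200 1024' \
  '192.168.1.2 - - [01/Jul/2023:12:31:01 +0000] "POST /login HTTP/1.1" 404 512' \
  '192.168.1.3 - - [01/Jul/2023:12:32:20 +0000] "GET /about HTTP/1.1" 200 2048' |
  awk '{ count[$9]++ } END { for (code in count) print code, count[code] }'
# Prints (in no guaranteed order): 200 2 and 404 1
```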

3. Analyzing System Metrics

Case a : Extracting Disk Usage Information

We can also use awk to process system commands like df -h (disk usage) to extract useful information, such as which filesystem is mounted and how much space is used.

Here’s an example output from df -h :

Filesystem      Size  Used  Avail  Use%  Mounted on
/dev/sda1       100G  60G   40G    60%   /
/dev/sdb1       500G  200G  300G   40%   /mnt/data

To extract the filesystem name and usage percentage, you can use awk:

df -h | awk 'NR>1 {print $1, $5}'

  • NR>1: Skips the header line.
  • $1: Filesystem name.
  • $5: Usage percentage.

Output:

/dev/sda1 60%
/dev/sdb1 40%

This shows that /dev/sda1 is 60% full and /dev/sdb1 is 40% full.
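Going one step further, the percentage can be compared numerically once the trailing "%" is stripped with sub(), which is handy for simple disk-space alerts. This sketch feeds in the sample df output shown above instead of running df itself, and the 50% threshold is an arbitrary choice:

```shell
# sub(/%/, "", pct) removes the "%" so pct can be compared as a number.
printf '%s\n' \
  'Filesystem Size Used Avail Use% Mounted on' \
  '/dev/sda1 100G 60G 40G 60% /' \
  '/dev/sdb1 500G 200G 300G 40% /mnt/data' |
  awk 'NR>1 { pct=$5; sub(/%/, "", pct); if (pct+0 > 50) print $1, $5 }'
# Prints: /dev/sda1 60%
```

In a real pipeline you would replace the printf with `df -h |`.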

Case b : Extracting Top Processes by CPU Usage

You can also use awk to filter and process the output from the top command. Let’s say you want to view the top 5 processes using the most CPU.

Here’s an example output of the top command:

PID   USER   PR  NI  VIRT    RES    SHR   S  %CPU  %MEM  TIME+    COMMAND
1234  user1  20  0   100000  12000  5000  S  45.2  0.3   2:34.12  java
5678  user2  20  0   90000   10000  4000  S  32.1  0.2   1:56.01  nginx
9101  user3  20  0   80000   8000   3000  S  25.3  0.1   1:22.33  python

Now, use awk to filter for the PID, CPU usage, and command:

top -bn1 | awk 'NR>7 {print $1, $9, $12}' | head -n 5

  • NR>7: Skips the first 7 lines (header).
  • $1: Process ID (PID).
  • $9: CPU usage percentage.
  • $12: Command name.

Output:

1234 45.2 java
5678 32.1 nginx
9101 25.3 python

This tells us that the process with PID 1234 is using 45.2% CPU, running the java process.

In short, awk is an essential tool for anyone working with text in Linux, enabling efficient data extraction, manipulation, and reporting.

