uniq command in Linux
The uniq command in Linux finds and removes repeated lines in a file, keeping only unique lines and making it great for cleaning up data. uniq reads a file (or input from another command), compares each line to the one before it, and removes any duplicate that comes immediately after.
uniq Command Tutorial for DevOps Engineers: Efficient Data Processing
The uniq command in Linux is one of the most essential tools for text processing and data manipulation. Whether you’re managing system logs, cleaning up data files, or analyzing server metrics, uniq can streamline your workflow by removing duplicate lines.
What is the uniq Command in Linux?
The uniq command removes adjacent duplicate lines in a text file or output from a command. This makes it useful for cleaning up data, identifying unique items in logs, and filtering redundant information.
Basic Syntax of uniq Command
uniq [options] [input_file] [output_file]
- input_file: The file from which to remove duplicates.
- output_file: (Optional) The file in which to store the result.
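For example, a minimal invocation (the filenames here are purely illustrative) reads raw.txt and writes the deduplicated lines to clean.txt:
uniq raw.txt clean.txt
If the output file is omitted, the result is printed to standard output.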
By default, uniq only removes consecutive duplicates. If you want to eliminate duplicates from unsorted data, you will need to use sort before uniq.
Example of uniq Command
1. Remove Consecutive Duplicates
Let’s assume you have a text file (data.txt) with the following content:
apple
banana
banana
orange
orange
orange
grape
Running the uniq command:
uniq data.txt
Output:
apple
banana
orange
grape
🔹 Here, uniq removes the consecutive duplicate lines, leaving only unique values.
2. Remove Non-Adjacent Duplicates
sort file.txt | uniq
🔹 uniq only removes adjacent duplicates, so if identical lines are scattered throughout the file, sort it first to group them together. The shorthand sort -u file.txt produces the same result.
3. Count Occurrences of Each Line
uniq -c file.txt
🔹 Adds a count of how many times each line appears.
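For instance, running it on the data.txt file from the first example would produce:
uniq -c data.txt
Output:
      1 apple
      2 banana
      3 orange
      1 grape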
4. Display Only Duplicates
uniq -d file.txt
🔹 Shows only the lines that are duplicated.
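On data.txt, this prints only the two values that appear more than once:
uniq -d data.txt
Output:
banana
orange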
5. Ignore Case
uniq -i file.txt
🔹 Ignores case while removing duplicates (e.g., "Apple" and "apple" are treated as the same).
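As an illustration, suppose a hypothetical file fruits.txt contains the three consecutive lines Apple, apple, and APPLE. With -i they are all treated as the same line:
uniq -i fruits.txt
Output:
Apple
Without -i, all three lines would be kept because they differ in case.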
Advanced uniq Command Options for DevOps Engineers
DevOps engineers often need to filter and process large datasets, such as server logs or system reports. The following options for the uniq command can be incredibly helpful:
Case 1. Count Occurrences of Each Line (-c option)
When analyzing logs, knowing the frequency of each entry is valuable. The -c option counts the number of times each line appears. For example, let's analyze a log file (/var/log/nginx/access.log) containing:
192.168.0.1 - - [10/Oct/2023:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 512
192.168.0.2 - - [10/Oct/2023:10:01:01 +0000] "GET /login HTTP/1.1" 404 128
192.168.0.1 - - [10/Oct/2023:10:05:01 +0000] "GET /index.html HTTP/1.1" 200 512
Use the -c option to count occurrences of each IP address, sorting first because the duplicate IPs are not adjacent in the log:
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c
Output:
      2 192.168.0.1
      1 192.168.0.2
This tells us that 192.168.0.1 accessed the server twice, while 192.168.0.2 accessed it once.
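A common extension of this pattern is ranking clients by request volume: piping the counts through sort -rn (reverse numeric sort) and head lists the busiest IPs first.
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10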
Case 2. Display Only Duplicates (-d option)
You might only want to see lines that have duplicates. The -d option is useful for this purpose. Let’s filter the same access log (again sorting first) to find only the repeated IP addresses:
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -d
Output:
192.168.0.1
This shows that 192.168.0.1 has duplicate log entries in the file.
Case 3. Show Only Unique Lines (-u option)
In contrast to showing duplicates, you can use the -u option to display lines that only appear once. This is helpful when you want to see unique access attempts or configurations.
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -u
Output:
192.168.0.2
This tells us that 192.168.0.2 accessed the server only once.
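To turn this into a single metric, such as the number of IPs that appeared exactly once, the list can be piped through wc -l:
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -u | wc -l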
Real-World DevOps Examples of uniq Command
Let’s see some DevOps-oriented examples where uniq is used for filtering logs, processing metrics, and analyzing system data.
1. Finding Unique Error Messages in Server Logs
When debugging an issue, DevOps engineers often need to isolate unique error messages. Here’s a sample log snippet from a server error log (/var/log/nginx/error.log):
[error] 1024#0: *1 "POST /login" failed (404: Not Found) while reading response header from upstream
[error] 1025#1: *2 "GET /home" failed (500: Internal Server Error) while reading response header from upstream
[error] 1024#0: *1 "POST /login" failed (404: Not Found) while reading response header from upstream
To extract unique error messages, sorting first so identical entries become adjacent:
grep "failed" /var/log/nginx/error.log | sort | uniq
Output:
[error] 1024#0: *1 "POST /login" failed (404: Not Found) while reading response header from upstream
[error] 1025#1: *2 "GET /home" failed (500: Internal Server Error) while reading response header from upstream
This removes the duplicate error entries, allowing you to focus on distinct error types.
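In practice, near-identical errors often differ only in a process or connection ID and therefore never collapse. One rough workaround (the sed pattern here is illustrative and would need adapting to your actual log format) strips those variable prefixes before deduplicating and counting:
grep "failed" /var/log/nginx/error.log | sed -E 's/^\[error\] [0-9]+#[0-9]+: \*[0-9]+ //' | sort | uniq -c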
2. Analyzing User Access to a Web Application
When analyzing user access to a web application, you may want to see how many unique users are accessing the site, especially in a multi-user environment. For example, use awk to extract a field such as the client IP address, then sort and count occurrences:
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c
Output:
     45 192.168.1.1
     32 192.168.1.2
This command counts how many times each IP address has accessed the server, helping you identify traffic patterns.
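To report just the number of distinct visitors as a single figure, deduplicate with sort -u and count the lines:
awk '{print $1}' /var/log/nginx/access.log | sort -u | wc -l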
3. Cleaning Up Server Metrics
For system monitoring or performance reports, you might receive raw data that includes duplicate entries, such as disk usage statistics from df -h:
df -h | uniq
Example Input:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       100G   60G   40G  60% /
/dev/sda1       100G   60G   40G  60% /
/dev/sdb1       500G  200G  300G  40% /mnt/data
Output:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       100G   60G   40G  60% /
/dev/sdb1       500G  200G  300G  40% /mnt/data
This removes the duplicate disk entry, providing a cleaner and more readable report. Note that uniq catches the repeated /dev/sda1 line only because the duplicates are adjacent in the input.
Important Points about uniq:
- Log Analysis: Use uniq to filter unique error messages, user sessions, and IP addresses.
- System Monitoring: Clean up disk usage reports and performance metrics to focus on key data.
- Automation: Integrate uniq into automation scripts for data cleaning and preprocessing, as in the sketch below.
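As a rough sketch of that automation idea (the log path, report path, and top-10 cutoff are illustrative assumptions), a small shell script can combine the pipelines shown above into a daily top-talkers report:

#!/bin/bash
# Daily report of the 10 busiest client IPs (paths are illustrative)
LOG=/var/log/nginx/access.log
REPORT=/tmp/top_ips_$(date +%F).txt

# Extract IPs, group identical ones with sort, count with uniq -c,
# then rank by count and keep the top 10
awk '{print $1}' "$LOG" | sort | uniq -c | sort -rn | head -10 > "$REPORT"

echo "Top client IPs written to $REPORT"

Run from cron, a script like this regenerates the report every day.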