uniq command in Linux
The uniq command in Linux finds and removes repeated lines in a file, keeping only unique lines and making it great for cleaning up data. uniq reads a file (or input from another command), compares each line to the one before it, and removes any duplicate that comes immediately after.
uniq Command Tutorial for DevOps Engineers: Efficient Data Processing
The uniq command in Linux is one of the most essential tools for text processing and data manipulation. Whether you’re managing system logs, cleaning up data files, or analyzing server metrics, uniq can streamline your workflow by removing duplicate lines.
What is the uniq Command in Linux?
The uniq command removes adjacent duplicate lines in a text file or output from a command. This makes it useful for cleaning up data, identifying unique items in logs, and filtering redundant information.
Basic Syntax of uniq Command
uniq [options] [input_file] [output_file]
- input_file: The file from which to remove duplicates.
- output_file: (Optional) The file in which to store the result.
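For example, a minimal invocation (the filenames here are purely illustrative) reads raw.txt and writes the deduplicated lines to clean.txt:
uniq raw.txt clean.txt
If the output file is omitted, the result is printed to standard output.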
By default, uniq only removes consecutive duplicates. If you want to eliminate duplicates from unsorted data, you will need to use sort before uniq.
Example of uniq Command
1. Remove Consecutive Duplicates
Let’s assume you have a text file (data.txt) with the following content:
apple
banana
banana
orange
orange
orange
grape
Running the uniq command:
uniq data.txt
Output:
apple
banana
orange
grape
🔹 Here, uniq removes the consecutive duplicate lines, leaving only unique values.
2. Remove Non-Adjacent Duplicates
sort file.txt | uniq
🔹 uniq only removes adjacent duplicates, so if identical lines are scattered throughout the file, sort it first to group them together. The shorthand sort -u file.txt produces the same result.
3. Count Occurrences of Each Line
uniq -c file.txt
🔹 Adds a count of how many times each line appears.
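For instance, running it on the data.txt file from the first example would produce:
uniq -c data.txt
Output:
      1 apple
      2 banana
      3 orange
      1 grape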
4. Display Only Duplicates
uniq -d file.txt
🔹 Shows only the lines that are duplicated.
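On data.txt, this prints only the two values that appear more than once:
uniq -d data.txt
Output:
banana
orange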
5. Ignore Case
uniq -i file.txt
🔹 Ignores case while removing duplicates (e.g., "Apple" and "apple" are treated as the same).
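As an illustration, suppose a hypothetical file fruits.txt contains the three consecutive lines Apple, apple, and APPLE. With -i they are all treated as the same line:
uniq -i fruits.txt
Output:
Apple
Without -i, all three lines would be kept because they differ in case.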
Advanced uniq Command Options for DevOps Engineers
DevOps engineers often need to filter and process large datasets, such as server logs or system reports. The following options for the uniq command can be incredibly helpful:
Case 1. Count Occurrences of Each Line (-c option)
When analyzing logs, knowing the frequency of each entry is valuable. The -c option counts the number of times each line appears. For example, let's analyze a log file (/var/log/nginx/access.log) containing:
192.168.0.1 - - [10/Oct/2023:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 512
192.168.0.2 - - [10/Oct/2023:10:01:01 +0000] "GET /login HTTP/1.1" 404 128
192.168.0.1 - - [10/Oct/2023:10:05:01 +0000] "GET /index.html HTTP/1.1" 200 512
Use the -c option to count occurrences of each IP address, sorting first because the duplicate IPs are not adjacent in the log:
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c
Output:
      2 192.168.0.1
      1 192.168.0.2
This tells us that 192.168.0.1 accessed the server twice, while 192.168.0.2 accessed it once.
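A common extension of this pattern is ranking clients by request volume: piping the counts through sort -rn (reverse numeric sort) and head lists the busiest IPs first.
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10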
Case 2. Display Only Duplicates (-d option)
You might only want to see lines that have duplicates. The -d option is useful for this purpose. Let’s filter the same access log (again sorting first) to find only the repeated IP addresses:
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -d
Output:
192.168.0.1
This shows that 192.168.0.1 has duplicate log entries in the file.
Case 3. Show Only Unique Lines (-u option)
In contrast to showing duplicates, you can use the -u option to display lines that only appear once. This is helpful when you want to see unique access attempts or configurations.
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -u
Output:
192.168.0.2
This tells us that 192.168.0.2 accessed the server only once.
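To turn this into a single metric, such as the number of IPs that appeared exactly once, the list can be piped through wc -l:
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -u | wc -l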
Real-World DevOps Examples of uniq Command
Let’s see some DevOps-oriented examples where uniq is used for filtering logs, processing metrics, and analyzing system data.
1. Finding Unique Error Messages in Server Logs
When debugging an issue, DevOps engineers often need to isolate unique error messages. Here’s a sample log snippet from a server error log (/var/log/nginx/error.log):
[error] 1024#0: *1 "POST /login" failed (404: Not Found) while reading response header from upstream
[error] 1025#1: *2 "GET /home" failed (500: Internal Server Error) while reading response header from upstream
[error] 1024#0: *1 "POST /login" failed (404: Not Found) while reading response header from upstream
To extract unique error messages, sorting first so identical entries become adjacent:
grep "failed" /var/log/nginx/error.log | sort | uniq
Output:
[error] 1024#0: *1 "POST /login" failed (404: Not Found) while reading response header from upstream
[error] 1025#1: *2 "GET /home" failed (500: Internal Server Error) while reading response header from upstream
This removes the duplicate error entries, allowing you to focus on distinct error types.
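In practice, near-identical errors often differ only in a process or connection ID and therefore never collapse. One rough workaround (the sed pattern here is illustrative and would need adapting to your actual log format) strips those variable prefixes before deduplicating and counting:
grep "failed" /var/log/nginx/error.log | sed -E 's/^\[error\] [0-9]+#[0-9]+: \*[0-9]+ //' | sort | uniq -c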
2. Analyzing User Access to a Web Application
When analyzing user access to a web application, you may want to see how many unique users are accessing the site, especially in a multi-user environment. For example, use awk to extract a field such as the client IP address, then sort and count occurrences:
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c
Output:
     45 192.168.1.1
     32 192.168.1.2
This command counts how many times each IP address has accessed the server, helping you identify traffic patterns.
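To report just the number of distinct visitors as a single figure, deduplicate with sort -u and count the lines:
awk '{print $1}' /var/log/nginx/access.log | sort -u | wc -l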
3. Cleaning Up Server Metrics
For system monitoring or performance reports, you might receive raw data that includes duplicate entries, such as disk usage statistics from df -h:
df -h | uniq
Example Input:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       100G   60G   40G  60% /
/dev/sda1       100G   60G   40G  60% /
/dev/sdb1       500G  200G  300G  40% /mnt/data
Output:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       100G   60G   40G  60% /
/dev/sdb1       500G  200G  300G  40% /mnt/data
This removes the duplicate disk entry, providing a cleaner and more readable report. Note that uniq catches the repeated /dev/sda1 line only because the duplicates are adjacent in the input.
Important Points about uniq:
- Log Analysis: Use uniq to filter unique error messages, user sessions, and IP addresses.
- System Monitoring: Clean up disk usage reports and performance metrics to focus on key data.
- Automation: Integrate uniq into automation scripts for data cleaning and preprocessing, as in the sketch below.
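As a rough sketch of that automation idea (the log path, report path, and top-10 cutoff are illustrative assumptions), a small shell script can combine the pipelines shown above into a daily top-talkers report:

#!/bin/bash
# Daily report of the 10 busiest client IPs (paths are illustrative)
LOG=/var/log/nginx/access.log
REPORT=/tmp/top_ips_$(date +%F).txt

# Extract IPs, group identical ones with sort, count with uniq -c,
# then rank by count and keep the top 10
awk '{print $1}' "$LOG" | sort | uniq -c | sort -rn | head -10 > "$REPORT"

echo "Top client IPs written to $REPORT"

Run from cron, a script like this regenerates the report every day.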