Date: 09.03.2023
Learn GNU AWK

Two file processing


This chapter focuses on solving problems that depend on the contents of two files. These are usually based on comparing records and fields. Sometimes, the record number plays a role too. You'll also learn about the getline built-in function.

Comparing records


Consider the following input files, which will be compared line by line to get common lines and unique lines.
$ cat color_list1.txt
teal
light blue
green
yellow

$ cat color_list2.txt
light blue
black
dark green
yellow

The key features used in the solution below:

  • For two files as input, NR==FNR will be true only when the first file is being processed

  • next will skip the rest of the code and fetch the next record

  • a[$0] by itself is a valid statement. It will create an uninitialized element in array a with $0 as the key (assuming the key doesn't exist yet)

  • $0 in a checks if the given string ($0 here) exists as a key in array a

$ # common lines
$ # same as: grep -Fxf color_list1.txt color_list2.txt
$ awk 'NR==FNR{a[$0]; next} $0 in a' color_list1.txt color_list2.txt
light blue
yellow

$ # lines from color_list2.txt not present in color_list1.txt
$ # same as: grep -vFxf color_list1.txt color_list2.txt
$ awk 'NR==FNR{a[$0]; next} !($0 in a)' color_list1.txt color_list2.txt
black
dark green

$ # reversing the order of input files gives
$ # lines from color_list1.txt not present in color_list2.txt
$ awk 'NR==FNR{a[$0]; next} !($0 in a)' color_list2.txt color_list1.txt
teal
green

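The array behavior listed in the key features can be checked in isolation. The following is a minimal sketch, assuming an awk that supports length() on arrays (GNU awk does). The variable names found, missing, before and after are made up for this illustration.

```shell
# A bare reference creates a key, the "in" test does not,
# and reading a missing element creates it as a side effect.
result=$(awk 'BEGIN{
    a["teal"]                     # bare reference creates the key "teal"
    found = ("teal" in a)         # 1 because the key exists
    missing = ("black" in a)      # 0, and the test does not create "black"
    before = length(a)            # still a single key at this point
    if (a["black"] == "")         # reading the element creates "black"
        after = length(a)
    print found, missing, before, after
}')
echo "$result"    # 1 0 1 2
```

This is why the solutions use $0 in a for membership testing instead of comparing a[$0] against an empty string: the latter would keep adding keys to the array while the second file is processed.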
Note that the NR==FNR logic will fail if the first file is empty. See this unix.stackexchange thread for workarounds.
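As a sketch of the failure and of one commonly used workaround (not necessarily the one discussed in the linked thread), the first file can be identified by name with FILENAME==ARGV[1] instead of relying on record numbers. The files empty.txt and colors.txt are made up for this illustration.

```shell
# hypothetical inputs: an empty first file and a two-line second file
cd "$(mktemp -d)"
: > empty.txt
printf 'black\ndark green\n' > colors.txt

# NR==FNR stays true while colors.txt is read, because no records came
# from empty.txt, so every line is stored in the array and skipped
broken=$(awk 'NR==FNR{a[$0]; next} !($0 in a)' empty.txt colors.txt)

# FILENAME==ARGV[1] is true only while the first named file is being read
fixed=$(awk 'FILENAME==ARGV[1]{a[$0]; next} !($0 in a)' empty.txt colors.txt)

printf 'NR==FNR output: [%s]\n' "$broken"
printf 'FILENAME==ARGV[1] output:\n%s\n' "$fixed"
```

The first command prints nothing, while the second prints both lines of colors.txt, which is the expected result when the first file has no lines to exclude. This approach assumes the two file arguments have different names.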

Comparing fields


In the previous section, you saw how to compare the whole contents of records between two files. This section will focus on comparing only specific field(s). The sample file below will be one of the two file inputs for the examples in this section.
$ cat marks.txt
Dept Name Marks
ECE Raj 53
ECE Joel 72
EEE Moi 68
CSE Surya 81
EEE Tia 59
ECE Om 92
CSE Amy 67

To start with, here's a single field comparison. The problem statement is to fetch all records from marks.txt whose first field matches any of the departments listed in the dept.txt file.
$ cat dept.txt
CSE
ECE

$ # note that dept.txt is used to build the array keys first
$ awk 'NR==FNR{a[$1]; next} $1 in a' dept.txt marks.txt
ECE Raj 53
ECE Joel 72
CSE Surya 81
ECE Om 92
CSE Amy 67

$ # if header is needed as well
$ awk 'NR==FNR{a[$1]; next} FNR==1 || $1 in a' dept.txt marks.txt
Dept Name Marks
ECE Raj 53
ECE Joel 72
CSE Surya 81
ECE Om 92
CSE Amy 67

For multiple field comparison, you need to construct the key robustly. Simply concatenating field values can lead to false matches. For example, the field values abc and 123 will wrongly match ab and c123, since both pairs concatenate to abc123. To avoid this, you may introduce some string between the field values, say "_" (if you know the fields themselves cannot have this character) or FS (safer option). You could also allow awk to bail you out. If you use the , symbol (not "," as a string) between field values, the value of the special variable SUBSEP is inserted. SUBSEP has a default value of the non-printing character \034, which is usually not used as part of text files.
$ cat dept_name.txt
EEE Moi
CSE Amy
ECE Raj

$ # uses SUBSEP as separator between field values to construct the key
$ # note the use of parentheses for key testing
$ awk 'NR==FNR{a[$1,$2]; next} ($1,$2) in a' dept_name.txt marks.txt
ECE Raj 53
EEE Moi 68
CSE Amy 67

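The false match caused by plain concatenation, as described above with abc, 123, ab and c123, can be reproduced with a small sketch. The files keys.txt and data.txt are made up for this illustration.

```shell
# hypothetical inputs: one key pair, and two records to test against it
cd "$(mktemp -d)"
printf 'abc 123\n' > keys.txt
printf 'ab c123 x\nabc 123 y\n' > data.txt

# plain concatenation: both records produce the key "abc123"
concat=$(awk 'NR==FNR{a[$1 $2]; next} ($1 $2) in a' keys.txt data.txt)

# comma-separated subscripts: SUBSEP keeps the field boundary in the key
subsep=$(awk 'NR==FNR{a[$1,$2]; next} ($1,$2) in a' keys.txt data.txt)

printf 'concatenation:\n%s\n' "$concat"
printf 'SUBSEP:\n%s\n' "$subsep"
```

The concatenation version prints both records of data.txt, while the SUBSEP version prints only the abc 123 y record.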
In this example, one of the fields is used for numerical comparison.
$ cat dept_mark.txt
ECE 70
EEE 65
CSE 80

$ # match Dept and minimum marks specified in dept_mark.txt
$ awk 'NR==FNR{d[$1]=$2; next} $1 in d && $3 >= d[$1]' dept_mark.txt marks.txt
ECE Joel 72
EEE Moi 68
CSE Surya 81
ECE Om 92

Here's an example of adding a new field.
$ cat role.txt
Raj class_rep
Amy sports_rep
Tia placement_rep

$ awk -v OFS='\t' 'NR==FNR{r[$1]=$2; next} {$(NF+1) = FNR==1 ? "Role" : r[$2]} 1' role.txt marks.txt
Dept Name Marks Role
ECE Raj 53 class_rep
ECE Joel 72
EEE Moi 68
CSE Surya 81
EEE Tia 59 placement_rep
ECE Om 92
CSE Amy 67 sports_rep
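In the output above, names without an entry in role.txt get an empty last field. If a visible placeholder is preferred, one possible variation is to test the key with in before using the array value. This is a sketch only: the placeholder text NA is an arbitrary choice, and the input files here are trimmed-down copies created for the illustration.

```shell
# trimmed-down copies of the sample files, created in a scratch directory
cd "$(mktemp -d)"
printf 'Raj class_rep\nAmy sports_rep\n' > role.txt
printf 'Dept Name Marks\nECE Raj 53\nECE Joel 72\nCSE Amy 67\n' > marks.txt

# header gets "Role", known names get their role, the rest get "NA"
with_na=$(awk -v OFS='\t' 'NR==FNR{r[$1]=$2; next}
    {$(NF+1) = FNR==1 ? "Role" : ($2 in r) ? r[$2] : "NA"} 1' role.txt marks.txt)

printf '%s\n' "$with_na"
```

With these inputs, the record for Joel ends with NA instead of an empty field, and all fields are tab separated because assigning to a field rebuilds the record using OFS.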
