When it comes to command line text processing, from an abstract point of view, there are three major pillars


Date	09.03.2023


Learn GNU AWK


Summary

This chapter discussed a few cases where you need to compare the contents of two files. The NR==FNR trick is handy for such cases. The getline function is helpful for line number based comparisons.
The next chapter will discuss how to handle duplicate contents.
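As a quick reminder of the NR==FNR trick, here is a minimal sketch. The file names wanted.txt and all.txt and their contents are hypothetical, created only for this illustration. While awk reads the first file, NR and FNR are equal, so its lines can be saved as array keys; for the second file, only lines already saved are printed.

```shell
# hypothetical sample files, created only for this illustration
tmp=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmp/wanted.txt"
printf 'banana\nfig\napple\nguava\n' > "$tmp/all.txt"

# NR==FNR holds only while the first file is being read:
# save each of its lines as an array key, then skip to the next record.
# For the second file, print a line only if it was saved earlier.
common=$(awk 'NR==FNR{seen[$0]; next} $0 in seen' "$tmp/wanted.txt" "$tmp/all.txt")
printf '%s\n' "$common"

rm -r "$tmp"
```

The output follows the order of the second file, so banana is printed before apple here.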

Exercises

a) Use the contents of the match_words.txt file to display matching lines from jumbled.txt and sample.txt. The matching criterion is that the second word of lines from these files should match the third word of lines from match_words.txt.
$ cat match_words.txt
%whole(Hello)--{doubt}==ado==
just,\joint*,concession<=nice

$ # 'concession' is one of the third words from 'match_words.txt'
$ # and second word from 'jumbled.txt'
$ awk ##### add your solution here
wavering:concession/woof\retailer
No doubt you like it too
b) Interleave contents of secrets.txt with the contents of a file passed via -v option as shown below.
$ awk -v f='table.txt' ##### add your solution here
stag area row tick
brown bread mat hair 42
---
deaf chi rate tall glad
blue cake mug shirt -7
---
Bi tac toe - 42
yellow banana window shoes 3.14
---
c) The file search_terms.txt contains one search string per line (these have no regexp metacharacters). Construct an awk command that reads this file and displays search terms (matched case insensitively) that were found in all of the other file arguments. Note that these terms should be matched with any part of the line, not just whole words.
$ cat search_terms.txt
hello
row
you
is
at

$ awk ##### add your solution here
##file list## search_terms.txt jumbled.txt mixed_fs.txt secrets.txt table.txt
at
row

$ awk ##### add your solution here
##file list## search_terms.txt addr.txt sample.txt
is
you
hello

Dealing with duplicates

Often, you need to eliminate duplicates from an input file. This could be based on the entire line content or on certain fields. Such tasks are typically solved with the sort and uniq commands. The advantages of awk include regexp based field and record separators, no requirement for the input to be sorted, and, in general, more flexibility because it is a programming language.
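A minimal sketch of the idea, using a common awk idiom rather than anything specific to this text: an associative array counts how many times a key has been seen, and a record is printed only on its first occurrence. The file colors.csv and its contents are hypothetical, created only for this illustration.

```shell
# hypothetical sample file, created only for this illustration
tmp=$(mktemp -d)
printf 'red,1\nblue,2\nred,1\ngreen,3\nblue,9\n' > "$tmp/colors.csv"

# seen[$0]++ evaluates to 0 (false) the first time a line appears,
# so !seen[$0]++ is true only for the first occurrence of each line
uniq_lines=$(awk '!seen[$0]++' "$tmp/colors.csv")
printf '%s\n' "$uniq_lines"

# same idea, but the key is only the first comma separated field
uniq_first=$(awk -F, '!seen[$1]++' "$tmp/colors.csv")
printf '%s\n' "$uniq_first"

rm -r "$tmp"
```

Unlike sort piped to uniq, this keeps the original order of the input and does not need the input to be sorted, at the cost of holding every distinct key in memory.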
