This chapter showed how to work with duplicate contents, both record and field based. The awk solutions keep track of the lines or fields seen so far in memory. So, if you don't need regexp based separators and if your input is too big to be handled that way, the specialized command line tools sort and uniq will be better suited than awk.
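As a rough illustration of that trade-off, here's a minimal sketch comparing awk with sort and uniq. The file sample.txt and its contents are made up for this example and aren't part of the chapter's sample files.

```shell
# sample.txt is a hypothetical input created only for this illustration
printf '%s\n' 'tea' 'coffee' 'tea' 'milk' 'coffee' > sample.txt

# awk: keeps the first occurrence of each line, input order is preserved
awk '!seen[$0]++' sample.txt

# sort -u: also removes duplicates, but the output is sorted
sort -u sample.txt

# sort + uniq -d: shows only the lines that occur more than once
sort sample.txt | uniq -d
```

The awk version retains the original order of the lines. The sort based versions give up that ordering, which may or may not matter for your use case.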
The next chapter will show how to write awk scripts instead of the usual one-liners.
Exercises
a) Retain only the first copy of each line for the input file lines.txt. Case should be ignored while comparing lines. For example, hi there and HI TheRE will be considered as duplicates.
$ cat lines.txt
Go There
come on
go there
---
2 apples and 5 mangoes
come on!
---
2 Apples
COME ON

$ awk ##### add your solution here
Go There
come on
---
2 apples and 5 mangoes
come on!
2 Apples
b) Retain only the first copy of each line for the input file twos.txt. Assume space as the field separator, with two fields on each line. Compare the lines irrespective of the order of the fields. For example, hehe haha and haha hehe will be considered as duplicates.
$ cat twos.txt
hehe haha
door floor
haha hehe
6;8 3-4
true blue
hehe bebe
floor door
3-4 6;8
tru eblue
haha hehe

$ awk ##### add your solution here
hehe haha
door floor
6;8 3-4
true blue
hehe bebe
tru eblue
c) For the input file twos.txt, create a file uniq.txt with all the unique lines and dupl.txt with all the duplicate lines. Assume space as field separator with two fields on each line. Compare the lines irrespective of order of the fields. For example, hehe haha and haha hehe will be considered as duplicates.
$ awk ##### add your solution here

$ cat uniq.txt
true blue
hehe bebe
tru eblue

$ cat dupl.txt
hehe haha
door floor
haha hehe
6;8 3-4
floor door
3-4 6;8
haha hehe
awk scripts
-f option
The -f command line option allows you to pass the awk code via a file instead of writing it all on the command line. Here's a one-liner seen earlier that's been converted to a multiline script. Note that ; is no longer necessary to separate the commands, since a newline serves the same purpose.
$ cat buf.awk
/error/{
    f = 1
    buf = $0
    next
}

f{
    buf = buf ORS $0
}

/state/{
    if(f)
        print buf
    f = 0
}

$ awk -f buf.awk broken.txt
error 2
1234
6789
state 1
error 4
abcd
state 3
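To see that ; and newline are interchangeable as separators, here's a small sketch that runs the same logic both as a one-liner and as a script, then compares the results. The names sample_log.txt, buf_demo.awk, oneliner.out and script.out are hypothetical and used only for this illustration. The input isn't the broken.txt file used above.

```shell
# sample_log.txt is a hypothetical input, not the broken.txt file
printf '%s\n' 'qqqqqq' 'error 1' 'hi' 'error 2' '1234' 'state 1' 'bye' > sample_log.txt

# quoted heredoc ('EOF') so that the shell leaves $0 untouched
cat > buf_demo.awk << 'EOF'
/error/{
    f = 1
    buf = $0
    next
}

f{
    buf = buf ORS $0
}

/state/{
    if(f)
        print buf
    f = 0
}
EOF

# one-liner form: statements within a block are separated by ;
awk '/error/{f=1; buf=$0; next} f{buf=buf ORS $0} /state/{if(f) print buf; f=0}' sample_log.txt > oneliner.out

# script form: the same statements, separated by newlines
awk -f buf_demo.awk sample_log.txt > script.out

# cmp is silent and succeeds when both outputs are identical
cmp oneliner.out script.out && cat script.out
```

With this input, only the block starting at error 2 is printed, since the earlier error 1 line is discarded when another error line shows up before a state line.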
Another advantage is that single quotes can be freely used.
$ echo 'cue us on this example' | awk -v q="'" '{gsub(/\w+/, q "&" q)} 1'
'cue' 'us' 'on' 'this' 'example'

# the above solution is simpler to write as a script
$ cat quotes.awk
{
    gsub(/\w+/, "'&'")
}

1

$ echo 'cue us on this example' | awk -f quotes.awk
'cue' 'us' 'on' 'this' 'example'
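Here's a sketch that places the two approaches side by side and checks that they agree. The file name quotes_demo.awk is hypothetical. The regexp [^ ]+ is used here instead of \w+ as an assumption for portability, so that the sketch also runs on awk implementations that don't support the \w escape. For this particular input, both regexps match the same text.

```shell
# quotes_demo.awk is a hypothetical file name; the quoted heredoc ('EOF')
# stops the shell from interpreting anything inside the script
cat > quotes_demo.awk << 'EOF'
{
    gsub(/[^ ]+/, "'&'")
}

1
EOF

# command line: the single quote has to be passed in via a variable
cli=$(echo 'cue us on this example' | awk -v q="'" '{gsub(/[^ ]+/, q "&" q)} 1')

# script file: the single quotes are written directly in the awk code
script=$(echo 'cue us on this example' | awk -f quotes_demo.awk)

echo "$cli"
echo "$script"
```

Both echo commands should print the same quoted words, since only the way the single quote reaches awk differs.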