


Date: 09.03.2023
Learn GNU AWK

Multiple file input


You have already seen control structures like BEGIN, END and next. This chapter discusses control structures that help you make per-file decisions when multiple files are passed as input.

BEGINFILE, ENDFILE and FILENAME


  • BEGINFILE: this block is executed before the start of each input file

  • ENDFILE: this block is executed after each input file has been processed

  • FILENAME: special variable holding the name of the current input file

    $ awk 'BEGINFILE{print "--- " FILENAME " ---"} 1' greeting.txt table.txt
    --- greeting.txt ---
    Hi there
    Have a nice day
    Good bye
    --- table.txt ---
    brown bread mat hair 42
    blue cake mug shirt -7
    yellow banana window shoes 3.14

    $ # same as: tail -q -n1 greeting.txt table.txt
    $ awk 'ENDFILE{print $0}' greeting.txt table.txt
    Good bye
    yellow banana window shoes 3.14

nextfile


nextfile skips the remaining records of the file currently being processed and moves on to the next file.
    $ # print filename if it contains 'I' anywhere in the file
    $ # same as: grep -l 'I' f[1-3].txt greeting.txt
    $ awk '/I/{print FILENAME; nextfile}' f[1-3].txt greeting.txt
    f1.txt
    f2.txt

    $ # print filename if it contains both 'o' and 'at' anywhere in the file
    $ awk 'BEGINFILE{m1=m2=0} /o/{m1=1} /at/{m2=1} m1 && m2{print FILENAME; nextfile}' f[1-3].txt greeting.txt
    f2.txt
    f3.txt

    $ # print filename if it contains 'at' but not 'o'
    $ awk 'BEGINFILE{m1=m2=0} /o/{m1=1; nextfile} /at/{m2=1} ENDFILE{if(!m1 && m2) print FILENAME}' f[1-3].txt greeting.txt
    f1.txt
nextfile cannot be used in BEGIN, END or ENDFILE blocks. See gawk manual: nextfile for more details, including how it affects ENDFILE and other special cases.
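As a further illustration of nextfile, the sketch below prints only the first record of each file and then abandons the rest of that file. It is not taken from the book: the files a.txt and b.txt and the variables dir and first_lines are hypothetical, and the sketch assumes an awk that supports nextfile (gawk does).

```shell
# Hypothetical sample files, created in a temporary directory for illustration.
dir=$(mktemp -d)
printf 'title: notes\nbody line 1\nbody line 2\n' > "$dir/a.txt"
printf 'title: todo\nbuy milk\n' > "$dir/b.txt"

# Print the first record of each file; nextfile then skips every
# remaining record of that file and moves on to the next one.
first_lines=$(cd "$dir" && awk '{print FILENAME ": " $0; nextfile}' a.txt b.txt)
printf '%s\n' "$first_lines"

rm -r "$dir"
```

With these sample files the sketch prints "a.txt: title: notes" followed by "b.txt: title: todo". Stopping early like this avoids reading the rest of a large file once the answer is known.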

ARGC and ARGV


The ARGC special variable contains the total number of command-line arguments passed to awk, counting the awk command itself as an argument (options and the program text are not counted). The ARGV special array contains the arguments themselves.
    $ # note that index starts with '0' here
    $ awk 'BEGIN{for(i=0; i<ARGC; i++) print ARGV[i]}' ...

Similar to manipulating NF and modifying $N field contents, you can change the values of ARGC and ARGV to control how the arguments should be processed.
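The following minimal sketch shows both ideas: listing ARGV, and editing it in a BEGIN block so that one file is skipped. It is an illustration under stated assumptions, not the book's own example: the files a.txt, b.txt and c.txt and the variables dir, args and kept are hypothetical. It relies on the standard awk behavior that an ARGV element set to the empty string is ignored.

```shell
# Hypothetical sample files, created in a temporary directory for illustration.
dir=$(mktemp -d)
printf 'one\n' > "$dir/a.txt"
printf 'two\n' > "$dir/b.txt"
printf 'three\n' > "$dir/c.txt"

# List every ARGV entry with its index; index 0 holds the command name.
args=$(cd "$dir" && awk 'BEGIN{for(i=0; i<ARGC; i++) print i ": " ARGV[i]}' a.txt b.txt c.txt)
printf '%s\n' "$args"

# Blank out ARGV[2] before any input is read: awk skips empty
# entries, so b.txt is never opened.
kept=$(cd "$dir" && awk 'BEGIN{ARGV[2]=""} {print FILENAME ": " $0}' a.txt b.txt c.txt)
printf '%s\n' "$kept"

rm -r "$dir"
```

The first command lists the command name at index 0 and the three file names at indexes 1 to 3. The second prints "a.txt: one" followed by "c.txt: three", since the entry for b.txt was emptied before awk started reading input.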
However, not all arguments are necessarily filenames. awk allows assigning values to variables without the -v option, provided the assignment appears where file arguments are usually given. For example:
    $ awk 'BEGIN{for(i=0; i<ARGC; i++) print ARGV[i]}' table.txt n=5 ...

In the above example, the variable n gets the value 5 after awk has finished processing the table.txt file. Here's an example where FS is changed between two files.
    $ cat table.txt
    brown bread mat hair 42
    blue cake mug shirt -7
    yellow banana window shoes 3.14

    $ cat books.csv
    Harry Potter,Mistborn,To Kill a Mocking Bird
    Matilda,Castle Hangnail,Jane Eyre

    $ # for table.txt, FS will be default value
    $ # for books.csv, FS will be comma character
    $ # OFS is comma for both files
    $ awk -v OFS=, 'NF=2' table.txt FS=, books.csv
    brown,bread
    blue,cake
    yellow,banana
    Harry Potter,Mistborn
    Matilda,Castle Hangnail
See stackoverflow: extract positions 2-7 from a fasta sequence for a practical example of changing field/record separators between the files being processed.
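The timing of a variable assignment placed among the file arguments can be checked with a small sketch. The files first.txt and second.txt and the variables dir and trace are hypothetical names used only for illustration. The sketch prints the value of n for every record, so the point at which the assignment takes effect becomes visible.

```shell
# Hypothetical sample files, created in a temporary directory for illustration.
dir=$(mktemp -d)
printf 'x\ny\n' > "$dir/first.txt"
printf 'z\n' > "$dir/second.txt"

# n=5 sits between the two file names, so the assignment happens only
# after first.txt has been read and before second.txt is opened.
trace=$(cd "$dir" && awk '{print FILENAME ": n=[" n "]"}' first.txt n=5 second.txt)
printf '%s\n' "$trace"

rm -r "$dir"
```

With these sample files, both records of first.txt show an empty value, "n=[]", while the record from second.txt shows "n=[5]".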
