

Learn GNU AWK

split

The split function provides the same features as the record splitting done using FS. This is helpful when you need the results as an array, for example to use array sorting features, or when you need to further split the contents of a field. split accepts four arguments, the last two being optional.

First argument is the string to be split
Second argument is the array variable to save results
Third argument is the separator, whose default is FS
Fourth argument (a gawk extension) is an array to save the portions matched by the separator
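As a quick check of the third argument's default, here is a minimal sketch that omits the separator and relies on FS set via the -F option. The colon-separated sample line is made up for illustration.

```shell
# Sketch: when the third argument is omitted, split() uses the current FS.
# FS is set to ":" with -F, so split($0, parts) splits on colons,
# exactly like the field splitting that fills $1, $2, ...
printf 'root:x:0:0\n' | awk -F: '{
    n = split($0, parts)     # no separator given, so FS (":") is used
    print n, parts[1], parts[3]
}'
```

This prints `4 root 0`, the same values you would get from NF, $1 and $3.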

The return value of the split function is the number of fields, similar to the NF variable. The array is indexed starting from 1 for the first element, 2 for the second element and so on. If the array already had some elements, they are deleted before the new values are stored.
```bash
$ # same as: awk '{print $2}'
$ printf ' one \t two\t\t\tthree ' | awk '{split($0, a); print a[2]}'
two

$ # example with both FS and split in action
$ s='Joe,1996-10-25,64,78'
$ echo "$s" | awk -F, '{split($2, d, "-"); print $1 " was born in " d[1]}'
Joe was born in 1996

$ # single row to multiple rows based on splitting last field
$ s='air,water,12:42:3'
$ echo "$s" | awk -F, '{n=split($NF, a, ":"); for(i=1; i<=n; i++) print $1, $2, a[i]}'
air water 12
air water 42
air water 3
```
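A small sketch, using made-up strings, of the two behaviors described earlier: the return value is the new element count, and elements left over from a previous, longer split do not survive the next call.

```shell
# Sketch: split() deletes existing array elements before storing new ones,
# and returns the number of elements it created.
awk 'BEGIN {
    split("a b c d", arr)      # arr[1] .. arr[4]
    n = split("x y", arr)      # arr is emptied first, then arr[1] .. arr[2] are set
    count = 0
    for (k in arr) count++     # count the elements actually present
    state = (3 in arr) ? "stale" : "cleared"
    print n, count, state
}'
```

The output is `2 2 cleared`: the old `arr[3]` and `arr[4]` are gone.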
Similar to FS, you can use a regular expression as a separator.
```bash
$ s='Sample123string42with777numbers'
$ echo "$s" | awk '{split($0, s, /[0-9]+/); print s[2], s[4]}'
string numbers
```
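One detail worth knowing when using a regular expression separator: unlike the default whitespace splitting, a separator match at the very start of the string yields an empty first element. A minimal sketch with a made-up input string:

```shell
# Sketch: the leading "123" matches the separator, so parts[1] is empty.
echo '123abc456def' | awk '{
    n = split($0, parts, /[0-9]+/)
    printf "n=%d first=[%s] second=[%s]\n", n, parts[1], parts[2]
}'
```

This prints `n=3 first=[] second=[abc]`.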
The fourth argument provides a feature not present with FS splitting: it allows you to save the portions matched by the separator in an array. Quoting from the gawk manual's description of split():

If fieldsep is a single space, then any leading whitespace goes into seps[0] and any trailing whitespace goes into seps[n], where n is the return value of split() (i.e., the number of elements in array).

```bash
$ s='Sample123string42with777numbers'
$ echo "$s" | awk '{n=split($0, s, /[0-9]+/, seps); for(i=1; i<n; i++) print seps[i]}'
123
42
777
```

Here's an example where split is merely used to initialize an array, using an empty separator. Unlike the $N syntax, where an expression resulting in a floating-point number is acceptable, the array index here has to be an integer: a floating-point subscript is converted to a string key such as "1.3", which doesn't match any element. Hence, the int function is used to convert the floating-point result to an integer in the example below.
```bash
$ cat marks.txt
Dept    Name    Marks
ECE     Raj     53
ECE     Joel    72
EEE     Moi     68
CSE     Surya   81
EEE     Tia     59
ECE     Om      92
CSE     Amy     67

$ # adds a new grade column based on marks in 3rd column
$ awk 'BEGIN{OFS="\t"; split("DCBAS", g, //)} {$(NF+1) = NR==1 ? "Grade" : g[int($NF/10)-4]} 1' marks.txt
Dept    Name    Marks   Grade
ECE     Raj     53      D
ECE     Joel    72      B
EEE     Moi     68      C
CSE     Surya   81      A
EEE     Tia     59      D
ECE     Om      92      S
CSE     Amy     67      C
```
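To see why int is needed in the grade example, here is a minimal sketch. The array g and the mark 53 mirror the example above; the space-separated "D C B A S" string is used here only to keep the array initialization portable across awk implementations. A non-integer subscript is converted to a string using CONVFMT, so the computed key "1.3" simply does not exist in the array.

```shell
# Sketch: looking up a floating-point subscript vs an int() subscript.
awk 'BEGIN {
    split("D C B A S", g)      # g[1]="D" ... g[5]="S" (default whitespace splitting)
    idx = 53 / 10              # 5.3, a floating-point result
    key = idx - 4              # about 1.3, becomes the string "1.3" as a subscript
    print ((key in g) ? "found" : "missing")   # no element has the key "1.3"
    print g[int(idx) - 4]                      # int(5.3) - 4 = 1, so this is g[1]
}'
```

The output is `missing` followed by `D`. Using `in` for the first check avoids creating an empty `g["1.3"]` element as a side effect.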
