


Date: 09.03.2023
Learn GNU AWK

patsplit


The patsplit function gives you the features provided by FPAT. The argument order and optional arguments are the same as for the split function, with FPAT as the default separator. The return value is the number of fields obtained from the split.
$ s='eagle,"fox,42",bee,frog'
$ echo "$s" | awk '{patsplit($0, a, /"[^"]*"|[^,]*/); print a[2]}'
"fox,42"

substr


The substr function extracts a specified number of characters from a given string, based on character position. The argument order is:

  • First argument is the input string

  • Second argument is starting position

  • Third argument is number of characters to extract

Indexing starts from 1. If the third argument is not specified, all characters until the end of the input string are extracted. If the second argument is greater than the length of the string, or if the third argument is less than or equal to 0, an empty string is returned. A second argument less than 1 is treated as 1.
$ echo 'abcdefghij' | awk '{print substr($0, 1, 5)}'
abcde
$ echo 'abcdefghij' | awk '{print substr($0, 4, 3)}'
def
$ echo 'abcdefghij' | awk '{print substr($0, 6)}'
fghij
$ echo 'abcdefghij' | awk -v OFS=: '{print substr($0, 2, 3), substr($0, 6, 3)}'
bcd:fgh
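The boundary rules for substr can be checked directly. The following is a minimal sketch rather than part of the original examples; the shell variable names (past_end, zero_len, long_len, low_start) are made up for illustration. Square brackets are printed around each result so that empty strings are visible.

```shell
s='abcdefghij'

# start position beyond the end of the string: empty result
past_end=$(echo "$s" | awk '{print "[" substr($0, 11) "]"}')

# length of 0 (or less): empty result
zero_len=$(echo "$s" | awk '{print "[" substr($0, 4, 0) "]"}')

# length running past the end: extraction stops at the last character
long_len=$(echo "$s" | awk '{print "[" substr($0, 8, 10) "]"}')

# start below 1 with no length given: behaves like a start of 1
low_start=$(echo "$s" | awk '{print "[" substr($0, 0) "]"}')

printf '%s\n' "$past_end" "$zero_len" "$long_len" "$low_start"
```

This should print [], [], [hij] and [abcdefghij], one per line. The last case deliberately omits the third argument, since combining a start below 1 with an explicit length is the kind of corner case that is best tested on the awk implementation you actually use.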
If only a few characters are needed from the input record, you can also use an empty FS, which makes each character a separate field.
$ echo 'abcdefghij' | awk -v FS= '{print $3}'
c
$ echo 'abcdefghij' | awk -v FS= '{print $3, $5}'
c e

match


The match function is useful for extracting the portion of an input string matched by a regexp. There are two ways to get the matched portion:

  • by using substr function along with special variables RSTART and RLENGTH

  • by passing a third argument to match so that the results are available from an array

The first argument to match is the input string and the second is the regexp. On success, RSTART holds the starting position of the match and RLENGTH its length. If the match fails, RSTART is set to 0 and RLENGTH to -1. The return value is the same as RSTART.
$ s='051 035 154 12 26 98234'
$ # using substr and RSTART/RLENGTH
$ echo "$s" | awk 'match($0, /[0-9]{4,}/){print substr($0, RSTART, RLENGTH)}'
98234
$ # using array, note that index 0 is used here, not 1
$ echo "$s" | awk 'match($0, /0*[1-9][0-9]{2,}/, m){print m[0]}'
154
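To see the values that match sets on success and on failure, the return value can be printed next to RSTART and RLENGTH. This is a small illustrative sketch; the shell variable names found and missing are made up here, and only the two-argument form of match is used.

```shell
s='abcdefghij'

# successful match: return value equals RSTART, RLENGTH is the match length
found=$(echo "$s" | awk '{r = match($0, /def/); print r, RSTART, RLENGTH}')

# failed match: return value and RSTART are 0, RLENGTH is -1
missing=$(echo "$s" | awk '{r = match($0, /xyz/); print r, RSTART, RLENGTH}')

printf '%s\n' "$found" "$missing"
```

The first line of output should be 4 4 3 and the second 0 0 -1. Because the return value is 0 on failure, match can be used directly as a condition, as in the examples above.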
Both of the above examples can also be solved easily using FPAT or patsplit. match has an advantage when you need only the portions matched within capture groups. The element at index 0 of the array still holds the entire match. Index 1 holds the portion matched by the first group, index 2 the portion matched by the second group, and so on. See also stackoverflow: arithmetic replacement in a text file.
$ # entire matched portion
$ echo 'foo=42, baz=314' | awk 'match($0, /baz=([0-9]+)/, m){print m[0]}'
baz=314
$ # matched portion of first capture group
$ echo 'foo=42, baz=314' | awk 'match($0, /baz=([0-9]+)/, m){print m[1]}'
314
If you need to get matching portions for all the matches instead of just the first match, you can use a loop and adjust the input string every iteration.
$ # extract numbers only if they are followed by a comma
$ s='42 foo-5, baz3; x-83, y-20: f12'
$ echo "$s" | awk '{ while( match($0, /([0-9]+),/, m) ){print m[1];
       $0=substr($0, RSTART+RLENGTH)} }'
5
83
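A variation on the same loop, shown here only as a sketch: it works on a copy of the record instead of $0, and uses RSTART and RLENGTH instead of the array argument. The variable names rest and out are made up for illustration. Assigning to $0 re-splits the record into fields, so keeping the original intact can matter if the fields are needed afterwards.

```shell
s='42 foo-5, baz3; x-83, y-20: f12'

out=$(echo "$s" | awk '{
    rest = $0    # work on a copy so that $0 and its fields stay untouched
    while (match(rest, /[0-9]+,/)) {
        # RLENGTH includes the trailing comma, so drop one character
        print substr(rest, RSTART, RLENGTH - 1)
        rest = substr(rest, RSTART + RLENGTH)
    }
    print "record:", $0
}')

printf '%s\n' "$out"
```

This should print 5 and 83 on separate lines, followed by a line showing the unmodified record. The trade-off is that without capture groups the unwanted comma has to be trimmed by hand.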
