Certain ASCII characters such as tab \t, carriage return \r and newline \n have escape sequences to represent them. Additionally, any character can be represented by its ASCII value in octal \NNN or hexadecimal \xNN format. Unlike character set escape sequences such as \w, these can be used inside character classes.
$ # using \t to represent tab character
$ printf 'foo\tbar\tbaz\n' | awk '{gsub(/\t/, " ")} 1'
foo bar baz

$ # these escape sequences work inside character class too
$ printf 'a\t\r\fb\vc\n' | awk '{gsub(/[\t\v\f\r]+/, ":")} 1'
a:b:c

$ # representing single quotes
$ # use \047 for octal format
$ echo "universe: '42'" | awk '{gsub(/\x27/, "")} 1'
universe: 42
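The octal form mentioned in the comment above can be sketched as follows. This is a minimal illustration that reuses the same sample input. The variable names `out` and `out_class` are only there to capture the results.

```shell
# single quote is ASCII 39 in decimal, i.e. \047 in octal and \x27 in hexadecimal
out=$(echo "universe: '42'" | awk '{gsub(/\047/, "")} 1')
echo "$out"

# the octal escape can also be used inside a character class
# here both the single quotes and the colon are deleted
out_class=$(echo "universe: '42'" | awk '{gsub(/[\047:]/, "")} 1')
echo "$out_class"
```

The first command prints `universe: 42` and the second prints `universe 42`. The octal form is handy when the awk program itself is wrapped in single quotes on the shell command line.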
If a metacharacter is specified by its ASCII value, it still acts as the metacharacter. Undefined escape sequences result in a warning and are treated as the character they escape.
$ # \x5e is ^ character, acts as string anchor here
$ printf 'cute\ncot\ncat\ncoat\n' | awk '/\x5eco/'
cot
coat

$ # & metacharacter in replacement will be discussed in a later section
$ # it represents entire matched portion
$ echo 'hello world' | awk '{sub(/.*/, "[&]")} 1'
[hello world]
$ # \x26 is & character
$ echo 'hello world' | awk '{sub(/.*/, "[\x26]")} 1'
[hello world]

$ echo 'read' | awk '{sub(/a/, "\.")} 1'
awk: cmd. line:1: warning: escape sequence `\.' treated as plain `.'
re.d
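Since an ASCII-value escape still behaves as the metacharacter, matching such a character literally calls for a backslash escape or a character class instead. A minimal sketch, with sample inputs made up for illustration:

```shell
# a backslash makes ^ match literally instead of acting as an anchor
caret=$(echo 'a^b' | awk '{gsub(/\^/, "-")} 1')
echo "$caret"

# inside a character class, . loses its special meaning
dot=$(echo 'a.b' | awk '{gsub(/[.]/, "-")} 1')
echo "$dot"
```

Both commands print `a-b`.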
See gawk manual: Escape Sequences for the full list and other details.
The third substitution function is gensub, which can be used in place of both sub and gsub. It needs at least three arguments. The third argument specifies which matches to replace: "g" for all occurrences, or a number for a specific occurrence. Another difference is that gensub returns the resulting string (whether or not any substitution took place) instead of modifying the input.
$ # same as: sed 's/:/-/2'
$ # replace only second occurrence of ':' with '-'
$ # note that output of gensub is passed to print here
$ echo 'foo:123:bar:baz' | awk '{print gensub(/:/, "-", 2)}'
foo:123-bar:baz

$ # same as: sed -E 's/[^:]+/X/3'
$ # replace only third field with 'X'
$ echo 'foo:123:bar:baz' | awk '{print gensub(/[^:]+/, "X", 3)}'
foo:123:X:baz
The fourth argument of gensub specifies the input string or variable on which the substitution is performed. The default is $0, as seen in the previous examples.
$ # replace vowels with 'X' only for fourth field
$ # same as: awk '{gsub(/[aeiou]/, "X", $4)} 1'
$ echo '1 good 2 apples' | awk '{$4 = gensub(/[aeiou]/, "X", "g", $4)} 1'
1 good 2 XpplXs