When it comes to command line text processing, from an abstract point of view, there are three major pillars



Date: 09.03.2023
Learn GNU AWK

Using string literal as regexp


The first argument to the sub and gsub functions can be a string as well; awk will convert it to a regexp. This has a few advantages. For example, if the search pattern contains many / characters, it is often easier to use a string instead of a regexp literal.
$ p='/home/learnbyexample/reports'
$ echo "$p" | awk '{sub(/\/home\/learnbyexample\//, "~/")} 1'
~/reports
$ echo "$p" | awk '{sub("/home/learnbyexample/", "~/")} 1'
~/reports

$ # example with line matching instead of substitution
$ printf '/foo/bar/1\n/foo/baz/1\n' | awk '/\/foo\/bar\//'
/foo/bar/1
$ printf '/foo/bar/1\n/foo/baz/1\n' | awk '$0 ~ "/foo/bar/"'
/foo/bar/1
In the above examples, the string literal was supplied directly. Any other expression or variable can be used as well; examples will be shown later in this chapter. The reason a string isn't always used as the first argument is that the \ character is special inside string literals too, so its meaning clashes with regexp escapes. For example:
$ awk 'gsub("\<par\>", "X")' word_anchors.txt
awk: cmd. line:1: warning: escape sequence `\<' treated as plain `<'
awk: cmd. line:1: warning: escape sequence `\>' treated as plain `>'

$ # you'll need \\ to represent \
$ awk 'gsub("\\<par\\>", "X")' word_anchors.txt
sub X

$ # much more readable with regexp literal
$ awk 'gsub(/\<par\>/, "X")' word_anchors.txt
sub X

$ # another example
$ echo '\learn\by\example' | awk '{gsub("\\\\", "/")} 1'
/learn/by/example
$ echo '\learn\by\example' | awk '{gsub(/\\/, "/")} 1'
/learn/by/example
See gawk manual: Gory details for more information than you'd want.
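The backslash doubling described above can be seen in a small, self-contained sketch. The sample input and the variable names (s, r, from_regexp, from_string, from_var) are assumptions chosen for illustration and are not from the original text. The sketch matches a literal $ character three ways: with a regexp literal, with a string literal, and with the same string stored in an awk variable.

```shell
# Hypothetical sample input (not from the original text).
s='cost: $5'

# Regexp literal: a single backslash removes the special meaning of $.
from_regexp=$(printf '%s\n' "$s" | awk '{sub(/\$/, "USD ")} 1')

# String literal: the backslash itself must be escaped, hence \\$.
from_string=$(printf '%s\n' "$s" | awk '{sub("\\$", "USD ")} 1')

# The same string stored in a variable and then used as the regexp.
from_var=$(printf '%s\n' "$s" | awk 'BEGIN{r = "\\$"} {sub(r, "USD ")} 1')

printf '%s\n' "$from_regexp" "$from_string" "$from_var"
```

All three commands should print cost: USD 5, since the string "\\$" is reduced to \$ before awk treats it as a regexp.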

The dot meta character


The dot metacharacter serves as a placeholder that matches any single character (including the newline character). Later you'll learn how to define your own custom placeholder for a limited set of characters.
$ # 3 character sequence starting with 'c' and ending with 't'
$ echo 'tac tin cot abc:tyz excited' | awk '{gsub(/c.t/, "-")} 1'
ta-in - ab-yz ex-ed

$ # any character followed by 3 and again any character
$ printf '4\t35x\n' | awk '{gsub(/.3./, "")} 1'
4x

$ # 'c' followed by any character followed by 'x'
$ awk 'BEGIN{s="abc\nxyz"; sub(/c.x/, " ", s); print s}'
ab yz
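As a further sketch of the same idea, the commands below show that the dot always consumes exactly one character, so it cannot match an empty position. The sample inputs and the shell variable names (words, count) are assumptions for illustration only. The second command relies on gsub returning the number of substitutions made.

```shell
# Hypothetical sample inputs (not from the original text).
# 'ct' has nothing between 'c' and 't', so /c.t/ cannot match it.
words=$(echo 'ct cat c-t' | awk '{gsub(/c.t/, "X")} 1')

# gsub returns the number of substitutions made;
# here the dot has to match the newline inside s.
count=$(awk 'BEGIN{s = "c\nt"; print gsub(/c.t/, "X", s)}')

printf '%s\n' "$words" "$count"
```

The first line of output should be ct X X and the second should be 1.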
