Learn GNU AWK

Behavior of ^ and $ when string contains newline


In some regular expression implementations, ^ matches the start of a line and $ matches the end of a line (with newline as the line separator). In awk, these anchors always match the start of the entire string and end of the entire string respectively. This comes into play when RS is other than the newline character, or if you have a string value containing newline characters.
$ # 'apple\n' doesn't match as there's newline character
$ printf 'apple\n,mustard,grape,\nmango' | awk -v RS=, '/e$/'
grape

$ # '\nmango' doesn't match as there's newline character
$ printf 'apple\n,mustard,grape,\nmango' | awk -v RS=, '/^m/'
mustard
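When per-line matching is wanted inside a multi-line record, one possible workaround is to spell out the line edges explicitly: (^|\n) in place of ^ and (\n|$) in place of $. The sketch below is only an illustration; the function name line_starts_with_m is hypothetical, and it prints record numbers rather than the records themselves to keep the output easy to read.

```shell
# Hypothetical helper: print the numbers of the comma-separated records
# in which any line (not only the record as a whole) starts with 'm'.
# (^|\n) stands in for a per-line start anchor.
line_starts_with_m() {
    awk -v RS=, '/(^|\n)m/ {print NR}'
}

# record 2 is 'mustard' and record 4 is '\nmango'
printf 'apple\n,mustard,grape,\nmango' | line_starts_with_m
```

With this input the records numbered 2 and 4 are reported, whereas the plain /^m/ shown earlier selects only the second record.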

Word boundary differences


The word boundary \y matches both start and end of word locations, whereas \< and \> match exactly the start and end of word locations respectively. As a result, you have to choose which of these word boundaries to use depending on the results desired. Consider 'I have 12, he has 2!' as sample text, shown below with vertical bars marking the word boundaries. The last character ! doesn't have an end of word boundary after it, as it is not a word character.

|I| |have| |12|, |he| |has| |2|!

$ # \y matches both start and end of word boundaries
$ # the first match here used starting boundary of 'I' and 'have'
$ echo 'I have 12, he has 2!' | awk '{gsub(/\y..\y/, "[&]")} 1'
[I ]have [12][, ][he] has[ 2]!

$ # \< and \> only match the start and end word boundaries respectively
$ echo 'I have 12, he has 2!' | awk '{gsub(/\<..\>/, "[&]")} 1'
I have [12], [he] has 2!
Here's another example to show the difference between the two types of word boundaries.
$ # add something to both start/end of word
$ echo 'hi log_42 12b' | awk '{gsub(/\y/, ":")} 1'
:hi: :log_42: :12b:

$ # add something only at start of word
$ echo 'hi log_42 12b' | awk '{gsub(/\</, ":")} 1'
:hi :log_42 :12b

$ # add something only at end of word
$ echo 'hi log_42 12b' | awk '{gsub(/\>/, ":")} 1'
hi: log_42: 12b:
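Because \< and \> can tell the two ends of a word apart, they allow a different replacement at each end, which \y alone cannot do. The following is a minimal sketch of that idea; the function name bracket_words is hypothetical, and gawk is called by name on the assumption that GNU awk is installed under that name, since these boundaries are GNU extensions.

```shell
# Hypothetical helper: wrap every word in square brackets.
# The first gsub adds '[' at each start-of-word location,
# the second adds ']' at each end-of-word location.
bracket_words() {
    gawk '{gsub(/\</, "["); gsub(/\>/, "]")} 1'
}

echo 'hi log_42 12b' | bracket_words
echo 'I have 12, he has 2!' | bracket_words
```

The first command should print [hi] [log_42] [12b], and the second [I] [have] [12], [he] [has] [2]! with the punctuation left outside the brackets.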
