When it comes to command line text processing, from an abstract point of view, there are three major pillars


Regular Expressions

Regular expressions are a versatile tool for text processing. They help to precisely define matching criteria. For learning and understanding purposes, one can view regular expressions as a mini programming language in itself, specialized for text processing. Parts of a regular expression can be saved for future use, analogous to variables and functions. There are ways to perform AND, OR, NOT conditionals, features to concisely define repetition to avoid manual replication, and so on.
Here are some common use cases.

  • Sanitizing a string to ensure that it satisfies a known set of rules. For example, to check if a given string matches password rules.

  • Filtering or extracting portions of text by abstract category, such as letters, numbers, punctuation and so on.

  • Qualified string replacement. For example, at the start or the end of a string, only whole words, based on surrounding text, etc.
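As a rough sketch of the three use cases above, here is one awk one-liner per case. The sample inputs and the password rule (at least 8 characters and at least one digit) are made up for illustration and are not from the original text.

```shell
# 1) Validation: keep only strings that satisfy a (hypothetical) password rule,
#    here "8 or more characters and at least one digit"
printf 'hunter2\npassw0rd123\nsecretword\n' | awk 'length($0) >= 8 && /[0-9]/'
# → passw0rd123

# 2) Filtering by character category: keep lines made up only of digits
printf 'abc\n42\n3.14\n007\n' | awk '/^[0-9]+$/'
# → 42
# → 007

# 3) Qualified replacement: change "cat" only when it starts the line
printf 'cat concat\nbobcat\n' | awk '{sub(/^cat/, "dog")} 1'
# → dog concat
# → bobcat
```

The anchors ^ and $ used here restrict where a match may occur, which is what makes the filtering and the replacement "qualified".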

This chapter will cover regular expressions as implemented in awk. Most of awk's regular expression syntax is similar to Extended Regular Expression (ERE) found with grep -E and sed -E. Unless otherwise indicated, examples and descriptions will assume ASCII input.
See also the POSIX specification for regular expressions and unix.stackexchange: Why does my regular expression work in X but not in Y?
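To illustrate the similarity with ERE, the sketch below runs the same pattern (ed anchored to the end of the line) through grep -E, sed -E and awk. The sample input is made up for illustration, and the example assumes a sed version that supports the -E option.

```shell
# the same ERE, 'ed$', used with three different tools;
# each command prints only the line that ends with "ed"
printf 'spared no one\ngrasped\nspar\n' | grep -E 'ed$'
# → grasped

printf 'spared no one\ngrasped\nspar\n' | sed -nE '/ed$/p'
# → grasped

printf 'spared no one\ngrasped\nspar\n' | awk '/ed$/'
# → grasped
```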

Syntax and variable assignment

As seen in the previous chapter, the syntax string ~ /regexp/ checks whether the given string satisfies the rules specified by the regexp, and string !~ /regexp/ inverts the condition. If the string isn't specified, $0 is checked by default. In GNU awk, you can also save a regexp literal in a variable by prefixing it with the @ symbol. The prefix is needed because /regexp/ by itself would mean $0 ~ /regexp/.
$ printf 'spared no one\ngrasped\nspar\n' | awk '/ed/'
spared no one
grasped

$ printf 'spared no one\ngrasped\nspar\n' | awk 'BEGIN{r = @/ed/} $0 ~ r'
spared no one
grasped
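The inverted operator and matching against something other than $0 can be sketched with the same sample input. Using $1 as the string to test is a choice made here for illustration. Neither command needs the @ prefix, so both should behave the same in other awk implementations too.

```shell
# !~ inverts the condition: print lines that do NOT contain "ed"
printf 'spared no one\ngrasped\nspar\n' | awk '$0 !~ /ed/'
# → spar

# test the first field instead of the whole line:
# print lines whose first field starts with "sp"
printf 'spared no one\ngrasped\nspar\n' | awk '$1 ~ /^sp/'
# → spared no one
# → spar
```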
