Changing the locale to ASCII (assuming the current locale is not ASCII and the input file contains only ASCII characters) can give a significant speed boost. Using mawk is another way to speed up execution, provided you are not using GNU awk specific features. Among other feature differences, mawk doesn't support the {} form of quantifiers; see unix.stackexchange: How to specify regex quantifiers with mawk? for details. See also wikipedia: awk Versions and implementations.
$ # time shown is best result from multiple runs
$ # speed benefit will vary depending on computing resources, input, etc
$ # /usr/share/dict/words contains dictionary words, one word per line
$ time awk '/^([a-d][r-z]){3}$/' /usr/share/dict/words > f1
real    0m0.029s

$ time LC_ALL=C awk '/^([a-d][r-z]){3}$/' /usr/share/dict/words > f2
real    0m0.022s

$ time mawk '/^[a-d][r-z][a-d][r-z][a-d][r-z]$/' /usr/share/dict/words > f3
real    0m0.009s

$ # check that the results are same
$ diff -s f1 f2
Files f1 and f2 are identical
$ diff -s f2 f3
Files f2 and f3 are identical
$ # clean up temporary files
$ rm f[123]
Here's another example.
$ # count words containing exactly 3 lowercase 'a'
$ time awk -F'a' 'NF==4{cnt++} END{print +cnt}' /usr/share/dict/words
1102
real    0m0.034s

$ time LC_ALL=C awk -F'a' 'NF==4{cnt++} END{print +cnt}' /usr/share/dict/words
1102
real    0m0.023s

$ time mawk -F'a' 'NF==4{cnt++} END{print +cnt}' /usr/share/dict/words
1102
real    0m0.014s
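The locale trick is only safe when the input really is ASCII-only, so it may help to check that before switching. The following is a minimal sketch of one way to do so, not a definitive approach: the function name is_ascii and the file sample.txt are made up for illustration, and the check simply deletes all ASCII bytes with tr and counts what remains.

```bash
# is_ascii FILE: succeed if FILE contains only ASCII bytes (0-127)
# 'is_ascii' is a hypothetical helper name, not part of awk
is_ascii() {
    # delete every ASCII byte and count whatever is left over
    n=$(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c)
    [ "$n" -eq 0 ]
}

# sample.txt is a made-up input file for illustration
printf 'banana\npapaya\nmango\n' > sample.txt

# count words containing exactly 3 lowercase 'a'
# use the C locale only when the input is known to be ASCII
if is_ascii sample.txt; then
    LC_ALL=C awk -F'a' 'NF==4{cnt++} END{print +cnt}' sample.txt
else
    awk -F'a' 'NF==4{cnt++} END{print +cnt}' sample.txt
fi
```

Setting LC_ALL=C in front of a command changes the locale for that command alone, so the rest of the shell session is unaffected.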