When it comes to command line text processing, from an abstract point of view, there are three major pillars



Date09.03.2023
Learn GNU AWK

Faster execution


Changing the locale to ASCII (assuming the current locale is not ASCII and the input file has only ASCII characters) can give a significant speed boost. Using mawk is another way to speed up execution, provided you are not using GNU awk specific features. Among other feature differences, mawk doesn't support the {} form of quantifiers, see unix.stackexchange: How to specify regex quantifiers with mawk? for details. See also wikipedia: awk Versions and implementations.
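The locale trick relies on the input having only ASCII characters. Here is a minimal sketch of one way to check that before switching to LC_ALL=C. The helper name ascii_only and the sample files are made up for illustration and are not part of the book.

```shell
# ascii_only is a hypothetical helper name, not something from the book.
# It succeeds (exit status 0) when the file has no bytes above 0x7F.
ascii_only() {
    # delete all ASCII bytes in the C locale and count the bytes left over
    [ "$(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c)" -eq 0 ]
}

# small demo with temporary sample files
tmpdir=$(mktemp -d)
printf 'apple\nbanana\n' > "$tmpdir/plain.txt"
printf 'caf\303\251\n' > "$tmpdir/accent.txt"    # \303\251 is a UTF-8 encoded accented e

for f in "$tmpdir/plain.txt" "$tmpdir/accent.txt"; do
    if ascii_only "$f"; then
        echo "${f##*/}: ASCII only, LC_ALL=C should be safe"
    else
        echo "${f##*/}: non-ASCII bytes found, keep the current locale"
    fi
done
rm -r "$tmpdir"
```

If the check fails, results under LC_ALL=C may differ from the current locale, since multibyte characters are then treated as separate bytes.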
$ # time shown is best result from multiple runs
$ # speed benefit will vary depending on computing resources, input, etc
$ # /usr/share/dict/words contains dictionary words, one word per line
$ time awk '/^([a-d][r-z]){3}$/' /usr/share/dict/words > f1
real    0m0.029s

$ time LC_ALL=C awk '/^([a-d][r-z]){3}$/' /usr/share/dict/words > f2
real    0m0.022s

$ time mawk '/^[a-d][r-z][a-d][r-z][a-d][r-z]$/' /usr/share/dict/words > f3
real    0m0.009s

$ # check that the results are same
$ diff -s f1 f2
Files f1 and f2 are identical
$ diff -s f2 f3
Files f2 and f3 are identical

$ # clean up temporary files
$ rm f[123]
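The mawk command above repeats the [a-d][r-z] unit by hand because the {3} quantifier is not available. As a rough sketch, the repetition can also be built as a dynamic regexp string, which avoids typing long patterns. The function name match_repeated and its argument order are assumptions made for this example.

```shell
# match_repeated is a hypothetical helper: match_repeated UNIT COUNT [FILE...]
# prints lines made up of exactly COUNT repetitions of the regexp UNIT
match_repeated() {
    unit=$1
    count=$2
    shift 2
    awk -v unit="$unit" -v n="$count" '
        BEGIN {
            # concatenate the unit n times, then anchor the whole thing
            for (i = 1; i <= n; i++) re = re unit
            re = "^(" re ")$"
        }
        # dynamic regexp match, matching lines are printed by default
        $0 ~ re
    ' "$@"
}

# sample input instead of /usr/share/dict/words
printf '%s\n' ararar asbtcu abc ararara | match_repeated '[a-d][r-z]' 3
# prints:
# ararar
# asbtcu
```

Since the regexp is plain concatenation, this should behave the same in gawk and mawk. Note that the -v option processes escape sequences, so a unit containing backslashes would need extra care.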
Here's another example.
$ # count words containing exactly 3 lowercase 'a'
$ time awk -F'a' 'NF==4{cnt++} END{print +cnt}' /usr/share/dict/words
1102
real    0m0.034s

$ time LC_ALL=C awk -F'a' 'NF==4{cnt++} END{print +cnt}' /usr/share/dict/words
1102
real    0m0.023s

$ time mawk -F'a' 'NF==4{cnt++} END{print +cnt}' /usr/share/dict/words
1102
real    0m0.014s
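To see why NF==4 selects words with exactly three 'a' characters, here is a small illustration on made-up sample words (the words are not from the book). With 'a' as the field separator, a line containing k occurrences is split into k+1 fields.

```shell
# show each word along with its number of "a" characters (NF-1)
printf '%s\n' banana alabama apple bazaar | awk -F'a' '{print $0, NF-1}'
# banana 3
# alabama 4
# apple 1
# bazaar 3

# count the words with exactly 3 occurrences, same command as used on the dictionary
printf '%s\n' banana alabama apple bazaar | awk -F'a' 'NF==4{cnt++} END{print +cnt}'
# 2

# with no matching line, cnt is never assigned and unary plus turns it into 0
printf '%s\n' xyz | awk -F'a' 'NF==4{cnt++} END{print +cnt}'
# 0
```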

Further Reading


  • man awk and info awk and online manual

  • Information about various implementations of awk

    • awk FAQ — great resource, but last modified 23 May 2002

    • grymoire: awk tutorial — covers information about different awk versions as well

    • cheat sheet for awk/nawk/gawk

  • Q&A on stackoverflow/stackexchange are a good source of learning material, good for practice exercises as well

    • awk Q&A on unix.stackexchange

    • awk Q&A on stackoverflow

  • Learn Regular Expressions (has information on flavors other than POSIX too)

    • regular-expressions — tutorials and tools

    • rexegg — tutorials, tricks and more

    • stackoverflow: What does this regex mean?

    • online regex tester and debugger — not fully suitable for cli tools, but most of the POSIX syntax works

  • My repo on cli text processing tools

  • Related tools

  • miscellaneous

    • unix.stackexchange: When to use grep, sed, awk, perl, etc

    • awk-libs — lots of useful functions

    • awkaster — Pseudo-3D shooter written completely in awk

    • awk REPL — live editor on browser

  • ASCII reference and locale usage

    • ASCII code table

    • wiki.archlinux: locale

    • shellhacks: Define Locale and Language Settings

  • examples for some of the topics not covered in this book

    • unix.stackexchange: rand/srand

    • unix.stackexchange: strftime

    • stackoverflow: arbitrary precision integer extension

    • stackoverflow: recognizing hexadecimal numbers

    • unix.stackexchange: sprintf and file close

    • unix.stackexchange: user defined functions and array passing

    • unix.stackexchange: rename csv files based on number of fields in header row
