AWK: a powerful tool for programmers

2012-10-25
#awk

AWK is named after the initials of its authors (Aho, Weinberger and Kernighan). It is a scripting language for data manipulation with a wide range of uses. There are several implementations: awk is the canonical one, and there are also nawk (new awk), mawk (the default in Ubuntu 12.04) and gawk (GNU awk). I recommend the last one because it handles Unicode characters correctly, for example:

$ echo юникод | gawk "{res = toupper(\$1); print res;}"
ЮНИКОД

§ Basic usage

The most useful feature is the ability to write scripts to files and load them into awk later. A script file is executed with:

gawk [options] -f script_file.awk input_file

If no input file is given, awk reads from the standard input stream.
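As a minimal sketch of the two input modes, the snippet below runs the same script file once against a file and once against standard input. The one-line script and the sample data are hypothetical, and plain awk is used so that it works with any implementation; gawk can be substituted.

```shell
# Hypothetical one-line script: print the first field of every line.
script=$(mktemp)
printf '{ print $1 }\n' > "$script"

# Hypothetical input file with two lines.
input=$(mktemp)
printf 'alpha 1\nbeta 2\n' > "$input"

# With an input file, awk reads the file.
from_file=$(awk -f "$script" "$input")

# Without an input file, awk reads standard input.
from_stdin=$(printf 'alpha 1\nbeta 2\n' | awk -f "$script")

echo "$from_file"
echo "$from_stdin"

# Clean up the temporary files.
rm -f "$script" "$input"
```

Both invocations should print the same two first fields, since only the source of the input differs.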

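Let’s take a look at an example. It reads the input stream line by line, saves the first field of each line to history, increases a counter by 1 and prints the counter followed by that field. It stops reading when the first field is 0. Code of script.awk: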

#!/usr/bin/gawk -f
# BEGIN block executes only once, before any input is read
BEGIN {
    print "\nBegin printing args\n";
    i = 0;
}

# Main block executes for every input line
{
    i++;
    history[i] = $1;
    print i, $1;
    if ($1 == 0)
        exit(0);
}

# END block executes only once, after the input ends or exit is called
END {
    print "\nArguments were: ";
    for (n=1; n<=i; ++n)
        print history[n]," ";
    print "\nEnd printing args\n"
}

Then make the script executable and run it:

$ chmod +x ./script.awk
$ ./script.awk

The session will look like this (type “one”, “cat”, “dog” and “0”, pressing Enter after each):

Begin printing args

one
1 one
cat
2 cat
dog
3 dog
0
4 0

Arguments were:
one
cat
dog
0

End printing args
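The same flow can be reproduced without typing by piping the input in. The following is a condensed sketch, not the script above: the messages and the variable n are made up for illustration. It shows that exit stops the reading of input (the line “ignored” is never processed) while the END block still runs.

```shell
out=$(printf 'one\ncat\n0\nignored\n' | awk '
BEGIN { print "start" }               # runs once, before any input
{
    n++
    print n, $1
    if ($1 == 0) exit                 # stop reading; END still runs
}
END { print "lines read: " n }        # runs once, at the end
')
echo "$out"
```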

Awk can also be run with the script given inline on the command line:

gawk [options] 'script_text' file(s)

The following example counts the lines that contain the word “block” in the script listed above:

awk "BEGIN{blocks=0} /block/{blocks++} END{ print blocks}" script.awk

Here the /regular expression/ in front of a block controls whether that block is executed: it runs only for the lines that match.
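A pattern does not have to be a regular expression; any expression can guard a block. Here is a small sketch on hypothetical sample data (a name and a score per line), using plain awk so that any implementation will do:

```shell
# Hypothetical sample data: a name and a score per line.
data='ann 7
bob 12
eve 3'

# Regex pattern: the block runs only for lines matching /b/.
matched=$(printf '%s\n' "$data" | awk '/b/ { print $1 }')

# Expression pattern: the block runs only when the second field exceeds 5.
high=$(printf '%s\n' "$data" | awk '$2 > 5 { print $1 }')

# Count the matching lines; n + 0 prints 0 instead of an empty line
# when nothing matches.
count=$(printf '%s\n' "$data" | awk '/e/ { n++ } END { print n + 0 }')

echo "$matched"
echo "$high"
echo "$count"
```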

§ User-defined functions

In awk, user-defined functions are declared as follows:

#!/usr/bin/gawk -f

# returns the sum of three numbers
# res is declared as an extra parameter so that it is local to the function
function sum(a, b, c,    res) {
    res = a + b + c;

    return res;
}

# main program, for testing
{
    print sum($1, $2, $3);
}
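One detail worth knowing when writing functions: awk variables are global unless they appear in the function's parameter list, so the usual convention is to list local variables as extra parameters. The sketch below assumes a variant of sum written that way, with a deliberately clashing global variable named res to show that the function leaves it alone.

```shell
result=$(echo '1 2 3' | awk '
# The extra parameter "res" acts as a local variable.
function sum(a, b, c,    res) {
    res = a + b + c
    return res
}
{
    res = "untouched"          # global variable with the same name
    print sum($1, $2, $3)      # sum of the three fields
    print res                  # the global is not modified by sum()
}')
echo "$result"
```

Without the extra parameter, the assignment inside sum would overwrite the global res.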