You need to sum a list of numbers, including numbers that don’t appear on lines by themselves.
Use awk both to isolate the field to be summed and to do the summing. Here we’ll sum the file sizes from the output of an ls -l command:
$ ls -l | awk '{sum += $5} END {print sum}'
We are summing up the fifth field of the ls -l output. The output of ls -l looks like this:
-rw-r--r-- 1 albing users 267 2005-09-26 21:26 lilmax
and the fields are: permissions, links, owner, group, size (in bytes), date, time, and filename.
We’re only interested in the size, so we use $5 in our awk program to reference that field.
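As a minimal sketch of the field reference on its own, the sample line shown above can be fed to awk directly. The variable names line and size are only illustrative:

```shell
# The sample ls -l line from the text above.
line='-rw-r--r-- 1 albing users 267 2005-09-26 21:26 lilmax'

# awk splits each input line on whitespace; $5 is the fifth field (the size).
size=$(printf '%s\n' "$line" | awk '{print $5}')
echo "$size"    # prints 267
```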
We enclose the two bodies of our awk program in braces ({}); note that there can be more than one body (or block) of code in an awk program.
A block of code preceded by the literal keyword END is run only once, after all of the input has been read and processed.
Similarly, you can prefix a block of code with BEGIN and supply some code that will be run before any input is read.
The BEGIN block is useful for initializing variables, and we could have used one here to initialize sum, but awk guarantees that variables will start out empty (and an empty value counts as zero in arithmetic).
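For illustration, here is a sketch of the same program with an explicit BEGIN block. The input lines are made up for the example, and total and empty are hypothetical variable names:

```shell
# Three made-up input lines with a number in the fifth field.
total=$(printf '%s\n' 'a b c d 10' 'a b c d 20' 'a b c d 12' |
    awk 'BEGIN {sum = 0} {sum += $5} END {print sum}')
echo "$total"   # prints 42

# With no input at all, the explicit initialization makes END print 0
# rather than an empty line.
empty=$(awk 'BEGIN {sum = 0} {sum += $5} END {print sum}' </dev/null)
echo "$empty"   # prints 0
```

The result is the same for non-empty input; the explicit initialization only changes what is printed when there are no lines to sum.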
If you look at the output of an ls -l command, you will notice that the first line is a total, and doesn’t fit our expected format for the other lines.
We have two choices for dealing with that.
We can pretend it’s not there, which is the approach taken above. Since that undesired line doesn’t have a fifth field, our reference to $5 will be empty, and our sum won’t change.
The more conscientious approach would be to eliminate that line. We could do so before we give the output to awk by using grep:
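The effect of the total line can be checked on a small canned listing instead of live ls -l output. In this sketch the listing contents (including the second file, other) are invented for the example:

```shell
# Hypothetical ls -l output, including the leading "total" line.
listing='total 8
-rw-r--r-- 1 albing users 267 2005-09-26 21:26 lilmax
-rw-r--r-- 1 albing users 100 2005-09-26 21:27 other'

# The "total" line has only two fields, so $5 is empty and adds nothing.
sum_all=$(printf '%s\n' "$listing" | awk '{sum += $5} END {print sum}')
echo "$sum_all"   # prints 367
```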
$ ls -l | grep -v '^total' | awk '{sum += $5} END {print sum}'
or we could do a similar thing within awk:
$ ls -l | awk '/^total/{getline} {sum += $5} END {print sum}'
The ^total is a regular expression (regex); it means “the letters t-o-t-a-l occurring at the beginning of a line” (the leading ^ anchors the search to the beginning of a line).
For any line of input matching that regex, the associated block of code will be executed.
The second block of code (the sum) has no leading text, the absence of which tells awk to execute it for every line of input (meaning this will happen regardless of whether the line matches the regex).
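To see the difference between a block guarded by a pattern and one without, this sketch counts how many lines each kind of block runs for. The listing is invented sample data, and matched and all are illustrative variable names:

```shell
# Hypothetical ls -l output with a leading "total" line.
listing='total 8
-rw-r--r-- 1 albing users 267 2005-09-26 21:26 lilmax
-rw-r--r-- 1 albing users 100 2005-09-26 21:27 other'

# The first block runs only for lines matching ^total;
# the second block has no pattern, so it runs for every line.
counts=$(printf '%s\n' "$listing" |
    awk '/^total/ {matched++} {all++} END {print matched, all}')
echo "$counts"   # prints 1 3
```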
Now, the whole point of adding the special case for “total” was to exclude such a line from our summing.
Therefore in the ^total block we add a getline command, which reads in the next line of input.
Thus, when the second block of code is reached, it is with a new line of input.
The getline does not re-match all the patterns from the top, only the ones from there on down. In awk programming, the order of the blocks of code matters.
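A short sketch of why the order matters, again using an invented listing: with the blocks swapped, the sum block has already run on the total line before getline reads the next line, so that next line is never summed.

```shell
# Hypothetical ls -l output with a leading "total" line.
listing='total 8
-rw-r--r-- 1 albing users 267 2005-09-26 21:26 lilmax
-rw-r--r-- 1 albing users 100 2005-09-26 21:27 other'

# Pattern block first: getline replaces the "total" line with the next
# line before the sum block sees it, so both file sizes are added.
good=$(printf '%s\n' "$listing" |
    awk '/^total/{getline} {sum += $5} END {print sum}')

# Sum block first: it runs on the "total" line (adding nothing), then
# getline consumes the lilmax line, which no later block ever sums.
bad=$(printf '%s\n' "$listing" |
    awk '{sum += $5} /^total/{getline} END {print sum}')

echo "$good"   # prints 367
echo "$bad"    # prints 100
```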