You need a quick screen-based histogram of some data.
Use the associative arrays of awk, as discussed in the previous recipe:
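#
# cookbook filename: hist.awk
#
# Return the largest value in arr; big and i are local variables.
function max(arr,    big, i)
{
    big = 0;
    for (i in arr)
    {
        if (arr[i] > big) { big = arr[i]; }
    }
    return big
}

# Count files per owner; long-listing lines have more than 7 fields.
NF > 7 {
    user[$3]++
}

END {
    # for scaling
    maxm = max(user);
    for (i in user) {
        #printf "%s owns %d files\n", i, user[i]
        scaled = 60 * user[i] / maxm ;
        printf "%-10.10s [%8d]:", i, user[i]
        # use j here so the inner loop does not overwrite i
        for (j = 0; j < scaled; j++) {
            printf "#";
        }
        printf "\n";
    }
}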
When we run it with the same input as the previous recipe, we get:
$ ls -lR /usr/local | awk -f hist.awk
bin        [      68]:#
albing     [    1801]:#######
root       [   13755]:##################################################
man        [   11491]:##########################################
$
We could have put the code for max as the first code inside the END block, but we wanted to show you that you can define functions in awk.
We are using a slightly fancier printf here. The string format %-10.10s left-justifies and pads to 10 characters but also truncates at 10 characters.
The integer format %8d ensures that the integer is printed in a field at least 8 characters wide.
This gives each histogram the same starting point, by using the same amount of space regardless of the username or the size of the integer.
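To see the two formats in isolation, here is a minimal sketch. The helper name show_fields and the sample names and counts are made up for illustration; only the printf format string comes from the recipe.

```shell
# show_fields "NAME COUNT"...
# Runs each "name count" pair through the recipe's printf format.
# A long name is cut to 10 characters; a short one is padded to 10.
show_fields() {
    printf '%s\n' "$@" | awk '{ printf "%-10.10s [%8d]:\n", $1, $2 }'
}

show_fields "bin 68" "averylongusername 1801"
```

Both lines of output should end their prefix at the same column, which is what lets the hash marks line up.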
Like all arithmetic in awk, the scaling calculation is done with floating point unless we explicitly truncate the result with a call to the built-in int() function.
We don't do so, and any nonzero count gives a scaled value greater than zero, which means that the for loop will execute at least once, so that even the smallest amount of data will still display a single hash mark.
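The effect of leaving out int() can be checked with a small sketch. The function bar_length is a hypothetical helper, not part of hist.awk; it repeats the recipe's scaling and loop test and prints how many hash marks would be drawn.

```shell
# bar_length COUNT MAX [truncate]
# Prints the number of hash marks the recipe's inner loop would draw
# for COUNT when the largest count is MAX. If a third argument is
# given, the scaled value is truncated with int() first.
bar_length() {
    awk -v count="$1" -v maxm="$2" -v trunc="${3:-}" 'BEGIN {
        scaled = 60 * count / maxm           # floating point result
        if (trunc != "") scaled = int(scaled)
        n = 0
        for (j = 0; j < scaled; j++) n++     # same loop test as the recipe
        print n
    }'
}

bar_length 68 13755          # fractional value: the loop runs once
bar_length 68 13755 trunc    # int() yields 0: the loop never runs
```

With the counts from the sample run, 68 out of 13755 scales to a value below 1, so the untruncated version still draws one mark while the truncated version draws none.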
The for (i in user) loop returns its data in no particular order, probably based on some convenient ordering of the underlying hash table.
If you wanted the histogram displayed in a sorted order, either numeric by count or alphabetical by username, you would have to add some sorting.
One way to do this is to break this program apart into two pieces, sending the output from the first part into the sort command and then piping that output into the second piece to print the histogram.
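A minimal sketch of that two-piece approach follows. The function names count_owners and draw_bars are hypothetical, and the sketch assumes a numeric sort with the largest count first, so that the first line the second piece reads can supply the scaling maximum. An alphabetical sort would need the maximum passed in some other way.

```shell
# Piece 1: count files per owner and print "count owner" lines.
count_owners() {
    awk 'NF > 7 { user[$3]++ }
         END { for (i in user) print user[i], i }'
}

# Piece 2: draw the bars in the order received. Assumes the input is
# sorted with the largest count first, so line 1 sets the scale.
draw_bars() {
    awk 'NR == 1 { maxm = $1 }
         {
             scaled = 60 * $1 / maxm
             printf "%-10.10s [%8d]:", $2, $1
             for (j = 0; j < scaled; j++) printf "#"
             printf "\n"
         }'
}

# Usage, numeric order with the largest count first:
#   ls -lR /usr/local | count_owners | sort -rn | draw_bars
```

Splitting the work this way keeps each awk program short and leaves the ordering entirely to sort.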