-----------------------------------------------------------------------------
One liners

Tee-pipe (pipe output into two commands)
    command_source | awk '{print; print | "command1"}' | command2

Uniq (consecutive lines)
    awk 'a != $0; { a = $0 }'

Uniq without sorting - also see "info shell/file.txt" (not Solaris)
    awk '!x[$0]++'

More efficient uniq without sort
    awk '!($0 in a) { a[$0]; print }'

-----------------------------------------------------------------------------
Simple and Common AWK Examples

(data is a biome count in minecraft)

    145174951280 Ocean
    153013367088 Ocean
     48860644080 Desert
     13567315200 Desert
      2053622912 Desert
      4627422256 Forest
     89607678496 Forest
     24159066112 Forest
      6246795152 Ice
     19021276480 Ice
       859479312 Ice
     11721673776 Jungle
       982828512 Jungle
         3917632 Jungle
      4059119632 Jungle
       526812784 Jungle
      4653212096 Mesa
       259682528 Mesa
      1088787328 Mesa
      2516103104 Mesa
       132590528 Mesa
        56755600 Mesa
       305219136 Mushroom
       206617808 Mushroom

# ------------ #
# Math on a single column...

Maximum Value of all values
    awk 'NR==1 || $1 > m { m=$1; i=$2 } END { print m, i }'

Minimum Value of all values
    awk 'NR==1 || $1 < m { m=$1; i=$2 } END { print m, i }'

(Initialising from the first record avoids picking an arbitrary sentinel
value that the real data might exceed.)

Total of column 1
    awk '{t+=$1} END { print t }'

Average of column 1
    awk '{t+=$1} END { print t "/" NR " => " t/NR }'

Median Value
  This is tricky, as you need to sort and store all values so you can
  pick out the 'center' one.
    sort -n | awk '{arr[NR]=$1}
                   END { if (NR%2==1) print arr[(NR+1)/2]
                         else print (arr[NR/2]+arr[NR/2+1])/2 }'

# ------------ #
# Math by unique items (output order not important)

Count Unique Values in Column 2 (without using a slow sort|uniq pipeline)
    awk '{a[$2]++} END { for (i in a) print i, a[i] }'

Minimum of a specific item ????

Maximum of a specific item ????
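The two '????' gaps above can be filled by tracking one extreme per key in an
array indexed by column 2. A sketch (the sample data and the array name "m"
are made up for illustration):

```shell
# Sample data: value in column 1, item name in column 2
data='5 a
3 a
7 b
2 b'

# Minimum of column 1 for each unique item in column 2.
# First sight of a key always stores; later lines only if smaller.
printf '%s\n' "$data" |
awk '!($2 in m) || $1 < m[$2] { m[$2] = $1 }
     END { for (i in m) print m[i], i }'

# Maximum is the same pattern with the comparison reversed.
printf '%s\n' "$data" |
awk '!($2 in m) || $1 > m[$2] { m[$2] = $1 }
     END { for (i in m) print m[i], i }'
```

As with the counting example, the `for (i in m)` output order is undefined,
so pipe through sort if order matters.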
Add up the values of the same item
    awk '{a[$2]+=$1} END { OFS="\t"; for (i in a) print a[i], i }'

Find Largest Added-up Count
    awk '{a[$2]+=$1} END { for (i in a) if (a[i] > a[m]) m=i; print a[m], m }'

Percentage of Total
    awk '{ a[$2] += $1; t += $1 }
         END { OFS="\t"
               for (i in a) print a[i], i, sprintf("%6.2f%%", 100*a[i]/t)
               print "-------------------"
               print t, "Total", "100.00%"
             }'

Sort by Item ???

Sort by Added Count ???

Preserve original order of values ???

# ------------ #
# You can also use GNU "datamash", but it is not a standard install.
# You may have to convert the input to use a proper field separator
# character first.

Minimum value
    datamash -t\  min 1

Comma separated list of unique names
    sed 's/ \+/\t/g' | datamash unique 2

    Desert,Forest,Ice,Jungle,Mesa,Mushroom,Ocean

Get count,min,max,sum values grouped by second column
    sed 's/ \+/\t/g' | datamash -g 2 count 1 min 1 max 1 sum 1 | column -t

    Ocean     2  145174951280  153013367088  298188318368
    Desert    3  2053622912    48860644080   64481582192
    Forest    3  4627422256    89607678496   118394166864
    Ice       3  859479312     19021276480   26127550944
    Jungle    5  3917632       11721673776   17294352336
    Mesa      6  56755600      4653212096    8707131184
    Mushroom  2  206617808     305219136     511836944

-----------------------------------------------------------------------------
Variable Settings and Awk processing two files differently

This could be used to read a config file or a table of data before the
real data is given to awk.

Variable settings take effect when they are encountered on the command
line, so, for example, you can instruct awk to behave differently for
different files using this technique. For example:

    awk 'data==0 { print "process config_data here" }
         data==1 { print "process data_file here" }
        ' config_data  data=1  data_file

Note that some versions of awk will cause variable settings encountered
before any real filenames to take effect before the BEGIN block is
executed, but some won't, so neither behaviour should be relied upon.

Solaris 9: you must use "nawk" for this to work.
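The three '???' entries above are usually answered either by piping the
summed output through sort, or by recording first-seen order inside awk.
A sketch on made-up data (item names and values are illustrative only):

```shell
data='4 b
1 a
3 b
2 a'

# Sort the summed totals by item name (column 2 of the output)
printf '%s\n' "$data" |
awk '{ a[$2] += $1 } END { for (i in a) print a[i], i }' | sort -k2

# Sort by the added-up count (numeric sort on column 1)
printf '%s\n' "$data" |
awk '{ a[$2] += $1 } END { for (i in a) print a[i], i }' | sort -n -k1

# Preserve the original (first-seen) order of the items, no sort needed:
# note the first rule must run before the += creates the array entry.
printf '%s\n' "$data" |
awk '!($2 in a) { order[++n] = $2 }
     { a[$2] += $1 }
     END { for (j = 1; j <= n; j++) print a[order[j]], order[j] }'
```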
-------------------------------------------------------------------------------
Using/modifying Awk Arguments (in BEGIN{})

During the BEGIN{..} code block, ARGV has not been looked at yet, but it
has already been read into an array. As such you can loop through it and
act on, or modify, those arguments.

=======8<--------
awk 'BEGIN { print "data = " data
             for (i = 0; i < ARGC; i++)
               print "ARGV[" i "] = " ARGV[i]
           }' ....
=======8<--------

When a record is separated, Gawk sets RT to the trailing text of the
record that RS matched.  This is most useful when RS is a regular
expression (a Gawk extension), as the matched text can then vary.

Paragraph Records....

An empty string RS="" makes a 'blank line' the record separator.
Leading blank lines in a file are ignored.

If FS is also set to a single character, the newline character will also
act as a field separator even if not specified.  Typical usage is
RS=""; FS=" ";.  If you want to avoid this, convert the single FS
character into a regex.

For multi-blank-line paragraphs use RS="\n\n+".  However leading blank
lines at the top of the file are then not ignored, so they may need
special handling.

-----------------------------------------------------------------------------
Changing the record separator inside an awk script.

Awk reads and record-separates its input before the script proper
starts, so you need to get it to re-evaluate that input.  For example...

    awk 'BEGIN { RS=";" }
         { $1=$1; printf "%s%1s\n", $0, ";" }
        ' infile

The $1=$1 is required to get awk to re-compute $0 for the first record.
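A quick demonstration of the RS=";" re-separation above, using inline
input instead of 'infile' (the sample text is made up; note that $1=$1
also collapses any runs of whitespace when $0 is rebuilt):

```shell
# Three semicolon-separated records, no trailing ';'
printf 'one  two;three;four' |
awk 'BEGIN { RS=";" }
     { $1=$1; printf "%s%1s\n", $0, ";" }'
# prints:
#   one two;
#   three;
#   four;
```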
-----------------------------------------------------------------------------
Determine column numbers of relevant 'ps' fields

=======8<--------CUT HERE----------axes/crowbars permitted---------------
#!/bin/sh
CMDCOL=`ps -e | awk '
    NR == 1 { for (i = 1; i <= NF; i++)
                if ($i == "COMMAND" || $i == "CMD" || $i == "COMD")
                  cmdcol = i
            }
    END { print cmdcol }
  '`
if [ "$CMDCOL" = "" ]; then
  echo "$0: Unrecognised ps format for COMMAND field"
  exit 1
fi

TTYCOL=`ps -e | awk '
    NR == 1 { for (i = 1; i <= NF; i++)
                if ($i == "TTY") ttycol = i
            }
    END { print ttycol }
  '`
if [ "X$TTYCOL" = "X" ]; then
  echo "$0: Unrecognised ps format for TTY field"
  exit 1
fi

#
# Print list of all terminals running program
#
echo -n "Terminals running vim : "
ps -e | awk ' $'"$CMDCOL"' == "vim" { print $'"$TTYCOL"' }'
=======8<--------CUT HERE----------axes/crowbars permitted---------------
-----------------------------------------------------------------------------
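The header-scanning idea above generalises to any command whose output
has a one-line header; a sketch on synthetic 'ps'-like input, with the
wanted column name passed in via -v (the sample rows are invented):

```shell
# Find the column whose header matches "name", then print that
# column for every following row (next skips the header itself).
printf 'PID TTY CMD\n1 tty1 init\n2 tty2 vim\n' |
awk -v name="TTY" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
    col     { print $col }'
# prints:
#   tty1
#   tty2
```

Doing the scan and the printing in one awk invocation avoids the
shell-quoting gymnastics of splicing $CMDCOL and $TTYCOL back in.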