-------------------------------------------------------------------------------
Various Regular Expression Techniques
-------------------------------------------------------------------------------
Matching Words

Basic methods...
    grep -w 'word'
    egrep "(^|\W)string($|\W)"     # NB: match includes surrounding characters
    egrep '\<word\>'
    grep -P '\bword\b'

BUT this does NOT always work!
  * Words may need to be matched case-insensitively
  * Match the whole word, not just part of a word (boundaries)
  * Words can contain hyphens '-' and apostrophes "'"
  * Limit hyphens/apostrophes to the middle of a word?
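
One way to handle the last two points is to spell out what a word is
instead of relying on \w.  A minimal sketch, assuming a grep with -o
(GNU or BSD); the helper names list_words and find_word are made up
for this example.

```shell
# list_words (hypothetical helper): print each word of stdin on its own line.
# A word is a run of letters, optionally joined by single hyphens or
# apostrophes, so '-' and "'" are only accepted in the middle of a word.
list_words() {
    grep -oE "[[:alpha:]]+(['-][[:alpha:]]+)*"
}

# find_word WORD (hypothetical helper): case-insensitive whole-word lookup
# using the word definition above.
find_word() {
    list_words | grep -ixF -- "$1"
}

printf '%s\n' "Don't re-enter the -well-known- 'Word'" | list_words
printf '%s\n' 'Sword, WORD and word-play' | find_word word
```

The first command lists Don't, re-enter, the, well-known and Word.  The
second reports only WORD, as "Sword" and "word-play" are different words
under this definition.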

Boundary/hyphen example...
    openssl enc -ciphers |& grep -w 'aes-256'
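  will match every cipher name that merely contains 'aes-256' (for example
  '-aes-256-cbc'), because '-' is not a word character and so counts as a
  word boundary.  It does not match only an exactly matching word.

  Add boundaries, then test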
      openssl enc -ciphers |& sed '1d; s/^/ /; s/$/ /; s/   */ /g;' |
          grep ' -blowfish '

  Separate into one word per line
      openssl enc -ciphers |& tr -s ' ' '\012' |
          grep -x -- '-blowfish'
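
  Both fixes can be tried without openssl.  A sketch only: the cipher list
  is made-up sample text, the helper names are hypothetical, the '1d'
  (header removal) is dropped as the sample has no header line, and -F is
  added so the word is taken literally.

```shell
# Made-up sample standing in for the real cipher list
ciphers='-aes-256-cbc -aes-256-ecb -bf
-blowfish -bf-cbc -cast'

# has_word_padded LIST WORD: pad each line with spaces, squeeze runs of
# spaces, then look for the word with a space on both sides.
has_word_padded() {
    printf '%s\n' "$1" | sed 's/^/ /; s/$/ /; s/  */ /g' |
        grep -qF -- " $2 "
}

# has_word_split LIST WORD: one word per line, then match the whole line.
has_word_split() {
    printf '%s\n' "$1" | tr -s ' ' '\012' | grep -qxF -- "$2"
}

# grep -w wrongly reports '-aes-256', as '-' counts as a word boundary
printf '%s\n' "$ciphers" | grep -qw -- '-aes-256' && echo "grep -w: found"
has_word_padded "$ciphers" '-aes-256' || echo "padded: not found"
has_word_split  "$ciphers" '-aes-256' || echo "split: not found"
has_word_split  "$ciphers" '-bf'      && echo "split: -bf found"
```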

Consecutive words
  * the two words could be separated by white space, including newlines
  * there may be an end-of-line hyphen, or they may become one hyphenated word
  * they may even become a single word (no hyphen)!

  Solution: treat the file as a single string (NUL-terminated records)
    grep -Piz '\bROCKING-?\s*HORSE\b' file
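
  A quick check of that pattern against each of the cases listed above.
  This is a sketch assuming GNU grep built with PCRE support (-P); the
  helper name has_phrase is made up.

```shell
# has_phrase (hypothetical helper): succeed if stdin, read as one record
# (-z), contains the two words in any of the forms listed above.
has_phrase() {
    grep -Piqz '\bROCKING-?\s*HORSE\b'
}

# Try the pattern on text given as the first argument
try_phrase() {
    if printf '%s\n' "$1" | has_phrase; then
        echo "match"
    else
        echo "no match"
    fi
}

try_phrase 'A rocking
horse.'                       # words split over two lines
try_phrase 'a rocking-
horse'                        # end-of-line hyphen
try_phrase 'Rockinghorse'     # single word
try_phrase 'rocking horses'   # longer second word, rejected by the final \b
```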

-------------------------------------------------------------------------------
Match Delimited Text

This is difficult

|> The input file has a declaration something like the following with
|> several comments in a single line:
|>
|>    input a, b, /* comment  */ c, /* comment ******************* */ d;
|>
|> I need to delete the comments in between and write the declaration as
|>
|>    input a, b, c, d;
|>
|> NOTE: the c, must not be deleted.

It's a standard match-delimited-text problem, and the general solution is:

  1: match the opening delimiter
  2: match stuff that's not the closing delimiter
  3: match the closing delimiter

In this case, the opening delimiter is "/*" so the regex is "/\*".
The closing delimiter is "*/", so that regex is "\*/".

Stuff that's not the closing delimiter would be
        A) anything that's not /                  (regex "[^/]" )
   and  B) any / so long as it has no * before it (regex "[^*]/")

Combining them with an indication to say "as much as is there", we get:

        ([^/]|[^*]/)*

So the whole regex, wrapped in some perl, would be:

                               s#/\*([^/]|[^*]/)*\*/##g;
part number from above:          1112222222222222333

---

Note that there's another way to conceptually look at the "stuff not the
closing delimiter". That'd be:
        A) anything not a *   (regex "[^*]")
   and  B) any * so long as it's not followed by a / (regex "\*[^/]")

That would lead to
        s#/\*([^*]|\*[^/])*\*/##g;

However, since the "\*[^/]" eats a character, it could eat the third * in
the string "/* comment **/".  That leaves no "*/" for the closing delimiter
to match, so we'd wedgie the regex and it wouldn't match.

The first way described above only eats characters we've already had a
chance to check aren't the ending, so it won't wedgie.

                               Jeffrey Friedl  <jfriedl@nff.ncl.omron.co.jp>

-------------------------------------------------------------------------------
Match string  "foo"  which is not followed by string  "bar"

[1]    ^foo$            match
[2]    ^foobar$         NOMATCH
[3]    ^foo foobar$     match first foo
[4]    ^fooba$          match
[5]    ^abc foobar$     NOMATCH

The difficulty with this problem is that few programs have RE's that can
express "not followed by this group of characters". Without that, a program
with both RE's and conditionals is required to do this.

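Perl 5 has such a negative match, EG: /foo(?!bar)/
It uses the construct (?X..)  where X is a control with the following meanings
    :   just match (group without capturing)
    =   lookahead match
    !   lookahead negation
    #   comment (legible regexps at last)
    a-z embedded option (like /i)
Related, though not a (?X..) construct
    \K  zero width back lookup  EG:    df | grep -o -P '% \K.*'
        (what was matched before the \K is left out of the reported match)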

There are problems however
   should   'foo(?!ba.*r)bar'
   match    'foobazbar'

Once you use constructs like "*" inside a negative pattern, it often
gets complicated to figure out what the pattern really means.  But
perhaps we can still have negative patterns as long as we specify some
restrictions as to the interpretation or even use of things like "*"
inside those patterns.  ... ???
                                       Lloyd Zusman <ljz@panix.com>

Solutions:
Take the unwanted "foobar" out of the line before looking for "foo",
then return the original line if a match is found.

sed 'h
     s/foobar/+/g
     /foo/!d
     g'                     David W. Tamkin <dattier@MCS.COM>
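
The sed script works by saving the line in the hold space (h), testing a
copy with every "foobar" replaced, and restoring the saved line (g) when a
"foo" survives.  A sketch running it on the five test lines; the wrapper
name foo_not_bar is made up.

```shell
# foo_not_bar (hypothetical wrapper): print lines holding a "foo" that is
# not part of "foobar"
foo_not_bar() {
    sed 'h
         s/foobar/+/g
         /foo/!d
         g'
}

printf '%s\n' 'foo' 'foobar' 'foo foobar' 'fooba' 'abc foobar' | foo_not_bar
# prints cases 1, 3 and 4:  foo / foo foobar / fooba
```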

-------------------------------------------------------------------------------