-------------------------------------------------------------------------------
Various Regular Expression Techniques

-------------------------------------------------------------------------------
Matching Words

Basic methods...
    grep -w 'word'
    egrep "(^|\W)word($|\W)"      # NB: match includes surrounding characters
    egrep '\<word\>'
    grep -P '\bword\b'

BUT these do NOT always work!
    * Words may need to be matched case-insensitively
    * Match the whole word, not just part of a word (boundaries)
    * Words could contain hyphens (-) and apostrophes (')
    * Should hyphens/apostrophes be limited to the middle of a word?

Boundary/hyphen example...
    openssl enc -ciphers |& grep -w 'aes-256'
will match several ciphers (e.g. -aes-256-cbc, -aes-256-ecb) rather than
only an exactly matching word, because grep treats '-' as a word boundary.

Add boundaries (a space at both ends of every line), then test
    openssl enc -ciphers |& sed '1d; s/^/ /; s/$/ /; s/  */ /g;' | grep ' -blowfish '

Separate into one word per line
    openssl enc -ciphers |& tr -s ' ' '\012' | grep -x -- '-blowfish'
(the '--' stops grep from treating the '-blowfish' pattern as an option)

Consecutive words
    * the two words could be separated by white space, including newlines
    * may have a possible end-of-line hyphen, or become one hyphenated word
    * may even become a single word (no hyphen)!

Solution: treat the file as a single string (NUL-separated records)
    grep -Piz '\bROCKING-?\s*HORSE\b' file
(-z reads NUL-terminated records, so an ordinary text file is one record
and \s can match the newlines inside it)

-------------------------------------------------------------------------------
Match Delimited Text

This is difficult...

|> The input file has a declaration something like the following with
|> several comments in a single line:
|>
|>     input a, b, /* comment */ c, /* comment ******************* */ d;
|>
|> I need to delete the comments in between and write the declaration as
|>
|>     input a, b, c, d;
|>
|> NOTE: the c, must not be deleted.

It's a standard match-delimited-text problem, and the general solution is:
    1: match the opening delimiter
    2: match stuff that's not the closing delimiter
    3: match the closing delimiter

In this case, the opening delimiter is "/*" so the regex is "/\*".
The closing delimiter is "*/", so that regex is "\*/".
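To see why the three steps are needed, here is a quick sketch of the obvious first attempt, a greedy ".*" between the two delimiters, run with sed on the sample declaration from the question. The shell variable name `line` is only for illustration.

```shell
# Sample declaration from the question above
line='input a, b, /* comment */ c, /* comment ******************* */ d;'

# Naive attempt: ".*" is greedy, so the match runs from the FIRST "/*"
# to the LAST "*/" and swallows the "c," that must be kept.
echo "$line" | sed 's#/\*.*\*/##g'
# -> input a, b,  d;
```

The comments are gone, but so is the "c,", which is exactly the failure the "match stuff that's not the closing delimiter" step avoids.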
Stuff that's not the closing delimiter would be
    A) anything that's not a /                     (regex "[^/]")
    B) any / so long as it has no * before it      (regex "[^*]/")
Combining them with an indication to say "as much as is there", we get:
    ([^/]|[^*]/)*
So the whole regex, wrapped in some perl, would be:
    s#/\*([^/]|[^*]/)*\*/##g;
      1112222222222222333      <- part numbers from the steps above

---

Note that there's another way to conceptually look at the "stuff not the
closing delimiter". That'd be:
    A) anything not a *                            (regex "[^*]")
    B) any * so long as it's not followed by a /   (regex "\*[^/]")
That would lead to
    s#/\*([^*]|\*[^/])*\*/##g;
However, since the "\*[^/]" eats a character, it could eat the third * in
the string "/* comment **/" and we'd wedgie the regex and it wouldn't match.

The first way described above only eats characters we've already had a
chance to check aren't the ending, so it won't wedgie.
                                                        Jeffrey Friedl

-------------------------------------------------------------------------------
Match string "foo" which is not followed by string "bar"

    [1] ^foo$             match
    [2] ^foobar$          NOMATCH
    [3] ^foo foobar$      match first foo
    [4] ^fooba$           match
    [5] ^abc foobar$      NOMATCH

The complexity of this problem is that few programs have REs that can
exclude a whole group of characters; a bracket expression such as [^bar]
only excludes single characters. Thus a program combining REs with
conditionals is required to do this.

Perl 5 has such a negative match, EG:   /foo(?!bar)/

Perl's extended construct (?X...), where X is a control with the following
meanings:
    :    just match (group without capturing)
    =    lookahead match (zero width)
    !    lookahead negation (zero width)
    #    comment (legible regexps at last)
    a-z  embedded option (like /i, e.g. "(?i)")

    \K   (not a (?X...) construct) discard everything matched so far from
         the reported match; acts like a zero-width "look behind"
         EG:  df | grep -o -P '% \K.*'      # print just the mount points

There are problems, however:
    should 'foo(?!ba.*r)bar' match 'foobazbar'?

    Once you use constructs like "*" inside a negative pattern, it often
    gets complicated to figure out what the pattern really means. But
    perhaps we can still have negative patterns as long as we specify some
    restrictions as to the interpretation or even use of things like "*"
    inside those patterns.  ... ???
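                                                        Lloyd Zusman

Solutions:
    Take the unwanted "foobar" out of the line before looking, then return
    the original line if a match is found.
        sed 'h
             s/foobar/+/g
             /foo/!d
             g'
    'h' saves a copy of the line in the hold space, 's' replaces every
    "foobar", '/foo/!d' deletes lines with no "foo" left, and 'g' restores
    the original line for printing. The "+" placeholder (rather than an
    empty string) stops the removal from joining the surrounding text into
    a new "foo", as would happen with "fofoobaro".
                                                        David W. Tamkin
-------------------------------------------------------------------------------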