-------------------------------------------------------------------------------
Match Delimited Text

Remove C comments....

|> The input file has a declaration something like the following with
|> several comments in a single line:
|>
|>    input a, b, /* comment */ c, /* comment ******************* */ d;
|>
|> I need to delete the comments in between and write the declaration as
|>
|>    input a, b, c, d;
|>
|> NOTE: the c, must not be deleted.

It's a standard match-delimited-text problem, and the general solution is:

    1: match the opening delimiter
    2: match stuff that's not the closing delimiter
    3: match the closing delimiter

In this case, the opening delimiter is "/*" so the regex is "/\*".
The closing delimiter is "*/", so that regex is "\*/".

Stuff that's not the closing delimiter would be
    A) anything that's not /                     (regex "[^/]")
and
    B) any / so long as it has no * before it    (regex "[^*]/")

Combining them with an indication to say "as much as is there", we get:

    ([^/]|[^*]/)*

So the whole regex, wrapped in some perl, would be:

                        s#/\*([^/]|[^*]/)*\*/##g;
Part numbers from above:  1112222222222222333
                              AAAA BBBBB

Note that there's another way to conceptually look at the "stuff not the
closing delimiter". That'd be:
    A) anything not a *                               (regex "[^*]")
and
    B) any * so long as it's not followed by a /      (regex "\*[^/]")

That would lead to

    s#/\*([^*]|\*[^/])*\*/##g;

However, since the "\*[^/]" eats a character, it could eat the third * in
the string "/* comment **/" and we'd wedgie the regex and it wouldn't match.

The first way described above only eats characters we've already had a
chance to check aren't the ending, so it won't wedgie.

                                                        Jeffrey Friedl

NOTE: the first regex can still run past a closing delimiter that is
immediately followed by a "/", as in "/* a *// b */", because "[^*]/" then
accepts the "//" pair. Under perl 5 the non-greedy form s#/\*.*?\*/##gs
avoids both problems.

-------------------------------------------------------------------------------
Splitting Quote Delimited Fields (CSV)

EG: splitting CSV data (a comma delimited file with quoted fields)

Example:
  SAR001,"","Cimetrix, Inc","Bob Smith","CAM","\"",N,8,,"Error, Core Dumped"

    undef @fields;
    push( @fields, defined($1) ? $1 : $3)
        while m/"([^"\\]*(\\.[^"\\]*)*)"|([^,]+)/g;

WARNING the above does not seem to work under perl 5    -- Anthony

Jeffrey Friedl, author of Mastering Regular Expressions gives...

    @new = ();
    push(@new, $+) while $text =~ m{
        "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
      | ([^,]+),?
      | ,
    }gx;
    push(@new, undef) if substr($text,-1,1) eq ',';

However, quotes within a quoted field need to be escaped with a
backslash, as \"

Alternatively, use Text::ParseWords

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);

For space separated words such as for a shell command
  EG:    cp -p "my file" "yourfile"
you can look at...
    perl4:  shellwords.pl library
    perl5:  Text::ParseWords module

    perl -de 1
      use Text::ParseWords
      $line = 'cp -p "my file" "your file"'
      @words = shellwords $line
      X words
    @words = (
       0  'cp'
       1  '-p'
       2  'my file'
       3  'your file'
    )

Perl 4 Alternatives...

Method 1:

    # delimit ',' with quoted strings and variable allowed
    $_ = 'f1,f 2,"f3","f,4",5,$time,f7';
    while (/,|"|$/go) {
      ($within = ($within ? 0 : 1), next) if '"' eq $&;
      next if $within;
      substr($_, 0, length($`)+1) = "";
      push(@fields, $`);
    }
    print join(" ", @fields),"\n";

  output
    f1 f 2 "f3" "f,4" 5 $time f7

Method 2:
  Just remove the delimiter ',' from within quotes

    s/("[^"]*")/do{$a = $1; $a =~ tr#,#c#; $a;}/ge;

  Now you can split the line as you would normally.
  If you substitute the ',' with an unused char you can restore it later!

-------------------------------------------------------------------------------
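The closing remark under Method 2 (swap the ',' inside quotes for an unused
character, split, then put the ',' back) can be sketched for perl 5 roughly as
below. This is only an illustration, not part of the original notes: the name
split_quoted is made up, and "\x01" is an assumed placeholder that must never
occur in the data. It does not handle quotes embedded in a quoted field.

```perl
use strict;
use warnings;

# Hypothetical helper: split one comma delimited line with quoted fields.
sub split_quoted {
    my ($line) = @_;
    my $mask = "\x01";    # ASSUMPTION: this character never appears in the data

    # Hide every ',' that sits inside a double-quoted field.
    $line =~ s{("[^"]*")}{ do { my $q = $1; $q =~ s/,/$mask/g; $q } }ge;

    # A limit of -1 keeps trailing empty fields.
    my @fields = split /,/, $line, -1;

    for (@fields) {
        s/$mask/,/g;        # restore the hidden commas
        s/^"(.*)"$/$1/s;    # drop the surrounding quotes, if any
    }
    return @fields;
}

my @f = split_quoted('f1,f 2,"f3","f,4",5,$time,f7');
print join(" | ", @f), "\n";
# prints: f1 | f 2 | f3 | f,4 | 5 | $time | f7
```

Unlike Method 1, the quotes are stripped from the result here; remove the
second substitution in the loop to keep them.

-------------------------------------------------------------------------------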