Old 2nd December 2008
J65nko

I would rather write regular expressions that escape regular expressions than write escaped regular expressions by hand.
Code:
$ cat testfile                                                           
the quick brown fox jumps over the lazy dog
<?php @include("http://".$_SERVER['HTTP_HOST']."/linkingblogv.php"); ?>

$ cat escape-regex                                                       
#!/bin/sh

# -- Use here-document with single quoted end-of-document marker
# -- This prevents the shell from messing with any character

pattern=$(cat <<'END'
<?php @include("http://".$_SERVER['HTTP_HOST']."/linkingblogv.php"); ?>
END
)

# printf is used because not every echo(1) interprets '\n'
printf 'This is the pattern:\n%s\n' "$pattern"

# ---- For BRE (basic regular expressions) like in sed(1)
# escape everything except:
#   * ( and ) because '\(' and '\)' capture text in sed(1)
#   * { and } because '\{' and '\}' delimit a repetition count (minimum and/or maximum)
#   * 'n' because sed(1) treats '\n' in a pattern as <newline>
#   * alphabetic characters
#   * whitespace
#   * digits

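# Successive refinements of the escaping rule; each assignment overwrites
# the previous one, so only the last definition is actually used below.
pattern_esc=$(echo "$pattern" | sed -e 's!\([^(){}n]\)!\\\1!g')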
pattern_esc=$(echo "$pattern" | sed -e 's!\([^(){}[:alpha:]]\)!\\\1!g')
pattern_esc=$(echo "$pattern" | sed -e 's!\([^(){}[:alpha:][:blank:]]\)!\\\1!g')
pattern_esc=$(echo "$pattern" | sed -e 's!\([^(){}[:alpha:][:blank:][:digit:]]\)!\\\1!g')

#       
#       s               : the substitute command
#       !               : our custom delimiter
#       
#       \(              : start capture in container '\1'
#       
#       [               : start of character class
#       ^               : negate characters in this class
#       ()              : '(' and ')'
#       {}              : '{' and '}'
#       [:alpha:]       : alphabetic character class
#       [:blank:]       : blank character class (space and tab)
#       [:digit:]       : numeric character class
#       
#       \)              : end of capture in container '\1'
#       
#       !               : end of search pattern, start of replacement
#       
#       \\              : a literal '\' escaped with itself
#       \1              : contents of container '\1'
#       
#       !               : end of replacement
#       g               : do a 'g'lobal search and replace, not only first match
#       

printf '\n===========The escaped pattern=====================\n'
printf '%s\n' "$pattern_esc"

printf "\nDoing a grep on 'testfile'\n"
grep -n "$pattern_esc" testfile

printf "\nUsing sed(1) to replace the pattern with 'GORILLA'\n"
sed -e "s/${pattern_esc}/GORILLA/" testfile
$ ./escape-regex    
This is the pattern:
<?php @include("http://".$_SERVER['HTTP_HOST']."/linkingblogv.php"); ?>

===========The escaped pattern=====================
\<\?php \@include(\"http\:\/\/\"\.\$\_SERVER\[\'HTTP\_HOST\'\]\.\"\/linkingblogv\.php\")\; \?\>

Doing a grep on 'testfile'
2:<?php @include("http://".$_SERVER['HTTP_HOST']."/linkingblogv.php"); ?>

Using sed(1) to replace the pattern with 'GORILLA'
the quick brown fox jumps over the lazy dog
GORILLA
My first attempt was a brute-force approach: just escape everything. That succeeded for grep but failed for sed, probably because every "n" became "\n", which sed treats as the newline symbol.
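
To make that failure concrete, here is a minimal sketch (not part of the attached script) that runs the same substitution twice, once with a plain "n" and once with an escaped one. The sample word 'fun' and the variable names are made up for the illustration.

```shell
#!/bin/sh
# In a sed(1) regular expression '\n' stands for a newline,
# not for the letter 'n'.

# Plain 'n': the pattern matches the whole word.
plain=$(printf 'fun\n' | sed -e 's/fun/X/')

# Escaped 'n': the pattern now asks for 'fu' followed by a newline,
# which never occurs inside a single input line, so nothing is replaced.
escaped=$(printf 'fun\n' | sed -e 's/fu\n/X/')

printf 'plain n:   %s\n' "$plain"
printf 'escaped n: %s\n' "$escaped"
```

The first substitution replaces the word, the second leaves the line untouched. That is the same effect the escape-everything version had on every "n" in the pattern.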

Then I refined the pattern bit by bit, as you can see from the successive definitions of "pattern_esc". Only the last definition takes effect; the earlier ones are kept to show the progression.

Another approach would be to escape only the characters that have a special meaning in a regular expression. But that is left as an exercise for the reader.
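
For anyone who wants a starting point for that exercise, here is a rough sketch. It is not the attached script, and the helper name escape_bre is hypothetical. It assumes the pattern is a single line, that it will be used as a basic regular expression, and that '/' is the delimiter of the later s/// command. Escaping only the BRE metacharacters also sidesteps a portability problem: a backslash in front of an ordinary character is undefined in POSIX, and GNU sed and grep give sequences such as '\?' and '\<' a meaning of their own.

```shell
#!/bin/sh
# Sketch only: escape_bre is a hypothetical helper, not part of escape-regex.

# Put a backslash in front of the characters that are special in a
# POSIX basic regular expression ( [ \ . * ^ $ ) and in front of '/',
# the delimiter of the s/// command used further down.
# A ']' needs no escaping: once every '[' is escaped it is an ordinary character.
# '!' is the delimiter here, so '/' can sit inside the bracket expression as-is.
escape_bre() {
    printf '%s\n' "$1" | sed -e 's![[\.*^$/]!\\&!g'
}

pattern=$(cat <<'END'
<?php @include("http://".$_SERVER['HTTP_HOST']."/linkingblogv.php"); ?>
END
)

pattern_esc=$(escape_bre "$pattern")
printf 'Escaped pattern:\n%s\n' "$pattern_esc"

# Feed the original text through sed(1) and replace it with 'GORILLA'
result=$(printf '%s\n' "$pattern" | sed -e "s/${pattern_esc}/GORILLA/")
printf 'Result: %s\n' "$result"
```

With this rule the letters, digits, quotes, parentheses and braces are left alone, so there is no need for the list of exceptions. The replacement side of s/// has its own special characters ('&' and '\'), which this sketch does not handle.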
Attached Files
File Type: sh escape-regex.sh (1.7 KB, 67 views)
__________________
You don't need to be a genius to debug a pf.conf firewall ruleset, you just need the guts to run tcpdump