1. Script to test whether an IP address has been listed in a DNSBL
- 1.1 Introduction
- 1.2 Manual DNSBL list check
- 1.3 The 'blcheck' shell script
- 1.4 Examples of correct and incorrect usage
- 1.5 The role of 'sed'
- 1.6 Explanation of the 'sed' regular expression
- 1.7 Alternative approach for this rather long regex
1.1 Introduction
If you run a mail or web server, it is nice to know in time whether the IP address of your server has been submitted to a so-called DNSBL. Being listed can mean that one of your network boxes, or a site you host on your web server, has been compromised and is sending out spam.
Many administrators find out the hard way that their server has been blacklisted: customers or users complain about their mail not being accepted by their recipients. Checking the mail logs then usually reveals a pointer to a URL which states something like this:
Code:
IP Address 1.2.3.4 was found in the CBL.
It was detected at 2007-06-16 20:00 GMT (+/- 30 minutes), approximately
8 hours ago.
ATTENTION: This IP has an open web or socks proxy which is being
hijacked by the 'DMS' spam tool to send spam. This is usually due
to proxy trojans being installed on your IP (or a machine "behind"
this IP if it is a NAT gateway) via the vulnerabilities described
in the Microsoft MS06-040 security bulletin. Please see the top
news item on our home page for more information.
You need to patch your system, find then fix/remove the proxy, and
then contact the CBL at xxxxx@xxxx.org to remove this listing.
This is why every responsible system administrator should check on a regular basis whether their servers have been listed.
See
http://en.wikipedia.org/wiki/DNSBL and the excellent
http://www.spamhaus.org/dnsbl_function.html page for more information about these lists and their role in anti-spam policies.
1.2 Manual DNSBL list check
The organizations maintaining these lists have a page on their website where you can check whether your server is on their list. For example
http://www.spamhaus.org/lookup.lasso and
http://www.spamcop.net/bl.shtml.
For a regular check, however, for example one run by 'cron', these facilities are not really helpful. A manual check of the IP address from the 'sh' command line takes quite some work too. Take as an example the IP address 125.175.43.40, which sent me a spam message. For a blacklist check of this address you have to perform the following steps:
- Reverse the address 125.175.43.40 to 40.43.175.125.
- Append the name of the blacklist.
For the 'zen.spamhaus' list, that results in '40.43.175.125.zen.spamhaus.org'
- Resolve the resulting name in DNS with a DNS tool
Code:
$ dig 40.43.175.125.zen.spamhaus.org
; <<>> DiG 9.3.2-P1 <<>> 40.43.175.125.zen.spamhaus.org
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11694
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;40.43.175.125.zen.spamhaus.org. IN A
;; ANSWER SECTION:
40.43.175.125.zen.spamhaus.org. 1800 IN A 127.0.0.4
;; Query time: 384 msec
;; SERVER: 192.168.222.10#53(192.168.222.10)
;; WHEN: Sat Jun 16 13:42:10 2007
;; MSG SIZE rcvd: 64
The 127.0.0.4 result means the address is on that list.
A similar test but now for Spamcop:
Code:
$ dig 40.43.175.125.bl.spamcop.net
; <<>> DiG 9.3.2-P1 <<>> 40.43.175.125.bl.spamcop.net
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16727
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;40.43.175.125.bl.spamcop.net. IN A
;; ANSWER SECTION:
40.43.175.125.bl.spamcop.net. 2100 IN A 127.0.0.2
;; Query time: 141 msec
;; SERVER: 192.168.222.10#53(192.168.222.10)
;; WHEN: Sat Jun 16 13:58:33 2007
;; MSG SIZE rcvd: 62
Again the response is an address in the loopback 127.0.0.0/8 range, meaning it has been listed.
If you are familiar with reverse name lookups, you will notice that the same mechanism is used here. Instead of appending '.in-addr.arpa.' to the reversed IP, you append the name of the blacklist.
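The manual steps above can be sketched in a few lines of shell. This is only an illustration: the function name reverse_octets is made up here, it does no validation, and it only builds the names without querying DNS.

```shell
#!/bin/sh
# Sketch: build the DNS names used for a DNSBL query and for a reverse lookup.
# reverse_octets is a hypothetical helper; the address is the example from the text.

reverse_octets() {
    # Run in a subshell so the changed IFS and positional parameters
    # do not leak into the caller.
    (
        IFS=.
        set -f          # no pathname expansion on the unquoted $1
        set -- $1       # split the dotted quad into $1 .. $4
        echo "$4.$3.$2.$1"
    )
}

ip=125.175.43.40
rev=$(reverse_octets "$ip")

echo "DNSBL query name:    ${rev}.zen.spamhaus.org"
echo "Reverse lookup name: ${rev}.in-addr.arpa"
```

Either name can then be handed to a DNS tool such as 'dig', as shown in the examples above.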
1.3 The 'blcheck' shell script
In the beginning of 2007 I saw an increase of spam in my Gmail spam folders. Because I wanted a comfortable way to find out whether this junk originated from black listed senders, I wrote the following script.
Code:
#!/bin/sh
# -- $Id: blcheck.xml,v 1.8 2007/06/17 23:38:00 j65nko Exp $ --
# Check if an IP address is listed on one of the following blacklists
# The format is chosen to make it easy to add or delete a list
# The shell collapses the extra whitespace when the list is expanded unquoted
BLISTS="
cbl.abuseat.org
dnsbl.sorbs.net
bl.spamcop.net
zen.spamhaus.org
combined.njabl.org
"
# simple shell function to show an error message and exit
# $0 : the name of shell script, $1 is the string passed as argument
# >&2 : redirect/send the message to stderr
ERROR() {
echo "$0 ERROR: $1" >&2
exit 2
}
# -- Sanity check on parameters
[ $# -ne 1 ] && ERROR 'Please specify a single IP address'
# -- if the address consists of 4 groups of minimal 1, maximal 3 digits, separated by '.'
# -- reverse the order
# -- if the address does not match these criteria the variable 'reverse' will be empty
reverse=$(echo $1 |
sed -ne "s~^\([0-9]\{1,3\}\)\.\([0-9]\{1,3\}\)\.\([0-9]\{1,3\}\)\.\([0-9]\{1,3\}\)$~\4.\3.\2.\1~p")
if [ "x${reverse}" = "x" ] ; then
ERROR "IMHO '$1' doesn't look like a valid IP address"
exit 1
fi
# Assuming an IP address of 11.22.33.44 as parameter or argument
# If the IP address in $1 passes our crude regular expression check,
# the variable ${reverse} will contain 44.33.22.11
# In this case the test will be:
# [ "x44.33.22.11" = "x" ]
# This test will fail and the program will continue
# An empty '${reverse}' means that shell argument $1 doesn't pass our simple IP address check
# In that case the test will be:
# [ "x" = "x" ]
# This evaluates to true, so the script will call the ERROR function and quit
# -- do a reverse ( address -> name) DNS lookup
REVERSE_DNS=$(dig +short -x $1)
echo IP $1 NAME ${REVERSE_DNS:----}
# -- cycle through all the blacklists
for BL in ${BLISTS} ; do
# print the UTC date (without linefeed)
printf '%s' "$(env TZ=UTC date "+%Y-%m-%d_%H:%M:%S_%Z")"
# show the reversed IP and append the name of the blacklist
printf "%-40s" " ${reverse}.${BL}."
# use dig to lookup the name in the blacklist
#echo "$(dig +short -t a ${reverse}.${BL}. | tr '\n' ' ')"
LISTED="$(dig +short -t a ${reverse}.${BL}.)"
echo ${LISTED:----}
done
# --- EOT ------
The script has been rather heavily commented and is available for downloading. The regular expression used by
'sed' is explained in detail in section 1.6.
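For the regular, 'cron'-driven check mentioned in section 1.2, the output of the script could be reduced to the lines that report a listing. The sketch below assumes the output format shown in section 1.4 (timestamp, query name, then either '---' or one or more 127.x.y.z answers). The function name listed_only is hypothetical, and the sample input is copied from a run of the script so that no DNS queries are needed.

```shell
#!/bin/sh
# Sketch: keep only the blcheck output lines that report a listing.
# listed_only is a hypothetical helper, not part of the original script.

listed_only() {
    # A line counts as "listed" when its third field starts with 127.
    awk '$3 ~ /^127\./'
}

# Sample input copied from a blcheck run; nothing is looked up here.
listed_only <<'EOF'
IP 125.175.43.40 NAME p4040-ipbf1108marunouchi.tokyo.ocn.ne.jp.
2007-06-17_01:11:05_UTC 40.43.175.125.cbl.abuseat.org. 127.0.0.2
2007-06-17_01:11:06_UTC 40.43.175.125.dnsbl.sorbs.net. ---
2007-06-17_01:11:07_UTC 40.43.175.125.bl.spamcop.net. 127.0.0.2
EOF
```

In a cron job the output of './blcheck' for your own address could be piped through such a filter, so that the job stays silent as long as the address is not listed.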
1.4 Examples of correct and incorrect usage
Correct
Code:
$ ./blcheck 125.175.43.40
IP 125.175.43.40 NAME p4040-ipbf1108marunouchi.tokyo.ocn.ne.jp.
2007-06-17_01:11:05_UTC 40.43.175.125.cbl.abuseat.org. 127.0.0.2
2007-06-17_01:11:06_UTC 40.43.175.125.dnsbl.sorbs.net. ---
2007-06-17_01:11:07_UTC 40.43.175.125.bl.spamcop.net. 127.0.0.2
2007-06-17_01:11:07_UTC 40.43.175.125.zen.spamhaus.org. 127.0.0.4
2007-06-17_01:11:12_UTC 40.43.175.125.combined.njabl.org. ---
$ ./blcheck 80.100.2.99
IP 80.100.2.99 NAME fia99-2-100.dsl.mxposure.nl.
2007-06-17_21:01:42_UTC 99.2.100.80.cbl.abuseat.org. ---
2007-06-17_21:01:42_UTC 99.2.100.80.dnsbl.sorbs.net. ---
2007-06-17_21:01:42_UTC 99.2.100.80.bl.spamcop.net. ---
2007-06-17_21:01:43_UTC 99.2.100.80.zen.spamhaus.org. ---
2007-06-17_21:01:43_UTC 99.2.100.80.combined.njabl.org. ---
$ for X in 24.209.96.220 124.160.89.56; do ./blcheck $X; done
IP 24.209.96.220 NAME cpe-24-209-96-220.woh.res.rr.com.
2007-06-17_01:18:29_UTC 220.96.209.24.cbl.abuseat.org. 127.0.0.2
2007-06-17_01:18:29_UTC 220.96.209.24.dnsbl.sorbs.net. 127.0.0.10
2007-06-17_01:18:30_UTC 220.96.209.24.bl.spamcop.net. 127.0.0.2
2007-06-17_01:18:30_UTC 220.96.209.24.zen.spamhaus.org. 127.0.0.11 127.0.0.4
2007-06-17_01:18:30_UTC 220.96.209.24.combined.njabl.org. 127.0.0.3
IP 124.160.89.56 NAME ---
2007-06-17_01:18:31_UTC 56.89.160.124.cbl.abuseat.org. 127.0.0.2
2007-06-17_01:18:31_UTC 56.89.160.124.dnsbl.sorbs.net. ---
2007-06-17_01:18:31_UTC 56.89.160.124.bl.spamcop.net. 127.0.0.2
2007-06-17_01:18:31_UTC 56.89.160.124.zen.spamhaus.org. 127.0.0.11 127.0.0.4
2007-06-17_01:18:31_UTC 56.89.160.124.combined.njabl.org. 127.0.0.3
$ while true; do echo IP?; read IP; ./blcheck $IP; done
IP?
201.13.22.241
IP 201.13.22.241 NAME 201-13-22-241.dsl.telesp.net.br.
2007-06-17_23:12:10_UTC 241.22.13.201.cbl.abuseat.org. 127.0.0.2
2007-06-17_23:12:11_UTC 241.22.13.201.dnsbl.sorbs.net. 127.0.0.10
2007-06-17_23:12:11_UTC 241.22.13.201.bl.spamcop.net. 127.0.0.2
2007-06-17_23:12:11_UTC 241.22.13.201.zen.spamhaus.org. 127.0.0.11 127.0.0.4
2007-06-17_23:12:11_UTC 241.22.13.201.combined.njabl.org. 127.0.0.3
IP?
67.133.212.132
IP 67.133.212.132 NAME ;; connection timed out; no servers could be reached
2007-06-17_23:14:32_UTC 132.212.133.67.cbl.abuseat.org. 127.0.0.2
2007-06-17_23:14:33_UTC 132.212.133.67.dnsbl.sorbs.net. ---
2007-06-17_23:14:34_UTC 132.212.133.67.bl.spamcop.net. 127.0.0.2
2007-06-17_23:14:34_UTC 132.212.133.67.zen.spamhaus.org. 127.0.0.4
2007-06-17_23:14:34_UTC 132.212.133.67.combined.njabl.org. ---
IP?
^C
Incorrect and error message
Code:
$ ./blcheck
./blcheck ERROR: Please specify a single IP address
$ ./blcheck 1.2.3.4]
./blcheck ERROR: IMHO '1.2.3.4]' doesn't look like a valid IP address
$ ./blcheck 1.2.3.4 5.7.7.8
./blcheck ERROR: Please specify a single IP address
Incorrect (invalid octet 400), but no error message
Code:
$ ./blcheck 125.175.43.400
IP 125.175.43.400 NAME ---
2007-06-17_01:29:03_UTC 400.43.175.125.cbl.abuseat.org. ---
2007-06-17_01:29:03_UTC 400.43.175.125.dnsbl.sorbs.net. ---
2007-06-17_01:29:04_UTC 400.43.175.125.bl.spamcop.net. ---
2007-06-17_01:29:04_UTC 400.43.175.125.zen.spamhaus.org. ---
2007-06-17_01:29:04_UTC 400.43.175.125.combined.njabl.org. ---
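The last example shows that the regular expression only checks the shape of the address, not the value of each octet. A stricter check could look roughly like the sketch below. The helper name valid_ipv4 is hypothetical and not part of the original script; it assumes a 'grep' that supports the -E and -q options.

```shell
#!/bin/sh
# Sketch: reject dotted quads whose octets exceed 255.
# valid_ipv4 is a hypothetical helper, not part of the original script.

valid_ipv4() {
    # Same shape check as the sed expression: four groups of 1 to 3 digits.
    printf '%s\n' "$1" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' || return 1
    # Split on '.' in a subshell and check the numeric range of each octet.
    (
        IFS=.
        set -f
        set -- $1
        for octet in "$@"; do
            [ "$octet" -le 255 ] || exit 1
        done
    )
}

for ip in 125.175.43.40 125.175.43.400; do
    if valid_ipv4 "$ip"; then
        echo "$ip: valid"
    else
        echo "$ip: invalid"
    fi
done
```

With such a check in place, 125.175.43.400 would be rejected before any DNS query is made.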
1.5 The role of 'sed'
To reverse the IP address the script uses
'sed'. The man page tersely describes this program as follows:
Code:
DESCRIPTION
The sed utility reads the specified files, or the standard input if no
files are specified, modifying the input as specified by a list of com-
mands. The input is then written to the standard output.
A single command may be specified as the first argument to sed. Multiple
commands may be specified separated by newlines or semicolons, or by us-
ing the -e or -f options. All commands are applied to the input in the
order they are specified regardless of their origin.
'sed' is one of the many text processing utilities which act as a filter. It takes input, applies some commands to that input and sends the result to the standard output.
Code:
reverse=$(echo $1 |
sed -ne "s~^\([0-9]\{1,3\}\)\.\([0-9]\{1,3\}\)\.\([0-9]\{1,3\}\)\.\([0-9]\{1,3\}\)$~\4.\3.\2.\1~p")
'reverse' is a variable which we fill with the output of a command.
Code:
reverse=$( command )
The '$( command )' construct is called command substitution and is the preferred alternative to the older construct which uses backticks, as in reverse=`command`.
The command in this case is
echo $1 | sed -ne "......."
$1 is the IP address passed as argument to the
'blcheck' script and is echoed to standard output. The '|' pipe symbol causes it to be fed to
'sed' as standard input for processing.
The options used in calling
'sed':
Code:
-n By default, each line of input is echoed to the standard output
after all of the commands have been applied to it. The -n option
suppresses this behavior.
-e command Append the editing commands specified by the command argument to
the list of commands.
We only want to echo the reversed IP if the regular expression matches, so the default output is suppressed with '-n'. That in turn means we have to use the 'p' flag of the 'sed' 's' command to force a print if the match and reversal have been successful.
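The interplay of '-n' and 'p' can be seen in isolation with a much simpler substitution. The two input lines and the function name print_substituted are made up for this illustration.

```shell
#!/bin/sh
# Sketch: effect of -n combined with the 'p' flag of the 's' command.
# print_substituted is a hypothetical helper; the input lines are made up.

print_substituted() {
    # -n suppresses the default output; 'p' prints a line only when the
    # substitution on that line succeeded.
    sed -n 's/^spam/SPAM/p'
}

# Only the first line matches, so only that line is printed.
printf 'spam here\nno match here\n' | print_substituted
```

The second input line does not match and therefore produces no output at all, which is exactly what leaves the 'reverse' variable empty for an invalid address.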
1.6 Explanation of the 'sed' regular expression
Code:
s~^\([0-9]\{1,3\}\)\.\([0-9]\{1,3\}\)\.\([0-9]\{1,3\}\)\.\([0-9]\{1,3\}\)$~\4.\3.\2.\1~p
Starting at the beginning of the string ('^'), look for a sequence of at least 1 and at most 3 digits, and store or remember it in container \1.
Then do something similar for the remainder of the IP address 3 times: Look for a leading period '.' and a sequence of at least 1, and a maximum of 3 digits and store each of these digit groups (octets) in the containers \2, \3 and \4.
Code:
s~^\([0-9]\{1,3\}\)\.
s : substitute/replace
~ : our own chosen delimiter denoting start of the subst. pattern
: instead of the default "/"
^ : beginning of line (the shell will strip all leading whitespace )
\( : start storing in the first container \1
[0-9] : a character in the range 0-9, a digit
\{ : start of quantifier
1,3 : minimal one, maximal 3
\} : end of quantifier
\) : end of storing in container \1
\. : a literal period '.'
A "." is a so-called meta-character in regular expressions and stands for "any" character. Because we want to match a real literal '.' we have to escape it with a '\'.
The opposite is true for the quantifier indicators \{ and \} and the grouping indicators \( and \). In the basic regular expressions used by 'sed', the opening and closing brace and the '(' and ')' have no special meaning by themselves. To upgrade them to their special meaning, of quantifying and of start and stop storing, they need a leading "\".
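The difference between escaped and unescaped characters is easy to try out on the command line. The sample strings below are made up for illustration.

```shell
#!/bin/sh
# Sketch: escaped versus unescaped characters in sed's basic regular expressions.
# The sample strings are made up for illustration.

# An unescaped '.' matches any character, so the first character is replaced.
echo '1x2.3' | sed 's/./-/'

# An escaped '\.' matches only a literal period.
echo '1x2.3' | sed 's/\./-/'

# An escaped '\{2\}' is a quantifier: two consecutive 'a' characters are replaced.
echo 'aaa' | sed 's/a\{2\}/X/'
```

The first command changes the leading '1', the second one changes the period, and the third one replaces the first two 'a' characters.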
Code:
\([0-9]\{1,3\}\)\.
\( : start storing in next container \2
[0-9] : a character in the range 0-9
\{ : start of quantifier
1,3 : minimal one, maximal 3
\} : end of quantifier
\) : end of storing in container \2
\. : a literal '.' escaped with '\'
\([0-9]\{1,3\}\)\.
\( : start storing in next container \3
[0-9] : a character in the range 0-9
\{ : start of quantifier
1,3 : minimal one, maximal 3
\} : end of quantifier
\) : end of storing in container \3
\. : a literal '.' escaped with '\'
\([0-9]\{1,3\}\)$
\( : start storing in next container \4
[0-9] : a character in the range 0-9
\{ : start of quantifier
1,3 : minimal one, maximal 3
\} : end of quantifier
\) : end of storing in container \4
$ : end of string
We now have matched the 4 groups of digits of the IP address in four containers: '\1', '\2', '\3' and '\4'
We only stored the digit groups, not the separating periods or dots. Now we substitute and rearrange them in reverse order and re-insert the periods.
Note that in the search or matching pattern, the period "." is a special character, that needs to be escaped if we want to match a real period. That doesn't apply to the substitution or replacement. Here a period is just a plain normal period.
Code:
~\4.\3.\2.\1~p
~ : our custom delimiter, marking the end of the matching pattern, and start of
the substituting part
\4 : 4th digit group first
. : a '.' has no special meaning in the replacement part, so it is not escaped with '\'
\3 : 3rd digit group
. : a '.'
\2 : 2nd digit group
. : a '.'
\1 : 1st digit group
~ : our custom delimiter, marking the end of the substituting part
p : print the result, but only if a substitution has been made
If the regular expression did not match, this substitution will not be done and nothing will be printed to standard output. So nothing would be stored in the variable 'reverse'.
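Both outcomes can be tried without running the whole script. In the sketch below the expression is the one discussed above, wrapped in a function with the hypothetical name reverse_ip. Single quotes are used here so that the shell leaves the '$' and the backslashes alone.

```shell
#!/bin/sh
# Sketch: the substitution wrapped in a function to show both outcomes.
# reverse_ip is a hypothetical name; the expression is the one from the script.

reverse_ip() {
    echo "$1" |
    sed -ne 's~^\([0-9]\{1,3\}\)\.\([0-9]\{1,3\}\)\.\([0-9]\{1,3\}\)\.\([0-9]\{1,3\}\)$~\4.\3.\2.\1~p'
}

# A well-formed address is reversed; anything else yields an empty string.
echo "match:    [$(reverse_ip 11.22.33.44)]"
echo "no match: [$(reverse_ip '1.2.3.4]')]"
```

The brackets in the output make the empty result for the malformed address visible.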
1.7 Alternative approach for this rather long regex
Both
'awk' and its descendant
'perl' have an operator called 'split'. From the
'awk' man page:
Code:
split(s, a, fs) Splits the string s into array elements a[1], a[2], ...,
a[n] and returns n. The separation is done with the
regular expression fs or with the field separator FS if
fs is not given. An empty string as field separator
splits the string into one array element per character.
So you could split on the dots separating the 4 IP address octets and store the octets in 4 array elements. This is left as an exercise for the reader.
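One possible solution, given only as a rough sketch: the function name reverse_ip_awk is hypothetical, and unlike the 'sed' expression it only checks that there are exactly four dot-separated fields, not that they consist of digits.

```shell
#!/bin/sh
# Sketch: reverse a dotted quad with awk's split() instead of a long regex.
# reverse_ip_awk is a hypothetical name; no digit check is done here.

reverse_ip_awk() {
    echo "$1" | awk '{
        # Split the input line on "." into octet[1] .. octet[n]
        n = split($0, octet, ".")
        # Print the fields in reverse order only if there are exactly four
        if (n == 4)
            print octet[4] "." octet[3] "." octet[2] "." octet[1]
    }'
}

reverse_ip_awk 11.22.33.44
```

As with the 'sed' version, input that does not have the expected shape produces no output, so the same test for an empty result can be used afterwards.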
Both
'sed' and
'awk' are part of the base installation of virtually all Unix-like operating systems, while
'perl' for example is not part of FreeBSD base and needs to be installed separately. This fact made me use
'sed' instead of
'perl' for this particular script.