Discussion:
msrbl sigs: rsync
Steve Basford
2007-03-04 13:23:41 UTC
Permalink
Hi,

Just a heads up for those using the msrbl sigs.

As of last week:

"Downloading of the signature files is currently only available via rsync":

rsync rsync://rsync.mirror.msrbl.com/msrbl/MSRBL-SPAM.ndb /path/MSRBL-SPAM.ndb
rsync rsync://rsync.mirror.msrbl.com/msrbl/MSRBL-Images.hdb /path/MSRBL-Images.hdb

Looks like a few interesting improvements going on...

Cheers,

Steve

_______________________________________________
Help us build a comprehensive ClamAV guide: visit http://wiki.clamav.net
http://lurker.clamav.net/list/clamav-users.html
Chris
2007-03-04 22:03:29 UTC
Permalink
Post by Steve Basford
Hi,
Just a heads up for those using the msrbl sigs.
rsync rsync://rsync.mirror.msrbl.com/msrbl/MSRBL-SPAM.ndb /path/MSRBL-SPAM.ndb
rsync rsync://rsync.mirror.msrbl.com/msrbl/MSRBL-Images.hdb /path/MSRBL-Images.hdb
Looks like a few interesting improvements going on...
Cheers,
Steve
Steve, since I'm using a script that was posted here quite some time ago, what
changes need to be made?

curl -R -s -z MSRBL-SPAM.ndb -o $tmp_dir/MSRBL-SPAM.ndb \
http://download.mirror.msrbl.com/MSRBL-SPAM.ndb
test -s $tmp_dir/MSRBL-SPAM.ndb && \
clamscan --quiet -d $tmp_dir/MSRBL-SPAM.ndb && \
cp --reply=yes MSRBL-SPAM.ndb MSRBL-SPAM.ndb-bak && \
mv -f $tmp_dir/MSRBL-SPAM.ndb .

curl -R -s -z MSRBL-Images.hdb -o $tmp_dir/MSRBL-Images.hdb \
http://download.mirror.msrbl.com/MSRBL-Images.hdb
test -s $tmp_dir/MSRBL-Images.hdb && \
clamscan --quiet -d $tmp_dir/MSRBL-Images.hdb && \
cp --reply=yes MSRBL-Images.hdb MSRBL-Images.hdb-bak && \
mv -f $tmp_dir/MSRBL-Images.hdb .
--
Chris
KeyID 0xE372A7DA98E6705C
Dennis Peterson
2007-03-04 22:15:45 UTC
Permalink
Post by Chris
Steve, since I'm using a script that was posted here quite some time ago, what
changes need to be made?
Create a text file, msrbl.list, with these two lines:
MSRBL-SPAM.ndb
MSRBL-Images.hdb

Run rsync with that file and the rsync URI from above:

rsync -aq --files-from=/path/to/msrbl.list \
rsync://rsync.mirror.msrbl.com/msrbl/ /path/to/pattern-files

This will download both the spam and image files in one invocation of
rsync, and put them in the directory pointed to by /path/to/pattern-files.

Run validation checks as before to be sure they won't break clamd. The
msrbl.list is to allow the thing to scale in the event msrbl adds more
lists to their fine services.
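
A minimal sketch of the steps above, wrapped in functions so the list and the destination can be swapped out. The function names, the MSRBL_URI variable and the RSYNC override are assumptions for illustration; only the rsync options, the two file names and the URI come from the instructions above.

```shell
#!/bin/sh
# Sketch only: function names and the RSYNC override are illustrative.
RSYNC="${RSYNC:-rsync}"
MSRBL_URI="rsync://rsync.mirror.msrbl.com/msrbl/"

# Write the list file: one signature file name per line.
# $1 = path of the list file to create.
write_msrbl_list() {
    printf '%s\n' MSRBL-SPAM.ndb MSRBL-Images.hdb > "$1"
}

# Fetch every file named in the list with one rsync invocation.
# $1 = list file, $2 = local directory for the downloaded files.
fetch_msrbl() {
    "$RSYNC" -aq --files-from="$1" "$MSRBL_URI" "$2"
}
```

Something like `write_msrbl_list /path/to/msrbl.list && fetch_msrbl /path/to/msrbl.list /path/to/pattern-files` would then do both steps. Adding a file later only means adding a line to the list.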

dp

Chris
2007-03-04 23:32:05 UTC
Permalink
Hopefully I've done this right: a new MSRBL-Images.hdb and MSRBL-SPAM.ndb were
downloaded to /var/tmp/clamav, appeared to be tested, and were moved
to /var/lib/clamav. Here is what I now have in the script:

rsync -aq --files-from=/usr/local/bin/msrbl.list \
rsync://rsync.mirror.msrbl.com/msrbl/ /var/tmp/clamdb
test -s $tmp_dir/MSRBL-SPAM.ndb && \
clamscan --quiet -d $tmp_dir/MSRBL-SPAM.ndb && \
cp --reply=yes MSRBL-SPAM.ndb MSRBL-SPAM.ndb-bak && \
mv -f $tmp_dir/MSRBL-SPAM.ndb .

test -s $tmp_dir/MSRBL-Images.hdb && \
clamscan --quiet -d $tmp_dir/MSRBL-Images.hdb && \
cp --reply=yes MSRBL-Images.hdb MSRBL-Images.hdb-bak && \
mv -f $tmp_dir/MSRBL-Images.hdb .

I ran it twice, and both times it downloaded a new .hdb and .ndb file; at
least, the 'modified' times were within a couple of minutes of the current
time. I've commented out the old curl lines:

# curl -R -s -z MSRBL-SPAM.ndb -o $tmp_dir/MSRBL-SPAM.ndb \
# http://download.mirror.msrbl.com/MSRBL-SPAM.ndb

# curl -R -s -z MSRBL-Images.hdb -o $tmp_dir/MSRBL-Images.hdb \
# http://download.mirror.msrbl.com/MSRBL-Images.hdb
--
Chris
KeyID 0xE372A7DA98E6705C
Dennis Peterson
2007-03-04 23:40:40 UTC
Permalink
Looks like it worked as expected.

dp
Dennis Peterson
2007-03-05 06:08:43 UTC
Permalink
I just now realized you're moving the downloaded file to the ClamAV
working directory rather than copying it. By doing this you defeat one
of the truly great things about rsync - intelligent copies. For small
files this isn't a big deal, but for very large files rsync has to
download the entire thing even though it may have changed only in the
last few lines. I'll give you an example - stop me if you've heard this...

A web server log can grow by several megs each day. At the end of each
day you'd like to have a copy of that log, now nearly 20g in size, sent
to your activity reporter. Rather than copying that 20g+ and growing
file, you use rsync. Rsync will look at the remote file and compare it
with your local file and send only the differences - that would be the
changes made that day.

In order to do this it has to have a previous copy to compare against
which is why moving your file as you do negates this feature.
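
A minimal sketch of that idea, assuming rsync always writes into a mirror directory that the script never empties, and that validation happens on a copy in a separate staging directory. The function name and both directory roles are hypothetical, not taken from any script posted so far.

```shell
#!/bin/sh
# Sketch only: the function name and directory roles are assumptions.

# Copy a file from the persistent rsync mirror into a staging directory,
# preserving its timestamp and leaving the mirror copy in place so the
# next rsync run still has a local file to compare against.
# $1 = mirror directory, $2 = staging directory, $3 = file name.
stage_from_mirror() {
    cp -p "$1/$3" "$2/$3"
}
```

The staged copy can then be tested and moved into the ClamAV directory as before, while the mirror copy stays put for the next comparison.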

The other thing rsync does is some quick math to see if the source file
has changed from the local copy, and if there have been no changes the
process stops. Very efficient.

The rsync process allows you to optimize for cpu usage or bandwidth when
the source file has changed. It takes some cpu power to make the file
comparisons but very little bandwidth. If you optimize for cpu usage
then bandwidth suffers as you have to transfer entire files. Rsync does
a poor job with text files that are compressed at each update: the
entire zip file is different even when small changes are made, so it ends
up having to transfer nearly all of the file each time even though the
unzipped text may have changed very little.

I have a remote and very isolated server. It's a 1U Sun sparc with no
tape drive, no expansion slots, no nothing, and I need to back it up to
my NAS across the state. I use rsync to copy only the changed parts of
existing files, or entire files if they are new, to my NAS where they
are then put on tape. That's a very busy little server with dozens of
web servers, mail lists, user accounts, etc., and it takes very little
time and bandwidth to refresh the local data thanks to rsync.

My guess is the MSRBL folks would like it if you downloaded the new
files only if the file has been modified.
Steve Basford
2007-03-05 06:46:49 UTC
Permalink
Post by Dennis Peterson
My guess is the MSRBL folks would like it if you downloaded the new
files only if the file has been modified.
I think you're right... the size of their images .hdb file
(un-compressed) jumped to about 7.5 meg in size and I guess shifting
that amount of data for x users, would slowly become more of a pain,
especially as they seem to be adding md5 hashes at a growing rate, so
with rsync used correctly, it'll only be shifting a small amount of data.

Thanks for the script help too... if anyone would like to modify the
current scripts on my site and come up with a rsync version for the
msrbl sigs only... then I'll certainly update them on my site.

Thanks all!

Steve

Bill Landry
2007-03-06 07:39:58 UTC
Permalink
Steve, below is an update of my script using rsync for the MSRBL files:
==========
#!/bin/bash

# Either set and export PATH
PATH=/bin:/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin
export PATH

# or set individual program paths
#clamd="/usr/local/sbin/clamd"
#clamscan="/usr/local/bin/clamscan"
#curl="/usr/local/bin/curl"
#gunzip="/bin/gunzip"
#service="/sbin/service"
#test="/usr/bin/test"

# Set working directory paths
tmp_dir="/var/tmp/clamdb"
rsync_dir="/var/tmp/rsync"

# Change shell to ClamAV database directory
cd /var/lib/clamav

# Check for SaneSecurity SCAM database update
curl -R -s -S -z scam.ndb.gz -o $tmp_dir/scam.ndb.gz \
http://www.sanesecurity.com/clamav/scam.ndb.gz
test -s $tmp_dir/scam.ndb.gz && \
gunzip -cdf $tmp_dir/scam.ndb.gz > $tmp_dir/scam.ndb && \
mv -f $tmp_dir/scam.ndb.gz . && \
clamscan --quiet -d $tmp_dir/scam.ndb - < /dev/null && \
cp --reply=yes scam.ndb scam.ndb-bak && \
mv -f $tmp_dir/scam.ndb .

# Check for SaneSecurity PHISH database update
curl -R -s -S -z phish.ndb.gz -o $tmp_dir/phish.ndb.gz \
http://www.sanesecurity.com/clamav/phish.ndb.gz
test -s $tmp_dir/phish.ndb.gz && \
gunzip -cdf $tmp_dir/phish.ndb.gz > $tmp_dir/phish.ndb && \
mv -f $tmp_dir/phish.ndb.gz . && \
clamscan --quiet -d $tmp_dir/phish.ndb - < /dev/null && \
cp --reply=yes phish.ndb phish.ndb-bak && \
mv -f $tmp_dir/phish.ndb .

# Check for MSRBL SPAM database update
rsync -a rsync://rsync.mirror.msrbl.com/msrbl/MSRBL-SPAM.ndb \
$rsync_dir/MSRBL-SPAM.ndb
cp $rsync_dir/MSRBL-SPAM.ndb $tmp_dir
test -s $tmp_dir/MSRBL-SPAM.ndb && \
clamscan --quiet -d $tmp_dir/MSRBL-SPAM.ndb - < /dev/null && \
cp --reply=yes MSRBL-SPAM.ndb MSRBL-SPAM.ndb-bak && \
mv -f $tmp_dir/MSRBL-SPAM.ndb .

# Check for MSRBL IMAGE database update
rsync -a rsync://rsync.mirror.msrbl.com/msrbl/MSRBL-Images.hdb \
$rsync_dir/MSRBL-Images.ndb
cp $rsync_dir/MSRBL-Images.ndb $tmp_dir
test -s $tmp_dir/MSRBL-Images.hdb && \
clamscan --quiet -d $tmp_dir/MSRBL-Images.hdb - < /dev/null && \
cp --reply=yes MSRBL-Images.hdb MSRBL-Images.hdb-bak && \
mv -f $tmp_dir/MSRBL-Images.hdb .

# Set appropriate file permission (should be whatever user account
# ClamD is running under)
chown -R clamav:clamav /var/lib/clamav

# Remove any leftover files in the working directory (should only
# happen when a corrupted database is detected)
rm -f /var/tmp/clamdb/*

# Reload database (should not be necessary if you have "SelfCheck"
# enabled in clamd.conf and/or "NotifyClamd" enabled in freshclam.conf)
#service clamd reload
==========

This script sets one additional working directory for storing rsync
files before copying them to the temp directory for testing and
processing. Feel free to update the current script on your scripts
download site.

Bill
Dennis Davis
2007-03-06 14:14:34 UTC
Permalink
Date: Mon, 05 Mar 2007 23:39:58 -0800
Subject: Re: [Clamav-users] msrbl sigs: rsync
...
# Check for MSRBL IMAGE database update
rsync -a rsync://rsync.mirror.msrbl.com/msrbl/MSRBL-Images.hdb \
$rsync_dir/MSRBL-Images.ndb
cp $rsync_dir/MSRBL-Images.ndb $tmp_dir
Shouldn't that read "MSRBL-Images.hdb" in the last two lines above?
--
Dennis Davis, BUCS, University of Bath, Bath, BA2 7AY, UK
***@bath.ac.uk Phone: +44 1225 386101
Bill Landry
2007-03-06 16:05:59 UTC
Permalink
Post by Dennis Davis
Date: Mon, 05 Mar 2007 23:39:58 -0800
Subject: Re: [Clamav-users] msrbl sigs: rsync
...
# Check for MSRBL IMAGE database update
rsync -a rsync://rsync.mirror.msrbl.com/msrbl/MSRBL-Images.hdb \
$rsync_dir/MSRBL-Images.ndb
cp $rsync_dir/MSRBL-Images.ndb $tmp_dir
Shouldn't that read "MSRBL-Images.hdb" in the last two lines above?
Yes, you are correct, thanks for catching that (damn keyboard viruses!) ;-)

Bill
Bill Landry
2007-03-06 18:48:04 UTC
Permalink
Here is my latest script iteration, which now includes testing for newer
files before copying the file to the temp working directory for testing,
and when copying is done due to a newer file being found, the original
timestamps will be now preserved on the copied files.
==========
#!/bin/bash

# Either set and export PATH
PATH=/bin:/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin
export PATH

# or set individual program paths
#clamd="/usr/local/sbin/clamd"
#clamscan="/usr/local/bin/clamscan"
#curl="/usr/local/bin/curl"
#gunzip="/bin/gunzip"
#service="/sbin/service"
#test="/usr/bin/test"

# Set working directory paths
tmp_dir="/var/tmp/clamdb"
rsync_dir="/var/tmp/rsync"

# Change shell to ClamAV database directory
cd /var/lib/clamav

# Check for SaneSecurity SCAM database update
curl -R -s -S -z scam.ndb.gz -o $tmp_dir/scam.ndb.gz \
http://www.sanesecurity.com/clamav/scam.ndb.gz
test -s $tmp_dir/scam.ndb.gz && \
gunzip -cdf $tmp_dir/scam.ndb.gz > $tmp_dir/scam.ndb && \
mv -f $tmp_dir/scam.ndb.gz . && \
clamscan --quiet -d $tmp_dir/scam.ndb - < /dev/null && \
cp --reply=yes scam.ndb scam.ndb-bak && \
mv -f $tmp_dir/scam.ndb .

# Check for SaneSecurity PHISH database update
curl -R -s -S -z phish.ndb.gz -o $tmp_dir/phish.ndb.gz \
http://www.sanesecurity.com/clamav/phish.ndb.gz
test -s $tmp_dir/phish.ndb.gz && \
gunzip -cdf $tmp_dir/phish.ndb.gz > $tmp_dir/phish.ndb && \
mv -f $tmp_dir/phish.ndb.gz . && \
clamscan --quiet -d $tmp_dir/phish.ndb - < /dev/null && \
cp --reply=yes phish.ndb phish.ndb-bak && \
mv -f $tmp_dir/phish.ndb .

# Check for MSRBL SPAM database update
rsync -a rsync://rsync.mirror.msrbl.com/msrbl/MSRBL-SPAM.ndb \
$rsync_dir/MSRBL-SPAM.ndb
test $rsync_dir/MSRBL-SPAM.ndb -nt MSRBL-SPAM.ndb && \
cp -p $rsync_dir/MSRBL-SPAM.ndb $tmp_dir && \
test -s $tmp_dir/MSRBL-SPAM.ndb && \
clamscan --quiet -d $tmp_dir/MSRBL-SPAM.ndb - < /dev/null && \
cp --reply=yes MSRBL-SPAM.ndb MSRBL-SPAM.ndb-bak && \
mv -u $tmp_dir/MSRBL-SPAM.ndb .

# Check for MSRBL IMAGE database update
rsync -a rsync://rsync.mirror.msrbl.com/msrbl/MSRBL-Images.hdb \
$rsync_dir/MSRBL-Images.hdb
test $rsync_dir/MSRBL-Images.hdb -nt MSRBL-Images.hdb && \
cp -p $rsync_dir/MSRBL-Images.hdb $tmp_dir && \
test -s $tmp_dir/MSRBL-Images.hdb && \
clamscan --quiet -d $tmp_dir/MSRBL-Images.hdb - < /dev/null && \
cp --reply=yes MSRBL-Images.hdb MSRBL-Images.hdb-bak && \
mv -u $tmp_dir/MSRBL-Images.hdb .

# Set appropriate file permission (should be whatever user account
# ClamD is running under)
chown -R clamav:clamav /var/lib/clamav

# Remove any leftover files in the $tmp_dir working directory
# (should only happen when a corrupted database is detected)
rm -f /var/tmp/clamdb/*

# Reload databases (should not be necessary if you have "SelfCheck"
# enabled in clamd.conf and/or "NotifyClamd" enabled in freshclam.conf)
#service clamd reload
==========
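
The newer-than check in the script above can be isolated into a small helper, sketched here under assumptions: the helper name is hypothetical, and the explicit check for a missing destination is an addition for the first run, when no local copy exists yet.

```shell
#!/bin/bash
# Sketch only: the helper name is an assumption for illustration.

# Succeed when $1 is a non-empty file that is newer than $2,
# or when $2 does not exist yet.
# $1 = freshly synced file, $2 = file currently in use.
needs_update() {
    [ -s "$1" ] || return 1
    [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}
```

It could guard the copy step like this: `needs_update "$rsync_dir/MSRBL-SPAM.ndb" MSRBL-SPAM.ndb && cp -p "$rsync_dir/MSRBL-SPAM.ndb" "$tmp_dir"`. The `-p` matters, since without it the copy gets the current time and always looks newer on the next run.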

Bill
Dennis Peterson
2007-03-06 19:18:30 UTC
Permalink
Post by Bill Landry
Here is my latest script iteration, which now includes testing for newer
files before copying the file to the temp working directory for testing,
and when copying is done due to a newer file being found, the original
timestamps will be now preserved on the copied files.
I took just a quick look but it appears you are doing a time comparison
to a moved file, not the original file. Also - with just a teeny bit of
work you can reduce this to a single each curl and rsync invocation
rather than two each.

Here's my version. Not that it's any great shakes, but it does work.
There are some installation steps to using it. You have to create the tmp
directory somewhere or use the example, and you have to create some
empty time tagging files (it uses a make-like paradigm). They are:
newspam, newphish, newscam, and newimages. These go in the tmp
directory. There are two text files that list the files to download. The
file names and contents are:

file.list
http://www.sanesecurity.com/clamav/phish.ndb.gz
http://www.sanesecurity.com/clamav/scam.ndb.gz

msrbl.list
MSRBL-Images.hdb
MSRBL-SPAM.ndb

Some of the complexity is a consequence of wget return codes being less
than helpful: the return code is the same whether a file is fetched or not.

Neither wget nor rsync will download a file unless the source is newer
than the local file. The post-fetch processing won't run unless the
local pattern files are newer than the time tagging files. It tries to
not waste time and cpu. The downloaded files are not modified in any way
so they retain their times and sizes. Rsync is used to put the
downloaded files into the working directory and this is an atomic
process so clamd doesn't barf. wget and rsync run once to get all four
files (or more if the vendors add to their list).
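
The make-like time tagging described above can be sketched as one helper. This is an illustration under assumptions, not the script itself: the function name is hypothetical, and `touch -r` is used as a portable stand-in for copying the file's time onto the tag file.

```shell
#!/bin/bash
# Sketch only: names are illustrative; touch -r copies the data file's
# modification time onto the stamp file.

# Run a command on a file only when the file is newer than its stamp,
# then give the stamp the file's modification time.
# $1 = data file, $2 = existing stamp file, remaining args = command.
process_if_newer() {
    data=$1 stamp=$2
    shift 2
    [ "$data" -nt "$stamp" ] || return 0   # nothing new: skip the work
    "$@" "$data" || return 1               # command failed: stamp unchanged
    touch -r "$data" "$stamp"
}
```

One design difference worth naming: this sketch leaves the stamp alone when the command fails, so the same file is tried again on the next run. Updating the stamp regardless would instead skip a bad file until the source changes.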

There is a 900 second randomizer so that this can run from cron but be a
bit agnostic of the cron cycle. The intent is to prevent my systems from
piling on the remote servers at regular intervals. Folks forget that
there are 60 minutes in the hour to set cron to run but so very many set
things to fire at 00 minutes. To get an immediate update enter any
string as an argument. If $1 is not empty the process will skip the
randomizer. There is a safety valve built in that prevents multiple
copies of this script from running. If an earlier instance is discovered
the new invocation will kill it and die. The cron cycle is such that it
should be only a broken instance that would be found still running and
this tries to clean things up.
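
A minimal sketch of the randomizer with its override, assuming a hypothetical helper that prints the delay instead of sleeping so it is easy to check. The fallback to the process ID is an assumption for shells without RANDOM and is not random at all.

```shell
#!/bin/bash
# Sketch only: the function name and the PID fallback are assumptions.

# Print how many seconds to wait before updating.
# $1 = upper bound in seconds, $2 = any non-empty string to skip the wait.
pick_delay() {
    if [ -n "$2" ]; then
        echo 0
    else
        # RANDOM is a bash feature; fall back to the PID in other shells.
        echo $(( ${RANDOM:-$$} % $1 ))
    fi
}
```

In a script it might be used as `sleep "$(pick_delay 900 "$1")"`, so any command line argument gives an immediate update.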

And it is written for Solaris.


--------- 8< cut here ------------
#!/bin/bash

# usage: sanesecurity.sh [now]
# Arg "now" overrides random delay

RunFlag="/var/tmp/sane"
WorkingDirectory="/usr/local/share/clamav/tmp"
FileList="/usr/local/share/clamav/tmp/file.list"
MsrblServer="rsync://rsync.mirror.msrbl.com/msrbl/"
MsrblList="/usr/local/share/clamav/tmp/msrbl.list"

if [ -f "$RunFlag" ]; then
echo "This script already running. Cleaning up..."
/usr/bin/rm $RunFlag
/usr/bin/pkill sanesecurity.sh
else
/usr/bin/touch $RunFlag
fi

# sleep random 900 seconds to prevent cron lockstep
# with other clients. Use any command line arg to force
# immediate update. ARG[1] is arbitrary string.
if [ -z "$1" ]; then
sleep $(( RANDOM % 900 ))
fi

cd $WorkingDirectory

# Get Sane Security
/usr/local/bin/wget -q -N --input-file=$FileList >/dev/null 2>&1

# Process gzip files From SaneSecurity
if /usr/bin/test phish.ndb.gz -nt newphish; then
/usr/bin/gunzip < phish.ndb.gz > phish.ndb
/usr/local/bin/clamscan --quiet -d phish.ndb clam.txt && \
/usr/local/bin/rsync phish.ndb /usr/local/share/clamav || \
echo "phish.ndb is corrupt"
/usr/bin/settime -f phish.ndb.gz newphish
fi

if /usr/bin/test scam.ndb.gz -nt newscam; then
/usr/bin/gunzip < scam.ndb.gz > scam.ndb
/usr/local/bin/clamscan --quiet -d scam.ndb clam.txt && \
/usr/local/bin/rsync scam.ndb /usr/local/share/clamav || \
echo "scam.ndb is corrupt"
/usr/bin/settime -f scam.ndb.gz newscam
fi

# Get MSRBL files
/usr/local/bin/rsync -a --quiet --files-from=$MsrblList $MsrblServer \
$WorkingDirectory >/dev/null 2>&1

# Process text files from MSRBL
if /usr/bin/test MSRBL-Images.hdb -nt newimages; then
/usr/local/bin/clamscan --quiet -d MSRBL-Images.hdb clam.txt && \
/usr/local/bin/rsync MSRBL-Images.hdb /usr/local/share/clamav || \
echo "MSRBL-Images.hdb is corrupt"
/usr/bin/settime -f MSRBL-Images.hdb newimages
fi

if /usr/bin/test MSRBL-SPAM.ndb -nt newspam; then
/usr/local/bin/clamscan --quiet -d MSRBL-SPAM.ndb clam.txt && \
/usr/local/bin/rsync MSRBL-SPAM.ndb /usr/local/share/clamav || \
echo "MSRBL-SPAM.ndb is corrupt"
/usr/bin/settime -f MSRBL-SPAM.ndb newspam
fi

# clear run flag
/usr/bin/rm $RunFlag >/dev/null 2>&1

------------ >8 cut here -------------

dp
Dennis Peterson
2007-03-06 19:44:38 UTC
Permalink
Post by Dennis Peterson
There are two test files that list the files to download. The
file.list
http://www.sanesecurity.com/clamav/phish.ndb.gz
http://www.sanesecurity.com/clamav/scam.ndb.gz
msrbl.list
MSRBL-Images.hdb
MSRBL-SPAM.ndb
Just remembered one other file - a short text file (clam.txt) that gives
clamscan something brief to do while it is testing the patterns.

dp
Ian Abbott
2007-03-09 11:03:39 UTC
Permalink
Post by Dennis Peterson
Just remembered one other file - a short text file (clam.txt) that gives
clamscan something brief to do while it is testing the patterns.
Some scripts scan "-" for standard input and redirect it from /dev/null,
so you'd have something like:

/usr/local/bin/clamscan --quiet -d MSRBL-Images.hdb - < /dev/null
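
The same check can be wrapped in a function, sketched here under assumptions: the function name and the CLAMSCAN override are hypothetical, and the sketch relies on clamscan exiting non-zero when it cannot load the database, as the scripts in this thread do.

```shell
#!/bin/sh
# Sketch only: the function name and CLAMSCAN override are assumptions.
CLAMSCAN="${CLAMSCAN:-clamscan}"

# Load a single signature file and scan empty standard input with it.
# Fails when the file is missing or empty, or when clamscan rejects it.
# $1 = signature database file to validate.
validate_db() {
    [ -s "$1" ] || return 1
    "$CLAMSCAN" --quiet -d "$1" - < /dev/null
}
```

Usage would be along the lines of `validate_db MSRBL-Images.hdb || echo "MSRBL-Images.hdb is corrupt"`, with no extra text file needed.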
--
-=( Ian Abbott @ MEV Ltd. E-mail: <***@mev.co.uk> )=-
-=( Tel: +44 (0)161 477 1898 FAX: +44 (0)161 718 3587 )=-
Bill Landry
2007-03-06 20:59:54 UTC
Permalink
Post by Dennis Peterson
Post by Bill Landry
Here is my latest script iteration, which now includes testing for
newer files before copying the file to the temp working directory
for testing, and when copying is done due to a newer file being
found, the original timestamps will be now preserved on the copied
files.
I took just a quick look but it appears you are doing a time
comparison to a moved file, not the original file. Also - with just a
teeny bit of work you can reduce this to a single each curl and rsync
invocation rather than two each.
Moved files retain their original date/time stamps. For the rsync
files, I am comparing to the original files that are held in the
/var/tmp/rsync directory:

rsync -a rsync://rsync.mirror.msrbl.com/msrbl/MSRBL-Images.hdb \
$rsync_dir/MSRBL-Images.hdb
test $rsync_dir/MSRBL-Images.hdb -nt MSRBL-Images.hdb && \

I'll take a look at your code, however, the whole reason for downloading
and testing the files one at a time was due to the fact that you cannot
scan an individual file in a directory without scanning all files in the
directory, as well as all files in the directory that the command is
executed from. This single file implementation seemed to overcome this
issue.

Bill
Dennis Peterson
2007-03-06 22:16:56 UTC
Permalink
Post by Bill Landry
I'll take a look at your code, however, the whole reason for downloading
and testing the files one at a time was due to the fact that you cannot
scan an individual file in a directory without scanning all files in the
directory, as well as all files in the directory that the command is
executed from. This single file implementation seemed to overcome this
issue.
When I give clamscan the filename clam.txt to scan, that is the only file
it scans, and that's also the only reason that file exists. It is
surprising that on my 400 MHz sparc Netra server it takes 22 seconds
to scan a single file. In fact it takes that long to discover a file is
empty. That seems a bit odd, but there must be a reason that test is done
at the end and not the beginning of the db load.

dp
Dennis Davis
2007-03-07 11:25:57 UTC
Permalink
Date: Tue, 06 Mar 2007 11:18:30 -0800
Subject: Re: [Clamav-users] msrbl sigs: rsync
...
if [ -f "$RunFlag" ]; then
echo "This script already running. Cleaning up..."
/usr/bin/rm $RunFlag
/usr/bin/pkill sanesecurity.sh
else
/usr/bin/touch $RunFlag
fi
...
This is getting a bit off-topic, but here goes anyway...

I believe there are potential race problems in the above method
of locking. Not necessarily in this case where the script is run
infrequently. But in general when you've no idea when a particular
script may run.

Appended below is the shell script fragment I use in scripts that
require locking. You'll need to adjust command definitions etc to
taste.

This fragment uses the fact that hard-linking is atomic on (all?)
Unix systems. So it creates a temporary file with a unique name
and attempts to hard-link the lockname to it. If the hard-linking
succeeds, you've definitely got the lock. If not, it's already in
use. As window dressing, the script will try five times to get the
lock and waits for a minute between retries.

I'm not going to claim originality for this fragment. I'm sure I
developed it from a similar fragment I saw in the Usenet C News
scripts[1].

Here's the fragment:


sleeptime=60 # one minute.
max=5 # max no of attempts to get a lock.

echo=/bin/echo
ln=/bin/ln
rm=/bin/rm
sleep=/bin/sleep

...

cd {some directory} || exit 1

# Avoid straying into the gunsights of some other brave soldier...
pid=$$
lockname=LOCK_FETCH
lock_temp=$lockname.$pid
trap "$rm -f $lock_temp; trap 0" 0 1 2 15
$echo $pid > $lock_temp || exit 1

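tries=0
while :
do
    if $ln $lock_temp $lockname 2> /dev/null
    then
        # Got the lock: drop the temp file and clean up the lock on exit.
        $rm -f $lock_temp
        trap "$rm -f $lockname; trap 0" 0 1 2 15
        break
    fi
    tries=$(($tries + 1))
    # Give up after $max failed attempts.
    [ $tries -ge $max ] && exit 1
    $sleep $sleeptime
done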


[1] "Managing Usenet", Henry Spencer & David Lawrence, O'Reilly &
Associates Inc, 1998, ISBN 1-56592-198-4
--
Dennis Davis, BUCS, University of Bath, Bath, BA2 7AY, UK
***@bath.ac.uk Phone: +44 1225 386101
Dennis Peterson
2007-03-07 14:49:17 UTC
Permalink
Post by Dennis Davis
Date: Tue, 06 Mar 2007 11:18:30 -0800
Subject: Re: [Clamav-users] msrbl sigs: rsync
...
if [ -f "$RunFlag" ]; then
echo "This script already running. Cleaning up..."
/usr/bin/rm $RunFlag
/usr/bin/pkill sanesecurity.sh
else
/usr/bin/touch $RunFlag
fi
...
This is getting a bit off-topic, but here goes anyway...
I believe there are potential race problems in the above method
of locking. Not necessarily in this case where the script is run
infrequently. But in general when you've no idea when a particular
script may run.
It is very simplistic but the application allows it and cron is very
much involved. This would not work if cron were capable of creating
overlapping instances. The intention was to prevent blocking by a broken
process and there's lots of room for more elegance. For example, this
assassin script of mine won't run again until cron fires it off hours
later. Better would be to kill off previous instances and then attempt
to complete the mission. That is best done with serialized pid filenames
so the new instance doesn't have to kill itself as well as its
predecessors.
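
One way the "serialized pid filenames" idea might look is sketched below. This is not the script discussed in the thread: the run.<pid> naming and the helpers register_self and kill_predecessors are assumptions made up for illustration. It also ignores pid reuse, so a stale file could in principle point at an unrelated process.

```shell
#!/bin/sh
# Sketch: one marker file per instance, named run.<pid>, in a run directory.
# register_self, kill_predecessors and the run.<pid> naming are hypothetical.

register_self() {
    # $1 = run directory. Record this instance under its own pid.
    : > "$1/run.$$"
}

kill_predecessors() {
    # $1 = run directory. Kill every recorded instance except ourselves.
    for f in "$1"/run.*
    do
        [ -f "$f" ] || continue
        old=${f##*/run.}
        [ "$old" = "$$" ] && continue   # never shoot ourselves
        kill "$old" 2> /dev/null        # it may already be gone; that is fine
        rm -f "$f"
    done
}
```

A script would call register_self first, set a trap to remove its own marker on exit, then call kill_predecessors before carrying on with the download.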

Your alternative on first glance seems to give up without destroying the
previous instance, so it will never allow the rest of the script to run
to completion. That is safe from a process table view but doesn't help
refresh the databases.

dp
Chris
2007-03-05 12:06:47 UTC
Permalink
Post by Dennis Peterson
Post by Chris
I ran it twice and both times it downloaded a new .hdb and .ndb file at
least the 'modified' times were within a couple of minutes of the current
time. I've commented out the
I just now realized you're moving the downloaded file to the ClamAV
working directory rather than copying it. By doing this you defeat one
of the truly great things about rsync - intelligent copies. For small
files this isn't a big deal but for very large files rsync has to
download the entire thing even though it may have only changed in the
last few lines. I'll give you an example - stop me if you've heard this...
Dennis, being someone that is greatly lacking in the area of scripting, how
should the script read to download just the changes?

# =========================================================
# Check for new DB files. If new, download, test & process
# =========================================================
curl -R -s -z scam.ndb.gz -o $tmp_dir/scam.ndb.gz \
http://www.sanesecurity.com/clamav/scam.ndb.gz
test -s $tmp_dir/scam.ndb.gz && \
gunzip -cdf $tmp_dir/scam.ndb.gz > $tmp_dir/scam.ndb && \
mv -f $tmp_dir/scam.ndb.gz . && \
clamscan --quiet -d $tmp_dir/scam.ndb && \
cp --reply=yes scam.ndb scam.ndb-bak && \
mv -f $tmp_dir/scam.ndb .

curl -R -s -z phish.ndb.gz -o $tmp_dir/phish.ndb.gz \
http://www.sanesecurity.com/clamav/phish.ndb.gz
test -s $tmp_dir/phish.ndb.gz && \
gunzip -cdf $tmp_dir/phish.ndb.gz > $tmp_dir/phish.ndb && \
mv -f $tmp_dir/phish.ndb.gz . && \
clamscan --quiet -d $tmp_dir/phish.ndb && \
cp --reply=yes phish.ndb phish.ndb-bak && \
mv -f $tmp_dir/phish.ndb .

rsync -aq --files-from=/usr/local/bin/msrbl.list \
rsync://rsync.mirror.msrbl.com/msrbl/ /var/tmp/clamdb
# curl -R -s -z MSRBL-SPAM.ndb -o $tmp_dir/MSRBL-SPAM.ndb \
# http://download.mirror.msrbl.com/MSRBL-SPAM.ndb
test -s $tmp_dir/MSRBL-SPAM.ndb && \
clamscan --quiet -d $tmp_dir/MSRBL-SPAM.ndb && \
cp --reply=yes MSRBL-SPAM.ndb MSRBL-SPAM.ndb-bak && \
mv -f $tmp_dir/MSRBL-SPAM.ndb .

# curl -R -s -z MSRBL-Images.hdb -o $tmp_dir/MSRBL-Images.hdb \
# http://download.mirror.msrbl.com/MSRBL-Images.hdb
test -s $tmp_dir/MSRBL-Images.hdb && \
clamscan --quiet -d $tmp_dir/MSRBL-Images.hdb && \
cp --reply=yes MSRBL-Images.hdb MSRBL-Images.hdb-bak && \
mv -f $tmp_dir/MSRBL-Images.hdb .

I've included the whole download portion of the script including Steve's.
Thanks for the assistance.
--
Chris
KeyID 0xE372A7DA98E6705C
Dennis Peterson
2007-03-05 15:35:51 UTC
Permalink
Post by Chris
Post by Dennis Peterson
Post by Chris
I ran it twice and both times it downloaded a new .hdb and .ndb file at
least the 'modified' times were within a couple of minutes of the current
time. I've commented out the
I just now realized you're moving the downloaded file to the ClamAV
working directory rather than copying it. By doing this you defeat one
of the truly great things about rsync - intelligent copies. For small
files this isn't a big deal but for very large files rsync has to
download the entire thing even though it may have only changed in the
last few lines. I'll give you an example - stop me if you've heard this...
Dennis, being someone that is greatly lacking in the area of scripting, how
should the script read to download just the changes?
# =========================================================
# Check for new DB files. If new, download, test & process
# =========================================================
curl -R -s -z scam.ndb.gz -o $tmp_dir/scam.ndb.gz \
http://www.sanesecurity.com/clamav/scam.ndb.gz
test -s $tmp_dir/scam.ndb.gz && \
gunzip -cdf $tmp_dir/scam.ndb.gz > $tmp_dir/scam.ndb && \
mv -f $tmp_dir/scam.ndb.gz . && \
clamscan --quiet -d $tmp_dir/scam.ndb && \
cp --reply=yes scam.ndb scam.ndb-bak && \
mv -f $tmp_dir/scam.ndb .
curl -R -s -z phish.ndb.gz -o $tmp_dir/phish.ndb.gz \
http://www.sanesecurity.com/clamav/phish.ndb.gz
test -s $tmp_dir/phish.ndb.gz && \
gunzip -cdf $tmp_dir/phish.ndb.gz > $tmp_dir/phish.ndb && \
mv -f $tmp_dir/phish.ndb.gz . && \
clamscan --quiet -d $tmp_dir/phish.ndb && \
cp --reply=yes phish.ndb phish.ndb-bak && \
mv -f $tmp_dir/phish.ndb .
rsync -aq --files-from=/usr/local/bin/msrbl.list \
rsync://rsync.mirror.msrbl.com/msrbl/ /var/tmp/clamdb
# curl -R -s -z MSRBL-SPAM.ndb -o $tmp_dir/MSRBL-SPAM.ndb \
# http://download.mirror.msrbl.com/MSRBL-SPAM.ndb
test -s $tmp_dir/MSRBL-SPAM.ndb && \
clamscan --quiet -d $tmp_dir/MSRBL-SPAM.ndb && \
cp --reply=yes MSRBL-SPAM.ndb MSRBL-SPAM.ndb-bak && \
mv -f $tmp_dir/MSRBL-SPAM.ndb .
# curl -R -s -z MSRBL-Images.hdb -o $tmp_dir/MSRBL-Images.hdb \
# http://download.mirror.msrbl.com/MSRBL-Images.hdb
test -s $tmp_dir/MSRBL-Images.hdb && \
clamscan --quiet -d $tmp_dir/MSRBL-Images.hdb && \
cp --reply=yes MSRBL-Images.hdb MSRBL-Images.hdb-bak && \
mv -f $tmp_dir/MSRBL-Images.hdb .
I've included the whole download portion of the script including Steve's.
Thanks for the assistance.
The mv -f ... statement should be a cp ... statement. That will leave
the msrbl files in the directory that rsync uses for downloading and for
comparing versions.

FWIW, you should probably do the same with Steve's gz file as well. By
leaving it in the download directory, curl can determine if the file has
changed or not, and if not it won't bother downloading an identical
copy. You're already using the -c argument to gunzip, which writes the
uncompressed data to standard output and leaves the .gz file untouched.
The same suggestion applies for folks who use wget - these tools are
smart enough to not waste bandwidth if the previous copy is left in
place to test.

If you use wget rather than curl you can grab both of Steve's files in
one connection rather than two. I'll submit my script to Steve when I
get caught up on things here. It pulls down Sanesecurity and MSRBL files.

dp
Dennis Peterson
2007-03-05 15:46:53 UTC
Permalink
Post by Dennis Peterson
If you use wget rather than curl you can grab both of Steve's files in
one connection rather than two. I'll submit my script to Steve when I
get caught up on things here. It pulls down Sanesecurity and MSRBL files.
I just recalled that curl allows this too with multiple --url statements
and multiple -O or -o statements. Never underestimate the power of curl :)
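
A rough sketch of what a single curl invocation for both Sanesecurity files might look like follows. Some assumptions: fetch_sane is a hypothetical function name, the CURL variable exists only so the command line can be inspected without touching the network, and --next (used here so that each file keeps its own -z time condition) needs curl 7.36.0 or later, which is newer than this thread.

```shell
#!/bin/sh
# Sketch: fetch both Sanesecurity files with one curl invocation.
# CURL is overridable so the arguments can be checked offline.
CURL=${CURL:-curl}

fetch_sane() {
    # $1 = download directory where the previous .gz copies are kept.
    # Options before --next apply to the first URL only, so -R and -s
    # are repeated for the second one.
    $CURL -R -s -z "$1/scam.ndb.gz" -o "$1/scam.ndb.gz" \
        --url http://www.sanesecurity.com/clamav/scam.ndb.gz \
        --next \
        -R -s -z "$1/phish.ndb.gz" -o "$1/phish.ndb.gz" \
        --url http://www.sanesecurity.com/clamav/phish.ndb.gz
}
```

Because -z points at the copy left in the download directory, an unchanged file should not be transferred again.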

dp
Noel Jones
2007-03-05 16:51:42 UTC
Permalink
Post by Dennis Peterson
The mv -f ... statement should be a cp ... statement. That will
leave the msrbl files in the directory that rsync uses for
downloading and for comparing versions.
It makes a great deal of sense to move the files into the clam DB
directory to ensure an atomic operation. If clamd/clamav-milter
should happen to reload with a half-copied file in the DB dir, it
will likely stop running.

The solution is to copy the updated file to a temporary name leaving
the original intact for the next update run, then move the copy into
the clam DB directory.

rsync can do atomic updates in place, but it's probably wiser to do
this in a temp directory so you can test the signatures with
"clamscan -d file" to make sure they at least won't crash clamd.

pseudo-code something like:
cd /some/work/dir &&
rsync or curl newfile.db &&
clamscan -d newfile.db &&
cp newfile.db newfile.db.tmp &&
mv newfile.db.tmp /var/db/clamav/newfile.db
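
The pseudo-code above could be fleshed out roughly as follows. This is a sketch under a few assumptions: install_sig is a hypothetical name, CLAMSCAN is made overridable for testing, and the temporary copy is written inside the database directory so that the final mv is a rename on one filesystem. A mv from a work directory on a different filesystem would be a copy and would lose the atomicity. It also assumes ClamAV ignores the dot-prefixed temporary name because it does not end in a database extension.

```shell
#!/bin/sh
# Sketch of test-then-install. install_sig is a hypothetical helper.
CLAMSCAN=${CLAMSCAN:-clamscan}

install_sig() {
    # $1 = staged file, left in place for the next rsync/curl run
    # $2 = ClamAV database directory
    src=$1
    dest_dir=$2
    name=${src##*/}
    tmp=$dest_dir/.$name.$$

    # Refuse empty or missing downloads.
    test -s "$src" || return 1
    # Same check as the scripts in this thread: give up unless
    # clamscan exits with status 0 when loading the staged file.
    $CLAMSCAN --quiet -d "$src" || return 1
    # Copy next to the target, then rename over it in one step.
    cp "$src" "$tmp" || { rm -f "$tmp"; return 1; }
    mv -f "$tmp" "$dest_dir/$name"
}
```

A call such as install_sig /var/tmp/clamdb/MSRBL-SPAM.ndb /var/db/clamav would leave the staged copy where rsync expects it.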
--
Noel Jones

Dennis Peterson
2007-03-05 17:01:38 UTC
Permalink
Post by Noel Jones
Post by Dennis Peterson
The mv -f ... statement should be a cp ... statement. That will leave
the msrbl files in the directory that rsync uses for downloading and
for comparing versions.
It makes a great deal of sense to move the files into the clam DB
directory to ensure an atomic operation. If clamd/clamav-milter should
happen to reload with a half-copied file in the DB dir, it will likely
stop running.
Yah - I realized that after reviewing the suggestion. Too much focus on
just one element of the entire problem.
Post by Noel Jones
The solution is to copy the updated file to a temporary name leaving the
original intact for the next update run, then move the copy into the
clam DB directory.
rsync can do atomic updates in place, but it's probably wiser to do this
in a temp directory so you can test the signatures with "clamscan -d
file" to make sure they at least won't crash clamd.
cd /some/work/dir &&
rsync or curl newfile.db &&
clamscan -d newfile.db &&
cp newfile.db newfile.db.tmp &&
mv newfile.db.tmp /var/db/clamav/newfile.db
This is correct and your method is a good solution - I use rsync for
migrating the staged file into the working directory and that is part of
the script I'll be sharing with Steve. As always there's more than one
way to do something, and it gets complex quickly.

dp
Christopher X. Candreva
2007-03-05 17:09:06 UTC
Permalink
It makes a great deal of sense to move the files into the clam DB directory
to ensure an atomic operation. If clamd/clamav-milter should happen to
reload with a half-copied file in the DB dir, it will likely stop running.
Yah - I realized that after reviewing the suggestion. Too much focus on just
one element of the entire problem.
You can also use rsync to copy the file(s) from the download location to the
clam directory on the same server. I believe rsync will make a temp file
then mv it into place, plus it's an easy way to only update the files that
changed. ie:

rsync -av ./phish.ndb ./scam.ndb ./MSRBL-* /usr/local/share/clamav


==========================================================
Chris Candreva -- ***@westnet.com -- (914) 948-3162
WestNet Internet Services of Westchester
http://www.westnet.com/
Dennis Peterson
2007-03-05 17:27:58 UTC
Permalink
Post by Christopher X. Candreva
It makes a great deal of sense to move the files into the clam DB directory
to ensure an atomic operation. If clamd/clamav-milter should happen to
reload with a half-copied file in the DB dir, it will likely stop running.
Yah - I realized that after reviewing the suggestion. Too much focus on just
one element of the entire problem.
You can also use rsync to copy the file(s) from the download location to the
clam directory on the same server. I believe rsync will make a temp file
then mv it into place, plus it's an easy way to only update the files that
rsync -av ./phish.ndb ./scam.ndb ./MSRBL-* /usr/local/share/clamav
In Unix systems rsync creates a hidden file in the destination directory
and then renames it at the end of the copy. I don't know how it works in
Windows or if it even does, but it is a very nice feature.
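
For anyone without rsync on the box, the behaviour described here (hidden temporary file in the destination directory, then a rename, and only for files that changed) can be imitated in plain shell. This is only an analogue, not what rsync does internally: update_changed is a hypothetical helper, and it compares file contents with cmp, whereas rsync by default decides from size and modification time.

```shell
#!/bin/sh
# Sketch: copy only changed files, each via a hidden temp file and a rename.
# update_changed is a hypothetical helper, not part of rsync.

update_changed() {
    # $1 = destination directory; remaining arguments = source files.
    dest=$1
    shift
    for src in "$@"
    do
        name=${src##*/}
        # Skip files whose content is already identical in the destination.
        if cmp -s "$src" "$dest/$name" 2> /dev/null
        then
            continue
        fi
        hidden=$dest/.$name.$$
        if cp -p "$src" "$hidden" && mv -f "$hidden" "$dest/$name"
        then
            echo "updated $name"
        else
            rm -f "$hidden"
            return 1
        fi
    done
}
```

Called as update_changed /usr/local/share/clamav ./phish.ndb ./scam.ndb, it prints one line per file it actually replaced.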

dp