Introduction
DSPAM is a free software statistical spam filter written by Jonathan A. Zdziarski, author of the book Ending Spam and other books. It is intended to be a scalable, content-based spam filter for large multi-user systems. DSPAM is written in C and distributed under the terms of the GNU General Public License.
DSPAM's original author claims that some users have reported accuracy as high as 99.5% to 99.95%, including "best recorded levels of accuracy ... 99.991% by one avid user (2 errors in 22,786) and 99.987% by the author".
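As a quick sanity check of the quoted figure, the sketch below recomputes the accuracy for 2 errors in 22,786 messages. It assumes that "accuracy" means the share of messages classified correctly (1 minus errors divided by total); the helper name accuracy is made up for this example.

```shell
# accuracy ERRORS TOTAL: print the percentage of correctly classified
# messages, assuming accuracy = (1 - errors / total) * 100
accuracy() {
    awk -v e="$1" -v t="$2" 'BEGIN { printf "%.3f\n", (1 - e / t) * 100 }'
}

accuracy 2 22786   # prints 99.991
```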
Installation
To install DSPAM, install the dspam package from the Universe repository using your favorite package manager. For example:
sudo aptitude install dspam
Simply accept the defaults when the installation process asks questions. Configuration is covered in detail in the next section.
Configuration
There are two main configuration files for dspam:
1. /etc/default/dspam
2. /etc/dspam/dspam.conf
Dspam configuration
Edit the first file so that dspam is allowed to start:
# Variables for dspam.
#
# Do not start dspam.
START=yes
# User that runs dspam.
USER=dspam
# Options for dspam.
#OPTIONS="--debug"
Now, to run dspam as a daemon, edit the second file. Here is a full dump of the file with blank and comment-only lines removed (produced with cat /etc/dspam/dspam.conf | egrep -v "^\s*(#|$)"):
Home /var/spool/dspam
StorageDriver /usr/lib/dspam/libhash_drv.so
TrustedDeliveryAgent "/usr/sbin/sendmail"
DeliveryHost 127.0.0.1
DeliveryPort 10024
DeliveryIdent localhost
DeliveryProto SMTP
OnFail error
Trust root
Trust dspam
Trust mail
Trust mailnull
Trust smmsp
Trust daemon
Trust postfix
Trust www-data
TrainingMode teft
TestConditionalTraining on
Feature chained
Feature whitelist
Algorithm graham burton
PValue graham
Preference "spamAction=tag"
Preference "signatureLocation=headers" # 'message' or 'headers'
Preference "showFactors=off"
AllowOverride trainingMode
AllowOverride spamAction spamSubject
AllowOverride statisticalSedation
AllowOverride enableBNR
AllowOverride enableWhitelist
AllowOverride signatureLocation
AllowOverride showFactors
AllowOverride optIn optOut
AllowOverride whitelistThreshold
HashRecMax 98317
HashAutoExtend on
HashMaxExtents 0
HashExtentSize 49157
HashMaxSeek 100
HashConnectionCache 10
Notifications off
PurgeSignatures 14 # Stale signatures
PurgeNeutral 90 # Tokens with neutralish probabilities
PurgeUnused 90 # Unused tokens
PurgeHapaxes 30 # Tokens with less than 5 hits (hapaxes)
PurgeHits1S 15 # Tokens with only 1 spam hit
PurgeHits1I 15 # Tokens with only 1 innocent hit
LocalMX 127.0.0.1
SystemLog on
UserLog on
Opt out
TrackSources spam ham
ParseToHeaders on
ChangeModeOnParse on
ChangeUserOnParse on
ServerPort 11124
ServerQueueSize 32
ServerPID /var/run/dspam/dspam.pid
ServerMode auto
ServerParameters "--deliver=innocent -d %u"
ServerIdent "localhost.localdomain"
ClientHost 127.0.0.1
ClientPort 11124
ProcessorBias on
Include /etc/dspam/dspam.d/
As you can see, the daemon is bound to a localhost port (ServerPort 11124) for better performance, and filtered mail is handed back to 127.0.0.1:10024 (DeliveryHost and DeliveryPort).
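The egrep pattern used to produce the dump relies on \s, which is a GNU grep extension. A more portable sketch of the same filter uses the POSIX class [[:space:]] with grep -E. The function name strip_comments is made up for this example, and the sample input stands in for the real /etc/dspam/dspam.conf.

```shell
# strip_comments [FILE...]: drop blank lines and lines that contain only a
# comment; trailing comments after a directive are kept, as in the dump above
strip_comments() {
    grep -Ev '^[[:space:]]*(#|$)' "$@"
}

# Sample input standing in for /etc/dspam/dspam.conf
printf '%s\n' \
    '# Home directory' \
    '' \
    'Home /var/spool/dspam' \
    '   # indented comment' \
    'PurgeSignatures 14 # Stale signatures' | strip_comments
```

On a real system the call would be strip_comments /etc/dspam/dspam.conf.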
Postfix Integration
Now let's make Postfix speak to Dspam.
Edit /etc/postfix/master.cf and add the following:
# DSPAM
127.0.0.1:10024 inet n - n - - smtpd
  -o smtpd_authorized_xforward_hosts=127.0.0.0/8
dspam-retrain unix - n n - 10 pipe
  flags=Ru user=dspam argv=/etc/dspam/dspam-retrain $nexthop $sender $recipient
right after:
smtp inet n - - - - smtpd
The dspam-retrain service will be used to pass spam and ham messages to dspam for training.
Now edit /etc/postfix/main.cf. In the setting:
smtpd_client_restrictions =
replace the last line (which usually ends with the word permit) with the following:
    check_client_access pcre:/etc/postfix/dspam_filter_access
    permit_auth_destination
You should get something like:
smtpd_client_restrictions =
    permit_mynetworks
    [...]
    check_client_access pcre:/etc/postfix/dspam_filter_access
    permit_auth_destination
Also add somewhere (usually at the end of the file) the following:
# DSPAM
dspam_destination_recipient_limit = 1
The /etc/postfix/dspam_filter_access file should contain the rule that hands mail to the dspam LMTP daemon:
/./ FILTER lmtp:[127.0.0.1]:11124
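The port in the FILTER rule has to be the one the dspam daemon listens on (ServerPort in dspam.conf). A minimal sketch of a consistency check follows; the helper names conf_value and filter_port are made up for this example, and the temporary files stand in for the real /etc/dspam/dspam.conf and /etc/postfix/dspam_filter_access.

```shell
# conf_value KEY FILE: print the first argument of directive KEY
conf_value() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# filter_port FILE: print the port of the first "FILTER lmtp:[host]:port" rule
filter_port() {
    sed -n 's/.*FILTER[[:space:]]*lmtp:\[[^]]*]:\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Sample files standing in for the real configuration
tmp=$(mktemp -d)
printf 'ServerPort 11124\nClientPort 11124\n' > "$tmp/dspam.conf"
printf '/./ FILTER lmtp:[127.0.0.1]:11124\n' > "$tmp/dspam_filter_access"

if [ "$(conf_value ServerPort "$tmp/dspam.conf")" = "$(filter_port "$tmp/dspam_filter_access")" ]; then
    echo "ports match"
else
    echo "port mismatch"
fi
rm -r "$tmp"
```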
Now modify /etc/postfix/transport to create the rules for the training addresses:
spam@your.domain.tld dspam-retrain:spam
ham@your.domain.tld dspam-retrain:innocent
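The right-hand side of each transport entry has the form transport:nexthop. The transport part selects the dspam-retrain service from master.cf, and the next-hop part (spam or innocent) is what Postfix substitutes for $nexthop, so it arrives as the first argument of the retrain script. The sketch below only illustrates that split; split_transport is a made-up helper, not a Postfix command.

```shell
# split_transport RESULT: print the two halves of a "transport:nexthop" value
split_transport() {
    printf 'transport=%s nexthop=%s\n' "${1%%:*}" "${1#*:}"
}

split_transport dspam-retrain:spam       # transport=dspam-retrain nexthop=spam
split_transport dspam-retrain:innocent   # transport=dspam-retrain nexthop=innocent
```

If the transport table is a hash: lookup table (an assumption; this depends on the transport_maps setting in main.cf), it has to be rebuilt with postmap /etc/postfix/transport after every edit.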
Finally, create the script that parses messages for training dspam. Create a Perl script /etc/dspam/dspam-retrain with the following:
#!/usr/bin/perl

# Get arguments: class (from $nexthop), sender and recipient
$class  = $ARGV[0] || die; shift;
$sender = $ARGV[0] || die; shift;
$recip  = $ARGV[0] || die; shift;

if ($recip =~ /^(spam|ham)-(\w+)@/) {
    # username is part of the recipient
    $user = $2;
} elsif ($sender =~ /^(\w+)@/) {
    # username is in the sender
    $user = $1;
} else {
    print "Can't determine user\n";
    exit 75; # EX_TEMPFAIL
}

# Pull out DSPAM signatures and send them to the dspam program
while (<>) {
    if ((! $subj) && (/^Subject: /)) {
        $subj = $_;
    } elsif (/(!DSPAM:[a-f0-9]+!)/) {
        open(F, "|/usr/bin/dspam --source=error --class=$class --user $user");
        print F "$subj\n$1\n";
        close(F);
    } elsif (/(X-DSPAM-Signature: [a-f0-9]+)/) {
        open(F, "|/usr/bin/dspam --source=error --class=$class --user $user");
        print F "$subj\n$1\n";
        close(F);
    }
}
Make sure the file starts with the "#!/usr/bin/perl" line, and give it execute permission:
chmod +x /etc/dspam/dspam-retrain
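To see which dspam user a training message will be attributed to, the script's user lookup can be mirrored in shell. This is only a sketch for experimenting: retrain_user is a made-up name, and the character class [A-Za-z0-9_] approximates Perl's \w for ASCII addresses.

```shell
# retrain_user RECIPIENT SENDER: print the user the retrain script would pick,
# or return 75 (EX_TEMPFAIL) if neither address yields one
retrain_user() {
    recip=$1
    sender=$2
    # First choice: spam-<user>@... or ham-<user>@... in the recipient
    user=$(printf '%s\n' "$recip" | sed -nE 's/^(spam|ham)-([A-Za-z0-9_]+)@.*/\2/p')
    # Fallback: the local part of the sender
    [ -n "$user" ] || user=$(printf '%s\n' "$sender" | sed -nE 's/^([A-Za-z0-9_]+)@.*/\1/p')
    [ -n "$user" ] || return 75
    printf '%s\n' "$user"
}

retrain_user spam-alice@your.domain.tld bob@your.domain.tld   # alice
retrain_user spam@your.domain.tld bob@your.domain.tld         # bob
retrain_user spam@your.domain.tld john.doe@your.domain.tld || echo "exit status $?"
```

The last call shows a limitation of the pattern: a sender whose local part contains a dot is not matched, so the script exits with status 75.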
Service management
To start, stop or restart the service, use:
invoke-rc.d dspam start
invoke-rc.d dspam stop
invoke-rc.d dspam restart
After starting the dspam daemon, don't forget to restart Postfix:
invoke-rc.d postfix restart
Debugging
Use dspam_stats -H to monitor the statistics in human-readable form.
Also, your emails should now contain DSPAM headers, something like:
X-DSPAM-Result: Innocent
X-DSPAM-Processed: Tue Jun 16 20:25:03 2009
X-DSPAM-Confidence: 1.0000
X-DSPAM-Probability: 0.0023
X-DSPAM-Signature: 4a37d56f12661069814937
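When checking many messages, it can help to pull a single DSPAM header out of a saved message. The sketch below reads a message on standard input and prints the value of one X-DSPAM-* header; dspam_header is a made-up helper name, and the sample message reuses the headers shown above.

```shell
# dspam_header NAME: print the value of the first X-DSPAM-NAME header
# found in the header block of the message on stdin
dspam_header() {
    awk -v h="X-DSPAM-$1:" '
        /^$/ { exit }    # a blank line ends the header block
        index($0, h) == 1 { print substr($0, length(h) + 2); exit }
    '
}

msg='X-DSPAM-Result: Innocent
X-DSPAM-Probability: 0.0023
X-DSPAM-Signature: 4a37d56f12661069814937

message body'

printf '%s\n' "$msg" | dspam_header Result      # Innocent
printf '%s\n' "$msg" | dspam_header Signature   # 4a37d56f12661069814937
```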
That's all.