How can I explain to dspam that the user "brandon" is the same as "brandon@mydomain"?

Posted by Brandon Craig Rhodes on Server Fault, 2009-11-21T04:05:59Z

I am using dspam for spam filtering by running the "dspamd" daemon under Ubuntu 9.10 and then setting up a Postfix rule that says:

smtpd_recipient_restrictions =
    ...
    check_client_access pcre:/etc/postfix/dspam_everything
    ...

where that PCRE map looks like this:

/./ FILTER lmtp:[127.0.0.1]:11124

This works well: every user on my system receives all of their email, whether or not dspam considers it innocent, and each user can choose either to filter on dspam's verdicts or to ignore them.

The problem comes when I want to train dspam using my email archives. After reading about the "dspam" command, I ran the following over the files in my Inbox and spam folders (which date from when I was using another filtering solution):

for file in Mail/Inbox/*; do dspam --class=innocent --source=corpus < "$file"; done
for file in Mail/spam/*; do dspam --class=spam --source=corpus < "$file"; done

The symptom I noticed after doing all of this was that dspam was horrible at classifying spam: it couldn't find any! When I tracked the problem down, it turned out that the commands above were training the user "brandon", while incoming email was being checked against the username "brandon@mydomain", which meant it was being compared against a completely empty training database!

So, what can I do to make the above commands actually train my fully-qualified email address rather than my bare username? I would like to avoid having to run "dspam" as root with a "--user" option. I would have expected the "dspam" configuration files to offer an "append_domain" setting, or something similar, for decorating local usernames with an appropriate email domain, but I can't find any such thing.
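For reference, this is roughly what the root-based `--user` workaround mentioned above looks like, together with a helper that mimics the `append_domain` behaviour I was hoping to find. It is a minimal sketch under stated assumptions: `MAIL_DOMAIN`, `qualify_user` and `train_folder_as` are hypothetical names, not part of dspam; `--user`, `--class` and `--source` are dspam's own options; and, as far as I can tell from the stock `dspam.conf`, dspam only honours `--user` for callers listed on a `Trust` line there (root is one of them by default), which is why this needs root.

```shell
#!/usr/bin/env bash
# Sketch only: train a mail folder under a fully-qualified dspam username.
# MAIL_DOMAIN, qualify_user and train_folder_as are hypothetical names.
# Assumption: the caller is listed on a "Trust" line in dspam.conf
# (root is by default); dspam does not honour --user for other callers.

MAIL_DOMAIN=${MAIL_DOMAIN:-mydomain}

# Mimic the wished-for "append_domain" setting: bare names get
# @MAIL_DOMAIN appended, already-qualified addresses pass through unchanged.
qualify_user() {
    case $1 in
        *@*) printf '%s\n' "$1" ;;
        *)   printf '%s@%s\n' "$1" "$MAIL_DOMAIN" ;;
    esac
}

# train_folder_as CLASS DIR USER
# Feed every regular file in DIR to dspam as CLASS (innocent or spam),
# training the qualified form of USER. With DRY_RUN=1 the commands are
# printed instead of executed.
train_folder_as() {
    local class=$1 dir=$2 user file
    user=$(qualify_user "$3")
    for file in "$dir"/*; do
        [ -f "$file" ] || continue   # skip subdirectories and an unmatched glob
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'dspam --user %s --class=%s --source=corpus < %s\n' \
                "$user" "$class" "$file"
        else
            dspam --user "$user" --class="$class" --source=corpus < "$file"
        fi
    done
}
```

From a root shell, after sourcing these functions, `train_folder_as innocent /home/brandon/Mail/Inbox brandon` and `train_folder_as spam /home/brandon/Mail/spam brandon` would train "brandon@mydomain"; prefixing either call with `DRY_RUN=1` shows the commands first without touching the database.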

When I used the Berkeley DB backend for "dspam", I solved this problem by creating a symlink from one user's database to the other's. :-) But that solution eventually died because the BDB backend is not thread-safe, so I have now moved to the PostgreSQL backend and need a way to solve the problem there. And, no, the table where it keeps usernames has a UNIQUE constraint that prevents me from listing both usernames as mapping to the same ID. :-)