Difference between revisions of "SpamAssassin"

From Nuclear Physics Group Documentation Pages

Latest revision as of 16:58, 1 March 2015

Setup

We are using a fairly standard SpamAssassin setup, close to the default. Any variations from default MUST be noted here. Spam is getting out of hand, so the most basic setup is no longer sufficient.

Basic

A reference in /etc/postfix/master.cf tells the mail system to use SpamAssassin, i.e. "spamd".

You can check that spamassassin does not have errors in the configuration with:

spamassassin --lint

To make sure it is tagging spam properly you can send it a test:

spamassassin -D < /usr/share/doc/spamassassin-3.3.1/sample-spam.txt

There is also a non-spam sample file (sample-nonspam.txt) in the same directory. Note that -D produces a very large amount of debugging output and is not needed for testing basic functionality.

Detailed info on spamassassin setup is found at: http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Conf.html

Sieve

For spam filtering to work, each user needs a sieve script that directs the spam somewhere else. The most basic .sieve script is:

#  a simple SPAM filter
#
require "fileinto";

if header :contains "X-Spam-Flag" "YES" {
#
#  move messages with "X-Spam-Flag: YES" header
#  into "spam" folder
#
	fileinto "INBOX.SPAM";
}
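
To see which saved messages the rule above would act on, a small shell check can help. This is only a sketch under assumptions: is_flagged_spam is a made-up helper name, it expects one message per file, and it approximates the sieve test by looking for a header line that starts with "X-Spam-Flag: YES" (the sieve ":contains" match is looser than that).

```shell
# is_flagged_spam (hypothetical helper): succeeds if the header section of
# the message file given as $1 has a line starting with "X-Spam-Flag: YES".
is_flagged_spam() {
    # The headers end at the first empty line, so stop reading there.
    awk '
        /^\r?$/ { exit }
        tolower($0) ~ /^x-spam-flag:[ \t]*yes/ { found = 1; exit }
        END { exit !found }
    ' "$1"
}

# Usage, with message.elm being a message saved from the mail client:
#   if is_flagged_spam message.elm; then echo "would go to INBOX.SPAM"; fi
```

Because it only reads the header section, a "X-Spam-Flag: YES" string quoted in the body of a message does not count.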

Plugins

SpamAssassin plugins are found in: /usr/lib/perl5/vendor_perl/5.8.8/Mail/SpamAssassin

AutoWhiteList

See: http://wiki.apache.org/spamassassin/ManualWhitelist

The old whitelist plugin was called AWL. The new one, which we use as of September 2014, is TxRep. It automatically adds spam messages to a blacklist and not-spam (i.e. ham) to a whitelist.

It is turned on in init.pre:

# Learning Module: see http://truxoft.com/resources/txrep.htm
loadplugin Mail::SpamAssassin::Plugin::TxRep

Some options for it are set in local.cf:

#
# For the TxRep module
#
header         TXREP   eval:check_senders_reputation()
describe       TXREP   Score normalizing based on sender's reputation
tflags         TXREP   userconf noautolearn
priority       TXREP   1000

Currently it is set up to store the spam/not-spam lists in the user's home directory, which is also true for the Bayes analysis. The user MUST have a directory ".spamassassin". To initialize the files in that directory, run:

sa-learn --sync   # For Bayes.

To check if there is a list:

sa-learn --dump

To teach the list from an IMAP mailbox that contains spam messages and is called SpamLearn:

fetchmail -a -n --folder SpamLearn  einstein.unh.edu -m "sa-learn --spam --single"

Add --keep to the fetchmail line if you don't want these messages to be deleted automatically.

Similarly, you can teach it what good email looks like with:

fetchmail -a -n --folder GoodMail --keep  einstein.unh.edu -m "sa-learn --ham --single"

SPAM Blacklist & Personal Configurations

There are many blacklists for known spammers. We should use them! SpamAssassin does check the blacklists. The trouble is that it is too timid in labeling the resulting hits as spam, so you get lots of spam in your inbox.

You can check whether an IP address is listed in the block list by following the recipe at http://daemonforums.org/showthread.php?t=302 (Check using dig):

Take the IP address a.b.c.d, reverse the octets, and append zen.spamhaus.org, then run dig on the result, i.e.:
dig d.c.b.a.zen.spamhaus.org
If it returns 127.0.0.x, the address is listed as a confirmed spammer.
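
The reversal step is easy to get wrong by hand, so it can be scripted. The following is a minimal sketch, assuming an IPv4 address; dnsbl_name is a made-up helper name and 192.0.2.10 is a documentation address used only as an example.

```shell
# dnsbl_name (hypothetical helper): print the DNS name to look up for an
# IPv4 address. $1 is the address, $2 the list zone (default zen.spamhaus.org).
dnsbl_name() {
    ip=$1
    zone=${2:-zen.spamhaus.org}
    # Split a.b.c.d on the dots and emit d.c.b.a.<zone>.
    # Prints nothing if the input does not have exactly four fields.
    echo "$ip" | awk -F. -v zone="$zone" \
        'NF == 4 { print $4 "." $3 "." $2 "." $1 "." zone }'
}

# Usage (needs dig and network access):
#   dig +short "$(dnsbl_name 192.0.2.10)"
```

The helper only builds the name; the lookup itself is still the dig command from the recipe.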

You can check the details (really detailed!) of how spamassassin scores a message by saving the full message and then piping it into spamassassin:

spamassassin -D < message.elm

Part of the trouble with the DNS blacklists is that the default SpamAssassin settings do not rate them as bad enough. The following settings help delete more spam. They can be fine-tuned further by adding more score rules, including one for URIBL_BLACK. Note that it is possible to get false positives (i.e. good mail marked as spam) this way. See "Testing for specific tags" below for how to check first whether you would have gotten false positives. To increase the spam score for any of the SpamAssassin tags, edit the file ~/.spamassassin/user_prefs and add the following lines:

# Spamhaus DBL
#
score URIBL_DBL_SPAM  7.0
score URIBL_SBL       3.0
# AbuseButler
score URIBL_AB_SURBL  5.0
#
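
If you change scores often, editing user_prefs by hand tends to leave duplicate score lines behind. The sketch below shows one way to avoid that. It is not part of SpamAssassin: set_score is a made-up helper name, and it assumes the directory holding the prefs file already exists.

```shell
# set_score (hypothetical helper): make sure the prefs file $1 contains
# exactly one line "score RULE VALUE" for rule $2 with value $3.
set_score() {
    prefs=$1
    rule=$2
    value=$3
    [ -f "$prefs" ] || : > "$prefs"
    tmp=$(mktemp) || return 1
    # Copy every line except an existing score line for this rule,
    # then append the new score at the end.
    awk -v rule="$rule" '!($1 == "score" && $2 == rule)' "$prefs" > "$tmp"
    printf 'score %s %s\n' "$rule" "$value" >> "$tmp"
    cat "$tmp" > "$prefs" && rm -f "$tmp"
}

# Usage:
#   set_score ~/.spamassassin/user_prefs URIBL_SBL 3.0
#   spamassassin --lint
```

Running spamassassin --lint afterwards, as in the usage comment, checks that the edited file still parses.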

You can whitelist particular domains as well. This speeds up processing:

whitelist_from  *.jlab.org
whitelist_from  *.unh.edu
whitelist_from  *.google.com
whitelist_from  *.gmail.com
whitelist_from  *.yahoo.com

Testing for specific tags

If you want to fine-tune your spam scores, you may want to test whether earlier messages in a particular "good" mail folder were tagged with a given spam tag. You can use fetchmail to check. The following example checks for URIBL_BLACK in a folder MyMail:

fetchmail -a -n --folder MyMail -s --keep  einstein  -m "grep  URIBL_BLACK" 

If it finds any, you may not want to set your URIBL_BLACK score too high, since mail from that source would then be labelled spam.
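
The same check can be done offline on messages you have already saved. This is a sketch under assumptions: tagged_messages is a made-up helper name, and it expects a directory with one message per file. Like the grep in the fetchmail line, it searches the whole message rather than only the X-Spam-Status header.

```shell
# tagged_messages (hypothetical helper): print the path of every file in
# directory $1 that mentions the rule name $2.
tagged_messages() {
    dir=$1
    tag=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # -F treats the tag as a fixed string, -q suppresses grep's output.
        grep -F -q -- "$tag" "$f" && echo "$f"
    done
    return 0
}

# Usage, counting hits in a directory of known-good mail:
#   tagged_messages ~/saved-good-mail URIBL_BLACK | wc -l
```

A count of zero suggests that raising the score for that tag would not have caught any of the good mail in that directory.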

Important Note

SpamAssassin needs to have a user account named spamd, and this has to be a local account as well as being in the LDAP database.