The Ohio State University

Department of Statistics

Using SpamAssassin

Spam (unsolicited commercial e-mail, or UCE) is a major problem and can account for a significant share of incoming e-mail. Clicking the unsubscribe link often doesn't work or, worse, just confirms that your address is a working e-mail address.

There are services such as SpamCop which provide some assistance. For especially bad spam, you can contact the FTC and complain.

On our system we have a tool called SpamAssassin, which works in conjunction with procmail, a mail filtering program.

WARNING: using procmail incorrectly can irretrievably lose e-mail. For notes on using procmail please see its page on the support web site.

Setup:

All mail on our system is tagged with its spam status; all you have to do is enable procmail to filter based on that status.

Filtering can be set up with the following steps, done from the prompt on a unix machine:

  1. Copy the default .procmailrc (procmail's resource file) to your home directory:

            cp /usr/local/dot-files/procmailrc.example ~/.procmailrc
            
  2. Do not create a .forward file to use procmail; it will cause your e-mail to bounce.

  3. Now set up the filter by uncommenting the lines:

            :0:
            * ^X-Spam-Status: Yes
            spam
            

    Now procmail will put anything that SpamAssassin has tagged as spam into the mailbox ~/mail/spam, which you can read at a later time.
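As a quick sanity check on step 3, the sketch below tests whether the spam condition line is active (present and not commented out) in a procmailrc file. The function name spam_recipe_enabled is made up for this illustration, and it assumes the recipe looks like the one shown above.

```shell
# Hypothetical helper (not part of procmail): succeeds if the condition
# line "* ^X-Spam-Status: Yes" appears uncommented in the given file.
# Defaults to ~/.procmailrc when no file is named.
spam_recipe_enabled() {
    rcfile="${1:-$HOME/.procmailrc}"
    grep -Eq '^[[:space:]]*\*[[:space:]]*\^X-Spam-Status:[[:space:]]*Yes' "$rcfile"
}

# Usage:
#   spam_recipe_enabled && echo "spam filter recipe is active"
```

This only looks at the condition line; it does not verify the rest of the recipe or that procmail is actually being run on your mail.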

Now test this by sending yourself a quick message. It should appear in your inbox within a few seconds. If it does NOT, there is a problem: immediately rename your .procmailrc to something else, such as pmrc! If the message does arrive, the filter is working, and any spam you receive from then on should be filed accordingly.
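The test and the emergency rename can be sketched as follows. The helper name disable_procmailrc is hypothetical, and the commented-out test command assumes that a command-line `mail` program is available on the machine, which this page does not confirm.

```shell
# Hypothetical helper: take the procmail configuration out of service by
# renaming it (by default ~/.procmailrc becomes ~/pmrc), so that procmail
# no longer reads it.
disable_procmailrc() {
    rc="${1:-$HOME/.procmailrc}"
    backup="${2:-$HOME/pmrc}"
    [ -f "$rc" ] || { echo "no $rc to disable" >&2; return 1; }
    mv -- "$rc" "$backup"
}

# Usage (assumes a `mail` command exists; any mail client works as well):
#   echo "procmail test" | mail -s "procmail test" "$USER"
#   # ...wait a few seconds; if nothing arrives in your inbox:
#   disable_procmailrc
```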

Notes:

It is important to clean out your spam box often so that it does not become too large.

Although SpamAssassin typically is accurate, sometimes it will miss spam, or, worse, put legitimate mail in your spam box. Thus, it is important to check the spam box often for mail mistakenly marked as spam.
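To keep an eye on the spam box, something like the sketch below may help. It assumes ~/mail/spam is a plain mbox file, in which each message starts with a line beginning "From " (no colon); the function names are invented for this example.

```shell
# Hypothetical helpers for inspecting an mbox-format mailbox.

# Print the number of messages, counting the "From " separator lines.
mbox_count() {
    grep -c '^From ' "$1"
}

# Print a rough list of senders and subjects to skim for legitimate mail.
# Lines in message bodies that happen to start with "From:" or "Subject:"
# will show up too, so treat the listing as approximate.
mbox_summary() {
    grep -E '^(From|Subject): ' "$1"
}

# Usage:
#   mbox_count ~/mail/spam
#   mbox_summary ~/mail/spam | less
```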

Since this works in your ~/mail directory and system inbox, it should be compatible with IMAP mail readers such as Outlook and Eudora.

If you plan to use the vacation program, please see the vacation documentation on this site.

Advanced: Whitelisting and Spam Threshold

In your home directory (/home/username) there is a .spamassassin directory, and inside it is a file called user_prefs. You can edit this file at /home/username/.spamassassin/user_prefs.

The default user_prefs file contains examples of how to whitelist certain senders and how to set the score (hits) threshold at which a message is marked as spam.
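In stock SpamAssassin the relevant settings are whitelist_from and required_score (older releases call the latter required_hits). Whether the default file on this system uses exactly those lines is an assumption. The sketch below shows sample entries as comments, with placeholder addresses, and a hypothetical helper that lists which of these settings are currently active in a prefs file.

```shell
# Sample entries as they might appear in user_prefs (placeholders only):
#   whitelist_from  friend@example.com
#   whitelist_from  *@example.org
#   required_score  5.0

# Hypothetical helper: print the uncommented whitelist and threshold
# lines from a prefs file (default: ~/.spamassassin/user_prefs).
active_prefs() {
    prefs="${1:-$HOME/.spamassassin/user_prefs}"
    grep -E '^[[:space:]]*(whitelist_from|required_score|required_hits)[[:space:]]' "$prefs"
}

# Usage:
#   active_prefs
```

A lower threshold catches more spam but also risks marking more legitimate mail, so it is worth changing in small steps.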

Advanced: Teaching SpamAssassin

SpamAssassin has a Bayesian learning function. This is just a quick tutorial; for full information, you are encouraged to run man sa-learn, as there is much more information there. Using this function is optional and time-consuming, though it will increase SpamAssassin's hit rate.

First, for this to work at all well, the learner needs to be trained with a few hundred messages of both spam and ham. (Ham is non-spam.) Without that training, or when given only spam, it will not work well, so we need a source of both spam and ham to load it with.

Ham: The ~/mail directory, where pine/Outlook/Evolution store mail folders, typically contains a lot of messages to train on. (If you've been using SpamAssassin for a while, there might also be a spam folder here. Move it somewhere else or remove it, as we don't want to include it in ham learning.) To train, temporarily moving your existing spam folder out of the way, do:

    cd ~/mail
    mv spam ..
    sa-learn --mbox --ham *
    mv ../spam .
    

This may take quite some time, and it would be better to do individual mailboxes, instead of using the * wildcard, to be easier on the system. When done, it will report how many messages it examined and how many it learned from.
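One way to process mailboxes individually is a loop such as the sketch below. It skips the spam and spam_to_learn folders by name instead of moving them, and it assumes every other regular file in ~/mail is an mbox folder you consider ham. The function name and the SA_LEARN override variable are inventions for this example; only the sa-learn options already shown above are used.

```shell
# Hypothetical helper: run sa-learn on each mailbox file in a directory,
# one at a time, leaving out the spam folders.
# SA_LEARN may be set to another command (for example, echo) for a dry run.
learn_ham_boxes() {
    maildir="${1:-$HOME/mail}"
    for box in "$maildir"/*; do
        [ -f "$box" ] || continue            # skip subdirectories
        case "$(basename "$box")" in
            spam|spam_to_learn) continue ;;  # never learn spam as ham
        esac
        "${SA_LEARN:-sa-learn}" --mbox --ham "$box" || return 1
    done
}

# Usage:
#   SA_LEARN=echo; learn_ham_boxes    # dry run: print what would be run
#   unset SA_LEARN; learn_ham_boxes   # real run
```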

Training with spam is trickier, as training with spam that SpamAssassin has already caught is inefficient: it already knows this is spam. Thus, over time, save messages that SpamAssassin has not caught to a separate mail folder, for example spam_to_learn. Then periodically run this folder through sa-learn like this:

    sa-learn --mbox --spam ~/mail/spam_to_learn
    cp /dev/null ~/mail/spam_to_learn
    

The cp /dev/null then empties the learn file so that you don't train on the same spam twice, which would be inefficient. Often when you come back from an absence, quite a few uncaught spam messages will have built up, and this is a good time to save a batch and train SpamAssassin.
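A slightly more cautious variant of the two commands above is sketched here: it empties the folder only if sa-learn reported success, so a failed run does not throw away the saved spam. The function name and the SA_LEARN override variable are hypothetical.

```shell
# Hypothetical helper: learn the saved spam, then empty the folder,
# but only when sa-learn exits successfully.
learn_spam_and_empty() {
    box="${1:-$HOME/mail/spam_to_learn}"
    # Nothing to do if the folder is missing or already empty.
    [ -s "$box" ] || { echo "nothing to learn in $box" >&2; return 0; }
    if "${SA_LEARN:-sa-learn}" --mbox --spam "$box"; then
        : > "$box"    # truncate, same effect as cp /dev/null
    else
        echo "sa-learn failed; leaving $box untouched" >&2
        return 1
    fi
}

# Usage:
#   learn_spam_and_empty
```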

Eventually, with enough training, SpamAssassin will get more accurate.



If you have trouble accessing this page or need an alternate format, contact webmaster@stat.osu.edu.