
Under Construction
There are a whole bunch of spam-filtering technologies available. Some are widely effective and (as a result) widely used. Some are effective for certain niche uses, or in conjunction with other technologies, either to catch what those technologies miss or to help mitigate problems with false positives. Yet others are bad ideas that add to the spam problem and should not be used by anybody. On this page, I review the spam-filtering methods that users are most likely to encounter.
Bayesian filters use the statistical analysis methods of Thomas Bayes, an eighteenth-century British mathematician, to analyze email and estimate whether it is spam. Unlike static spam filters, Bayesian filters learn from experience. As they filter your email, you tell them when they make a mistake -- either by classifying spam as legitimate email or by classifying legitimate email as spam. Most users find that, after a few days of learning, a good Bayesian filter rarely makes a mistake.
The current crop of Bayesian filters was inspired by, if not based on, Paul Graham's 2002 paper, "A Plan for Spam."
From what I've seen, Bayesian filtering is effective IF AND ONLY IF each user takes the time to "train" the filter on his or her own email. "Pre-trained" Bayesian filters are much less effective, so Bayesian filters do not work well when installed on a mail server and applied to the email of a large number of people. Bayesian filtering is nonetheless a useful tool for individual users who want to fight spam. For that reason, I list and recommend several spam filters that use Bayesian filtering, either exclusively or as part of a set of spam-filtering techniques.
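To make the training loop described above concrete, here is a minimal sketch of a per-user filter in Python. It is a toy naive-Bayes-style classifier written for illustration, not the algorithm from "A Plan for Spam" or from any particular product; the class name `TinyBayesFilter`, the tokenizer, and the add-one smoothing are all assumptions.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Toy per-user Bayesian filter (illustrative sketch only)."""

    def __init__(self):
        # Number of messages of each class that contained a given word.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        # Number of messages the user has labeled in each class.
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase words, each counted once per message.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, text, label):
        """Record one message the user has labeled 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        """Estimate the probability that a message is spam."""
        n_spam = self.message_counts["spam"]
        n_ham = self.message_counts["ham"]
        # Start from the (smoothed) prior odds of spam versus ham.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for word in self.tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing the result.
            p_word_spam = (self.word_counts["spam"][word] + 1) / (n_spam + 2)
            p_word_ham = (self.word_counts["ham"][word] + 1) / (n_ham + 2)
            log_odds += math.log(p_word_spam / p_word_ham)
        # Clamp so math.exp cannot overflow on very long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


mail_filter = TinyBayesFilter()
mail_filter.train("Buy cheap pills now", "spam")
mail_filter.train("Cheap pills, limited offer", "spam")
mail_filter.train("Lunch meeting moved to noon", "ham")
mail_filter.train("Notes from the project meeting", "ham")
```

Correcting a mistake is just another call to `train` with the right label, which is why a filter trained on one person's email tends to fit that person's email and nobody else's.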
Blocklists are lists of IPs, IP ranges, or domain names that are used to send spam. Most blocklists are updated at a central location, and mail servers and spam filters check them while filtering email. Blocklists therefore can respond quickly when spam starts coming from a new source, or when new spam havens appear.
There are two types of blocklist technology at present. DNS-based blocklists (DNSBLs), the most common type, list IPs and IP ranges from which spam is sent or which host web sites that are advertised in spam. Mail servers and spam filters then check the IPs of sending mail servers, or of advertised web sites, against these blocklists. If the IP is listed, they reject the email, filter it into a spam folder, or "score" it negatively. Domain-based blocklists (RHSBLs) list domains instead of IPs, but are otherwise used in much the same way.
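As a rough illustration of how such a check works, the sketch below follows the common DNSBL convention: reverse the octets of the IPv4 address, append the blocklist's zone, and look the resulting name up in DNS. If the name resolves, the address is listed. The zone names `dnsbl.example.org` and `rhsbl.example.org` and all three function names are placeholders invented for this example, not real blocklists or a real library API.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def rhsbl_query_name(domain, zone):
    """Build the DNS name to query for a domain in a domain-based list."""
    return f"{domain.rstrip('.').lower()}.{zone}"


def is_listed(query_name):
    """Return True if the query name resolves, meaning it is listed.

    Any resolution failure is treated as "not listed" here; a real mail
    server would want to tell "not listed" apart from a DNS outage.
    """
    try:
        socket.gethostbyname(query_name)
        return True
    except socket.gaierror:
        return False


# Hypothetical zones; substitute the zone of the blocklist you actually use.
ip_query = dnsbl_query_name("192.0.2.99", "dnsbl.example.org")
domain_query = rhsbl_query_name("Example.COM.", "rhsbl.example.org")
```

A filter would then call `is_listed(ip_query)` and reject, file, or score the message depending on the answer.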
Blocklists vary far more in what they list than in the technology they use. The following are a few of the more common types of blocklist in operation at present:
Blocking email sent from blocklisted servers can be a highly effective way to stop spam from reaching your mailbox.
DNSBLs are essential in the fight against spam because they allow information about spam sources to be collected and kept up-to-date in one location, but used in many locations around the world. Naturally, there are good DNSBLs with responsible maintainers and good policies, and bad DNSBLs with irresponsible maintainers and bad or non-existent policies, so you must choose carefully which ones to use. Fortunately, the best of them (such as the Spamhaus SBL and XBL, the CBL, and the SURBL) are very good indeed, and getting better all the time because of wide public support, public feedback, and public accountability.
All of the server-based spam-stopping technologies, and many spam filters, make heavy use of DNSBLs. In my opinion, Internet users everywhere owe Paul Vixie a beer (or drink of his choice) to thank him for thinking up the idea and developing the first DNSBL -- the Mail Abuse Prevention System (MAPS). (Paul, if you read this, let me know when and where and I'll buy you one.)
Challenge/Response (C/R) systems hold email from unknown senders in a temporary location and send a challenge to the sender's email address -- usually the address in the From: header -- asking the sender to confirm that he or she actually sent the email and to promise that it isn't spam. When the sender replies, the email goes through and (usually) the program whitelists that sender so that he or she doesn't receive further challenges.
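The hold, challenge, and whitelist steps just described can be sketched in a few lines of Python. This is only a model of the flow, assuming an in-memory queue and made-up names (`ChallengeResponseGate`, `receive`, `confirm`); it sends no real email and does not represent how any particular C/R product is built.

```python
class ChallengeResponseGate:
    """Toy model of a challenge/response mail gate (illustrative only)."""

    def __init__(self, whitelist=None):
        self.whitelist = set(whitelist or [])
        self.held = {}      # token -> (sender, message) awaiting confirmation
        self.outbox = []    # challenges "sent": (recipient, token)
        self._next_token = 0

    def receive(self, sender, message):
        """Deliver mail from known senders; hold and challenge the rest."""
        sender = sender.lower()
        if sender in self.whitelist:
            return "delivered"
        self._next_token += 1
        token = f"c{self._next_token}"
        self.held[token] = (sender, message)
        # The challenge goes to whatever address the From: header claims.
        self.outbox.append((sender, token))
        return "held"

    def confirm(self, token):
        """Release a held message and whitelist its sender."""
        if token not in self.held:
            return None
        sender, message = self.held.pop(token)
        self.whitelist.add(sender)
        return message


gate = ChallengeResponseGate()
first_status = gate.receive("alice@example.com", "Hello")
released = gate.confirm("c1")
second_status = gate.receive("Alice@example.com", "Hello again")
```

Note that `receive` never checks whether the From: address is genuine, so every held message generates a challenge to that address whether or not its owner sent anything.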
I think C/R systems were marginally acceptable some years ago, when the spam problem was nowhere near as great. Today, however, they are a BAD IDEA that no responsible spam-stopping system should use. This is because the vast majority of spam comes with a forged sender. In today's world, where something between 50% and 80% of all email is spam, a C/R system can add significantly to the useless email load on an email system that is already straining to keep up. In addition, C/R systems bombard innocent users with annoying unsolicited bulk email (yeah, spam) because of spam that some spammer sent with their email address forged in the From: line.
I know of one "anti-spam" C/R service -- SpamArrest. I strongly recommend that you steer clear of it, and any other software that uses C/R technology.