
Under Construction
When this is complete, it will contain an overview of the procmail program, a description of the SpamBouncer program files and directories, and an overview of what you should do to prepare before you install the SpamBouncer.
Procmail is a mail filtering program with a powerful but arcane scripting language. Procmail runs the SpamBouncer; the SpamBouncer is nothing more than a large set of interlinked procmail scripts. You do not need to know procmail very well to install the SpamBouncer, but you should understand the basic concepts behind it and be able to write a simple procmail script, or recipe.
A procmail recipe consists of three sections: the procmail header, the conditions, and the disposition.
:0:
A procmail recipe can contain any number of conditions. The following condition tells procmail to look for email from mjones@example.com:
* ^From:.*mjones@example\.com
newmail
Below is the complete procmail recipe described above, with the header on the first line, the condition following the header, and the disposition at the bottom. This recipe filters incoming email for messages from mjones@example.com, and files any matching messages in a folder called newmail.
:0:
* ^From:.*mjones@example\.com
newmail
There is a nearly infinite number of variations on this theme. The procmail recipe below filters incoming email for messages from mjones@example.com, and forwards those messages directly to bsmith-pager@example.com.
:0
* ^From:.*mjones@example\.com
! bsmith-pager@example.com
In this recipe, the second colon is missing from the procmail header because procmail is not attempting to deliver the email to a folder on the local system and therefore does not need to lock the mail folder.
The following recipe is nearly identical to the last one, but instead of forwarding the original message to bsmith-pager@example.com, it creates a copy and forwards the copy.
:0 c
* ^From:.*mjones@example\.com
! bsmith-pager@example.com
The original message remains in the mail delivery stream and procmail continues to filter it.
The following recipe filters incoming email for posts to the bulk email list SKYDIVER-L that were not sent by the user bsmith@example.com. (The list puts the string SKYDIVER-L at the beginning of the Subject: header of each post.) The recipe saves a copy of each post to a folder called skydiver-l, and forwards the original post to bsmith-home@example.com.
:0
* ! ^From:.*bsmith@example\.com
* ^Subject: SKYDIVER-L
{
:0 c:
skydiver-l
:0
! bsmith-home@example.com
}
In the previous recipe, the delivery line consists of a nested group of recipes enclosed in curly brackets ({}). The first nested recipe locks the delivery folder (as instructed by the colon flag), and then saves a copy of the post to that folder (as instructed by the "c" flag). More complex procmail scripts, such as the SpamBouncer, may have recipes nested inside of recipes inside of recipes, many layers deep. Procmail is extremely flexible -- you can nest recipes as many layers deep as your CPU and memory will allow.
The following recipe searches only the body of each incoming message. It matches email whose body contains both of the spammer domains spamsite.com and phishsite.com, and delivers that email to the folder named junkmail.
:0 B:
* spamsite\.com
* phishsite\.com
junkmail
The following recipe is nearly identical, but delivers the email to /dev/null, the Unix system trashcan, instead of to a folder.
:0 B:
* spamsite\.com
* phishsite\.com
/dev/null
Procmail recipes range from simple examples like these to very complex ones. If you want more examples, you can look through the links on the Procmail Home Page, read the procmail examples man page by typing man procmailex at the Unix shell command prompt, or read the SpamBouncer code. The SpamBouncer code is reasonably well documented, and offers hundreds of examples of procmail recipes, from simple to extremely complex.
Procmail recipes rely heavily on regular expressions, so I will describe them in somewhat greater detail here. A regular expression is a pattern that describes one or more text strings. Regular expressions consist of literal text strings, which represent themselves, and metacharacters, which represent something other than themselves. In procmail, regular expressions appear only in a recipe's conditions, not in the header or the disposition.
Procmail regular expression syntax is a bit unusual; I call procmail's version of regular expressions irregular expressions. <wry grin> Those of you who are familiar with regular expressions in the Unix grep or awk utilities, or in Perl, may find procmail's version a bit confusing at first. For example, unlike most regular expressions, procmail regular expressions are case insensitive by default -- if you search for the lower-case letter a, procmail will also match the capital letter A unless you explicitly tell it to do a case-sensitive search. Unlike some regular expression dialects, procmail metacharacters are not preceded with a backslash; instead, the backslash precedes these characters only when you want to tell procmail to treat them literally. Procmail also does not support bounds ({n,m} constructions) or POSIX character classes.
In the condition of the recipe above, the caret (^) metacharacter tells procmail that the string From: must appear at the beginning of the line of text where it appears. It cannot have any other characters to the left of it, not even a tab or space. Since email headers always appear at the beginning of a line of text, this prevents procmail from mistakenly matching the word "From" in the middle of a line.
The period and asterisk combination (.*) that follows tells procmail that a string of any number of letters, numbers, and symbols can appear between the "From:" and the email address. The period tells procmail that any character except for a newline can appear in that position. A period can represent any letter, number, or symbol. The asterisk tells procmail that the preceding character can appear from zero to an infinite number of times. Since different email programs format the From: line of emails differently, this lets procmail look for the email address anywhere on the line following the From: header.
Finally, the string mjones@example\.com tells procmail to look for that specific email address. The backslash (\) metacharacter in front of the period tells procmail to treat the period as a literal period instead of as a metacharacter. You can use a backslash to force Procmail to treat any metacharacter as a literal character.
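The case-insensitive default mentioned earlier is easy to overlook, so here is a minimal sketch of overriding it. It assumes you want to catch only subjects that shout in capital letters; the folder name shouting is made up for this example. The D flag in the recipe header is what asks procmail for a case-sensitive match.

```
# With the D flag, this matches "FREE MONEY" but not
# "free money" or "Free Money".
:0 D:
* ^Subject:.*FREE MONEY
shouting
```

Without the D flag, the same condition would match all three spellings.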
The most common metacharacters used in procmail regular expressions are:
| Metacharacter | Description |
|---|---|
| ^ | Requires the following character or string to appear at the beginning of a line of text, with no characters preceding it. For example, if you want to search an email's From: header, you can type ^From:. |
| $ | Requires the preceding character or string to appear at the end of a line of text, with no characters following it. For example, if you want to search an email's Message-ID: header for all message-IDs ending in the string @example.com>, you can type ^Message-ID:.*@example\.com>$. |
| . | Matches any single character except the newline character. Can match a letter, a number, or any symbol. For example, if you want to search an email's Subject: header for the string Viagra, but also want to catch typical spammer misspellings such as Vi@gr@ and V18gra, you can type ^Subject:.*V..gr.. |
| ? | Matches the preceding character zero or one times. For example, if you want to search an email's Subject: for the word "pill," and want to match "pills" as well, you can type pills?. |
| * | Matches the preceding character from zero to an infinite number of times. For example, if you want to search an email's From: header for the email address mjones@example.com, and want to ensure that procmail finds that email address anywhere on the line after the From: email header, you can type ^From:.*mjones@example\.com. |
| + | Matches the preceding character from one to an infinite number of times. For example, if you want to search an email's Subject: header for the string Viagra, but also want to catch typical spammer obfuscations such as Viiagra and Viiiiagra, you can type ^Subject:.*Vi+agra. |
| \ | Forces procmail to treat the following character as a literal character instead of a metacharacter. For example, if you want to search an email's From: header for the email address mjones@example.com, and want to ensure that procmail matches only mjones@example.com and not mjones@example-com.net, you can type a backslash before the period to force procmail to treat the period as a literal period rather than as a match for any character: ^From:.*mjones@example\.com. |
In addition to single characters and character strings, procmail supports two additional units that these metacharacters can operate on: character groups and character classes. A character class is a set of characters enclosed in square brackets. Any character inside the brackets can match the character at that position. For example, the following regular expression matches the strings viagra, v1agra and vlagra in the Subject: header of an email, frustrating the poor spammer who tried misspelling the word to get past your filters:
* ^Subject:.*v[i1l]agra
Inside a character class, most metacharacters lose their special meaning and become literal characters. Only two characters have special meaning: the hyphen (-) and the caret (^). The hyphen, when placed between two letters or two numbers, designates the range of letters or numbers that extends from the first to the last. For example, the following regular expression matches any email address from a series of spamming servers named spamserver1.com through spamserver9.com:
* ^From:.*spamserver[1-9]\.com
You can precede or follow a character class with one of the metacharacters, just as you could a single character. For example, the following regular expression uses the plus metacharacter to catch email from a series of spamming servers with domain names beginning with spamserver, followed by a number of any length, followed with .com:
* ^From:.*spamserver[0-9]+\.com
If you want to include a hyphen inside a character class as a literal character, you must put it first. For example, the following regular expression allows one or more hyphens in the number following the spamserver string:
* ^From:.*spamserver[-0-9]+\.com
The caret, when placed at the beginning of a character class, inverts the character class, telling procmail to match any character except for the characters in the character class. For example, the following regular expression matches any Subject: line that does not contain any letters or numbers.
* ^Subject:[^0-9a-z]*$
If you want to include a caret inside a character class as a literal character, put it anywhere except first. A caret in any other position is treated literally.
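As a small sketch of the literal caret, assume a hypothetical spammer who decorates Subject: lines with runs of tildes and carets. The folder name junkmail is reused from the earlier examples; the pattern itself is only an illustration.

```
# The caret is not the first character inside the brackets,
# so it stands for a literal caret rather than negation.
:0:
* ^Subject:.*[~^][~^][~^]
junkmail
```

This matches any Subject: line containing three tildes or carets in a row, in any combination.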
A character group is a string of characters enclosed in parentheses. Enclosing a string in parentheses tells procmail to treat the entire string as a single character when it parses the regular expression. For example, the following regular expression matches both the word invest and the word investing in the Subject: header of an email:
* ^Subject:.*invest(ing)?
A character group is frequently used to match an alternate set of choices, by enclosing the list of choices inside parentheses and separating each item using the vertical bar (|) metacharacter. For example, the following regular expression matches the words investment and investing in the Subject: header of an email:
* ^Subject:.*invest(ment|ing)
If you want to match the words invest, investment, and investing using a single regular expression, you can simply add a question mark after the character group:
* ^Subject:.*invest(ment|ing)?
If you look through the SpamBouncer code, you will see a few somewhat complicated looking regular expressions over and over. The table below lists some common regular expressions used in the SpamBouncer, and what they do.
| Regular Expression | Description |
|---|---|
| (^|[^0-9a-z]) | Matches either the beginning of the line, or any character that is not a letter or number. This regular expression and variants of it are used to designate the beginning of a word, acronym, or string. For example, the expression aol\.com matches both the domain aol.com and the fraudulent domain phishaol.com. The expression (^|[^-_0-9a-z])aol\.com matches only the legitimate domain. |
| ([^a-z0-9.]|$) | Matches either the end of the line, or any character that is not a letter, a number or the period. This regular expression and variants of it are used to designate the end of a word, acronym, or string. For example, the expression yahoo\.com matches both the generic domain yahoo.com and the Australian regional domain yahoo.com.au. The expression (^|[^-_0-9a-z])yahoo\.com([^a-z0-9.]|$) matches only the generic domain. |
| (ÿ|\.|[=%]2E) | Matches a period in MIME-encoded text with ASCII character entities, or in text in several character sets, instead of just in ASCII text. This regular expression and variants of it are used to designate a literal period in regular expressions used on the message body of email, so that procmail won't miss a domain, host or IP simply because the periods are represented as ASCII entities or in an alternate character set. |
| .*$?.* | Matches two words that might be separated by any number of other words, punctuation, or even a newline. This regular expression appears in pattern matching filters where a word wrap might interrupt a pattern that I want to make sure is matched if it is present. |
| [^a-z]*$?[^a-z]* | Matches two words that might be separated by any number of spaces, tabs, or a newline, but not by other words. This regular expression appears in pattern matching filters where two words appear together in a sentence, but might be separated by punctuation and/or a newline. |
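To show how the expressions in the table fit into complete recipes, here is a rough sketch. These are illustrations only and are not taken from the SpamBouncer code; the folder name aolmail and the phrase being matched are assumptions made for the example.

```
# Match aol.com only as a whole domain: the leading group
# rejects phishaol.com, and the trailing group rejects
# longer names such as aol.com.au.
:0:
* ^From:.*(^|[^-_0-9a-z])aol\.com([^a-z0-9.]|$)
aolmail

# Search the body for "lose" followed by "weight", even if
# punctuation or a line break falls between the two words.
:0 B:
* lose[^a-z]*$?[^a-z]*weight
junkmail
```

The comments sit on their own lines above each recipe because procmail does not allow comments on condition lines.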
BLAH BLAH BLAH... This section has barely been started. <G>
A variable can be set to almost any string value. Procmail is capable of some rudimentary calculation with numbers, but for the most part treats all variable values as strings. Variable values can be enclosed in either single or double quotes. Most variables in the SpamBouncer do not include embedded spaces or special characters, and therefore do not need to be enclosed in quotes.
Once a variable has been declared, you reference it in procmail by putting a $ before the variable name. I usually enclose variable references in curly brackets so that it is absolutely clear in all cases where the variable name ends, regardless of where the variable may be embedded. It is not strictly necessary to do this, however; both ${DEFAULT} and $DEFAULT mean the same thing and are handled in the same way in the vast majority of circumstances.
When setting the value of a variable, you can reference another variable that has already been declared, just as you would reference it inside a Procmail recipe. (See the MAILDIR variable setting above for an example of this; MAILDIR references the HOME variable, which Unix sets to the user's home directory.)
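A minimal sketch of setting and referencing variables might look like the following. The directory and folder names are assumptions, and JUNKFOLDER and NOTE are hypothetical variable names invented for this example; only HOME and MAILDIR come from the text above.

```
# Assumed example settings; the mail directory must already exist.
MAILDIR=${HOME}/mail

# A variable can reference one that was set earlier.
JUNKFOLDER=${MAILDIR}/junkmail

# Quotes are needed here because the value contains spaces.
NOTE="filed by my own recipe"

# The disposition line references the variable set above.
:0 B:
* spamsite\.com
${JUNKFOLDER}
```

Writing ${JUNKFOLDER} rather than $JUNKFOLDER makes no difference here, but the curly brackets keep the end of the name unambiguous if you later append text to it.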
Procmail is a complex program; documenting it fully would require writing a book, if not several books. Unfortunately, nobody has written those books. <sigh> If you aspire to procmail gurudom, however, there are resources. The Procmail Home Page has links to many of these resources, including the venerable Procmail Mailing List and a series of web pages containing introductions and how-to tips by long-time procmail masters such as current procmail project developer Philip Guenther, Jari Aalto, Era Eriksson, and others.
The SpamBouncer is a set of procmail scripts. Procmail scripts are ASCII text files. Your system's copy of the Procmail program reads the SpamBouncer scripts to determine how it should process your incoming email.
NOTE: Before you install the SpamBouncer, ensure that you have Procmail, version 3.11 or above, installed on your computer and working properly with the user account or user accounts that will use the SpamBouncer. See the Procmail web site for more information. The SpamBouncer developers cannot help you install or configure Procmail for your server.
The entry, or index, script for the SpamBouncer, sb.rc, calls all other scripts and accesses all other files that comprise the SpamBouncer. Most users download the SpamBouncer as a compressed archive file, and unarchive it in the directory where they want to store the program scripts. After you unarchive the file, the directory tree looks like this:
SBDIR
/auxiliary
/data
/docs
/functions
The directories contain the following:
To install the SpamBouncer, you download the program archive from the web site or ftp site, put the archive in the directory where you want to store the files, and unarchive it. The unarchive program will create the proper subdirectories and store the SpamBouncer scripts in their proper locations.
NOTE: To make updating easier, you should not modify the SpamBouncer program files. If you want to use a modified version of a SpamBouncer recipe, copy the recipe into a separate recipe file outside of the SpamBouncer program directory tree, modify it as you wish, and then call your modified file by using an INCLUDERC statement at the appropriate place in your .procmailrc file.
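As a rough sketch of what that might look like in a .procmailrc file, assuming the SpamBouncer was unarchived into a directory named spambouncer under your home directory and that your own recipes live in a file called my-recipes.rc (both names are hypothetical, as is the use of SBDIR as a variable holding the install directory):

```
# Hypothetical location of the unarchived SpamBouncer files.
SBDIR=${HOME}/spambouncer

# Your own modified recipes, kept outside the SpamBouncer tree.
INCLUDERC=${HOME}/.procmail/my-recipes.rc

# The unmodified SpamBouncer entry script.
INCLUDERC=${SBDIR}/sb.rc
```

Whether your own file belongs before or after sb.rc depends on what your modified recipe does, so treat the order shown here as a placeholder rather than a recommendation.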
Write this, dammit!