Email filtering is the processing of email messages according to predetermined criteria. The term most often refers to the automatic processing of incoming messages, but filtering can also involve human intervention or the intervention of artificial intelligence. Email filtering software takes email messages as input and, as output, can pass an email message through the filtering process unchanged for delivery to a user's mailbox, redirect the email message for delivery elsewhere, or discard the email message.
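The three possible outcomes of filtering described above can be illustrated with a minimal sketch. It is offered only as an illustration under stated assumptions and does not represent any particular filtering product: the names `Action`, `Message`, `apply_filter`, and `EXAMPLE_RULES`, as well as the example criteria and the address `lists.example.org`, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class Action(Enum):
    """The three filter outcomes named in the text (hypothetical names)."""
    DELIVER = "deliver"    # pass through unchanged to the user's mailbox
    REDIRECT = "redirect"  # deliver elsewhere, e.g. a separate folder
    DISCARD = "discard"    # throw the message away


@dataclass
class Message:
    sender: str
    subject: str
    body: str


# A rule pairs a predetermined criterion with the action it triggers.
Rule = Tuple[Callable[[Message], bool], Action]


def apply_filter(message: Message, rules: List[Rule]) -> Action:
    """Return the action of the first rule whose criterion matches."""
    for criterion, action in rules:
        if criterion(message):
            return action
    # No criterion matched: the message passes through unchanged.
    return Action.DELIVER


# Illustrative criteria only; real filters use far richer criteria.
EXAMPLE_RULES: List[Rule] = [
    (lambda m: "lottery" in m.subject.lower(), Action.DISCARD),
    (lambda m: m.sender.endswith("@lists.example.org"), Action.REDIRECT),
]
```

In this sketch the rules are evaluated in order and the first match decides the outcome, which is one simple design choice among many; a message that matches no rule is delivered unchanged.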
Spammers send unsolicited bulk email or unsolicited commercial email, referred to as “spam”. The term can refer to the unsolicited bulk or commercial email itself or to its content. Spammers attempt to devise spam-laden email messages that can penetrate email filters and be delivered to targeted email users, and they use various techniques to do so. One approach involves running test messages through spam filters in order to determine the words and other email attributes that the spam filters consider to be legitimate. By adding a sufficient number of such words and attributes to an email message that contains spam, a spammer can lead an email filter to classify the email message as legitimate and to allow it to pass through to the mailboxes of targeted users.
It should be appreciated that legitimate messages typically have many words that are slightly good, some that are slightly spammy, and only a small number of words that are extremely good or extremely spammy. Spammers attempting to work around an email filter deliver very spammy content to targeted users in email messages in which that content is offset by a substantial amount of highly legitimate content. When the spammy content and the highly legitimate content are aggregated, the email filter gives the email message a good score.
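The offsetting effect described above can be illustrated with a minimal sketch of a filter that simply sums per-word scores. This is an assumption made for illustration: the text does not specify how a conventional filter aggregates its scores, and the names `WORD_SCORES`, `SPAM_THRESHOLD`, `score`, and `is_spam`, together with every weight and example message, are hypothetical. In the sketch, positive weights are spammy, negative weights are legitimate, and a lower total is a better score.

```python
# Hypothetical per-word weights: positive = spammy, negative = legitimate.
WORD_SCORES = {
    "viagra": 4.0,      # extremely spammy
    "winner": 3.5,      # extremely spammy
    "free": 0.5,        # slightly spammy
    "meeting": -0.4,    # slightly good
    "report": -0.3,     # slightly good
    "thanks": -0.2,     # slightly good
    "budget": -1.5,     # extremely good
    "colleague": -1.5,  # extremely good
}

# Hypothetical cutoff: totals above this are classified as spam.
SPAM_THRESHOLD = 1.0


def score(text: str) -> float:
    """Sum the weights of all known words; unknown words contribute 0."""
    return sum(WORD_SCORES.get(word, 0.0) for word in text.lower().split())


def is_spam(text: str) -> bool:
    """Classify a message as spam when its aggregate score is too high."""
    return score(text) > SPAM_THRESHOLD


# A typical legitimate message: a few slightly good words.
legit = "thanks for the meeting report"

# Very spammy content on its own is caught by the filter.
spam = "winner free viagra"

# The same spammy content padded with highly legitimate words.
padded = spam + " budget colleague" * 4
```

Under these assumed weights, `is_spam(spam)` is true while `is_spam(padded)` is false, and `score(padded)` is lower (better) than `score(legit)`, because the padding words outweigh the spammy content in the aggregate.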
Notably, some spammers who attempt to work around spam filters add so many words determined to be legitimate that their messages receive better scores than the best legitimate messages. Conventional filters are incapable of detecting such illegitimate messages and actually regard them as the best messages. Spammers can therefore work around content-based spam filters by finding gaps such as these in what the spam filter is able to detect and exploiting them (e.g., by adding gibberish sentences full of legitimate words to an email message so that spam filters treat the email message as legitimate). Accordingly, conventional spam filters are ineffective at identifying spam-laden email messages that sophisticated spammers devise to frustrate them.