As used throughout this specification including claims, “spam” is any electronic message that is unwanted by the recipient; and a “clean” electronic message is one that is not spam. The amount of spam sent over computer networks has increased with the increasing popularity of electronic messaging schemes such as e-mail. Spam filters have been designed to counter the flood of spam. However, spammers have employed various tricks to neutralize the spam filters and thereby place their unwanted messages in front of recipients.
One such trick employed by spammers (illustrated in FIG. 1) is to break up the electronic message 1 into two portions: a visible portion 2 that is visible to the human recipient and readable by the spam filter, and an invisible portion 3 that is invisible to the human recipient but nonetheless readable by the spam filter. The visible portion 2 contains the spam message, typically between 10 and 20 words long, while the invisible portion 3 is much longer, typically between 1000 and 2000 words long. The invisible portion 3 contains characters that lull the spam filter into concluding that the message 1 is clean. In the case where the spam filter is a statistical filter (such as a Bayesian filter, a neural network, or a support vector machine), the invisible portion 3 contains many more words than the visible portion 2, and those words are innocuous. Since the spam filter processes many more innocuous words from the invisible portion 3 than spam words from the visible portion 2, the spam filter erroneously concludes that, as a whole, the message 1 is clean.
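The dilution effect described above can be sketched with a toy naive Bayes style scorer. This is an illustrative sketch only, not the filter of any particular product: the word list, the likelihood ratios, and the function names `spam_log_odds` and `is_spam` are all hypothetical, and the word counts are scaled down from the 10 to 20 and 1000 to 2000 word ranges mentioned above.

```python
import math

# Hypothetical per-word ratios P(word | spam) / P(word | clean).
# The values are invented for illustration; a real statistical
# filter would learn them from training data.
LIKELIHOOD_RATIO = {
    "buy": 8.0,
    "cheap": 20.0,
    "pills": 30.0,
    "now": 3.0,
    "meeting": 0.2,
    "schedule": 0.25,
    "report": 0.3,
    "thanks": 0.4,
}


def spam_log_odds(words, prior_log_odds=0.0):
    """Sum the log likelihood ratio of every token the filter can read."""
    score = prior_log_odds
    for word in words:
        # Words unknown to the filter are treated as neutral (ratio 1.0).
        score += math.log(LIKELIHOOD_RATIO.get(word.lower(), 1.0))
    return score


def is_spam(words):
    """Classify as spam when the accumulated log odds are positive."""
    return spam_log_odds(words) > 0.0


# Visible portion: the short spam message seen by the recipient.
visible = "buy cheap pills now".split()

# Invisible portion: many innocuous words that only the filter reads.
invisible = ("meeting schedule report thanks " * 5).split()

print(is_spam(visible))              # visible portion alone
print(is_spam(visible + invisible))  # visible plus invisible portion
```

Under these assumed ratios, the visible portion alone scores as spam, while the same portion followed by the innocuous invisible words scores as clean, because each innocuous word contributes a negative term and there are many more of them.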
This spamming technique can be used against any spam filter that takes into account characters within the message 1. In the example shown in FIG. 1, if the spam filter has been programmed to conclude that a message 1 is clean when the word “cancer” appears in the message 1, the spammer can place the word “cancer” in the invisible portion 3 of the message, counteracting the effect of the word “breast” in the visible portion 2 of the message. (The word “breast” would normally trigger the spam filter to conclude that the message 1 contains spam.)
The present invention provides methods, apparati, and computer readable media to counter the above-described spamming technique.