With the advent of the Internet, email has become prevalent in digital communications. For example, email messages are exchanged on a daily basis to conduct business, to maintain personal contacts, to send and receive files, etc. Unfortunately, as email traffic has increased, undesired email messages have also become prevalent. These messages are frequently unsolicited advertisements, commonly referred to as “junk mail” or “spam,” sent by email mass-mailing programs or by other entities; the senders of such messages are commonly referred to as “spammers.”
Bayesian filters have emerged as a robust approach to reducing spam. Bayesian filters are described in publications such as, for example, “A Plan for Spam” by Paul Graham, published at http://www.paulgraham.com/spam.html, in August of 2002 (also referred to herein as “the Graham article”), which is incorporated herein by reference in its entirety. As is known to those skilled in the art, and as evidenced by such published articles, Bayesian filters operate by parsing incoming email messages into tokens. The most “interesting” tokens, where interest is measured by how far a token's spam probability lies from a neutral value, are then used to calculate the probability that the email message is spam.
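By way of illustration only, the classification step described above may be sketched as follows. This is a simplified sketch, not the implementation of any particular filter. It assumes that per-token spam probabilities have already been learned from a training corpus and clamped to the range [0.01, 0.99]; the names `tokenize`, `spam_probability`, and `token_probs`, the simple lowercase tokenizer, the neutral value of 0.5, the default probability of 0.4 for unseen tokens, and the limit of 15 interesting tokens are all assumptions chosen for this example, loosely following the values suggested in the Graham article.

```python
import math
import re

# Assumed parameters, loosely following the Graham article.
NEUTRAL = 0.5             # probability that carries no evidence either way
UNKNOWN_TOKEN_PROB = 0.4  # assumed probability for tokens never seen in training
MAX_INTERESTING = 15      # number of most "interesting" tokens to combine


def tokenize(text):
    """Hypothetical tokenizer: lowercase runs of letters, digits, $, ' and -."""
    return re.findall(r"[a-z0-9$'\-]+", text.lower())


def spam_probability(message, token_probs, n=MAX_INTERESTING):
    """Combine the n most interesting token probabilities into a spam score.

    token_probs maps token -> P(spam | token), assumed clamped to [0.01, 0.99]
    so that neither product below can reach zero.
    """
    tokens = set(tokenize(message))  # each distinct token counted once
    probs = [token_probs.get(t, UNKNOWN_TOKEN_PROB) for t in tokens]
    # "Interesting" = farthest from the neutral value, in either direction.
    interesting = sorted(probs, key=lambda p: abs(p - NEUTRAL), reverse=True)[:n]
    # Naive-Bayes style combination: prod(p) / (prod(p) + prod(1 - p)).
    spam_product = math.prod(interesting)
    ham_product = math.prod(1 - p for p in interesting)
    return spam_product / (spam_product + ham_product)


if __name__ == "__main__":
    # Hypothetical learned probabilities, for illustration only.
    token_probs = {"viagra": 0.99, "free": 0.9, "meeting": 0.05, "agenda": 0.1}
    print(spam_probability("FREE viagra now", token_probs))  # close to 1
    print(spam_probability("meeting agenda", token_probs))   # close to 0
```

In this sketch, a message whose score exceeds a chosen threshold (the Graham article suggests 0.9) would be treated as spam. Because only the tokens farthest from neutral contribute, a few strongly indicative tokens dominate the result, while common, uninformative tokens are ignored.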
As anti-spam filters adaptively refine their ability to identify spam, spammers devise new ways of defeating these filters. For example, Bayesian filters typically scan the subject and body of an email message in order to extract tokens. While the tokens in the subject and body of an email message may be sufficient to identify the vast majority of spam, spammers may circumvent the filters using other techniques. Thus, there is an ongoing need in the industry for better spam identification.