Although electronic mail, or email, has become immensely popular and is a huge benefit for many users, today's email systems are also plagued by an increasing volume of unwanted mail, referred to as “spam.” Spam email has reached such large proportions relative to desired email that systems are now sought to defeat the sending and delivery of spam.
For example, one approach is to design “filters” to block spam before it is received in a user's email in-box. The filters apply user-defined criteria, such as matching a sender's name or a word or phrase used in a subject header. Filters can also be used to sort email into separate folders once the email has been received by a user, so that the user can ignore folders into which spam is sorted. These approaches are not without shortcomings, since the filters typically work on keyword matching or on common and relatively easy-to-detect syntax or language features. Spam emailers have developed ways to thwart simple filter approaches. Sophisticated spam senders can use processes to modify an original email message into different variations that each communicate essentially the same message. Typically the message is designed to sell something to a recipient, or is designed to provide other commercial advantage to the spam emailer.
For example, one line in an email message might be “buy this now.” The line can be modified to “you should try this now.” Other properties of the message can be modified, such as the order of sentences, addition or removal of words or phrases, changes in spacing or other message formatting, etc. Since the modified spam email messages are different, it is difficult for simple spam detection routines to successfully identify a primary characteristic of spam email, namely, that the email is sent in large numbers, such as thousands, hundreds of thousands or more instances of the same message. Such high-volume email is referred to as “bulk” email. Spam emailers can also use such approaches to change other characteristics of an email message, such as sender identification, routing information and other information associated with an email message that could otherwise help determine that the email message is a bulk emailing and is likely to be spam.
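The weakness described above can be illustrated with a minimal sketch. It is not taken from any particular filter product; the blocked phrase list, the function names and the sample messages are all hypothetical, and exact hashing stands in for any simple routine that counts identical copies of a message.

```python
import hashlib
from collections import Counter

# Hypothetical blocklist of phrases; illustrative only.
BLOCKED_PHRASES = ["buy this now"]


def keyword_filter(message: str) -> bool:
    """Return True if the message contains any blocked phrase (case-insensitive)."""
    text = message.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)


def exact_fingerprint(message: str) -> str:
    """Hash the exact message text; any change in wording or spacing changes the hash."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def count_exact_copies(messages):
    """Count how many times each byte-identical message appears."""
    return Counter(exact_fingerprint(m) for m in messages)


# Hypothetical sample messages: an original and a reworded variation with extra spacing.
original = "Buy this now. Limited offer."
variant = "You should try this now.  Limited offer."

# The keyword filter catches the original but misses the reworded variation.
caught = [keyword_filter(original), keyword_filter(variant)]

# Three messages carry essentially the same content, but exact matching
# places the variation in a separate group.
counts = count_exact_copies([original, original, variant])
largest_group = max(counts.values())

print(caught)         # [True, False]
print(largest_group)  # 2
```

In this sketch, three messages that communicate essentially the same thing are counted as a group of two plus an unrelated singleton, so a threshold based on the number of identical copies would understate the true volume of the mailing.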
Spam detection is further complicated because not all bulk emailings are spam. For example, if thousands of users desire to be informed of daily weather from a weather source, then the messages are likely to be the same or similar, depending on the regional location of the users. Even though such email would qualify as bulk email, it would not be considered spam. Still other users may actually desire to receive certain types of commercial email that would be considered spam by other users. Today's email filter and anti-spam systems often fail to provide for such conditions.
Thus, it is desirable to improve detection of bulk and/or spam email.