The emergence of electronic mail, or e-mail, has changed the face of modern communication. Today, millions of people use e-mail every day to communicate instantaneously around the world and across international and cultural boundaries. The Nielsen polling group estimates that the United States alone has 183 million e-mail users out of a total population of 280 million. The use of e-mail, however, has not come without its drawbacks.
Almost as soon as e-mail technology emerged, so did unsolicited e-mail, also known as spam. A spam message typically advertises or attempts to sell items to recipients who have not asked to receive it. Most spam is commercial advertising for products, pornographic web sites, get-rich-quick schemes, or quasi-legal services. Spam costs the sender very little to send; most of the cost is borne by the recipients and the carriers rather than by the sender. Much as with excessive mass solicitations via postal services, facsimile transmissions, and telephone calls, an e-mail recipient may receive hundreds of unsolicited e-mails over a short period of time.
On average, Americans receive 155 unsolicited messages in their personal or work e-mail accounts each week, and 20 percent of e-mail users receive 200 or more. This results in a net loss of time, as workers must open and delete spam e-mails. As with “junk” postal mail and faxes, an e-mail recipient must laboriously sift through his or her incoming mail simply to separate unsolicited spam from legitimate e-mail. As such, unsolicited e-mail is no longer a mere annoyance; its elimination is one of the biggest challenges facing businesses and their information technology infrastructure. Technology, education, and legislation have all played roles in the fight against spam.
Presently, a variety of methods exist for detecting, labeling and removing spam. Vendors of electronic mail servers, as well as many third-party vendors, offer spam-blocking software to detect, label and sometimes automatically remove spam. The following U.S. patents, which disclose methods for detecting and eliminating spam, are hereby incorporated by reference in their entirety: U.S. Pat. No. 5,999,932 entitled “System and Method for Filtering Unsolicited Electronic Mail Messages Using Data Matching and Heuristic Processing,” U.S. Pat. No. 6,023,723 entitled “Method and System for Filtering Unwanted Junk E-Mail Utilizing a Plurality of Filtering Mechanisms,” U.S. Pat. No. 6,029,164 entitled “Method and Apparatus for Organizing and Accessing Electronic Mail Messages Using Labels and Full Text and Label Indexing,” U.S. Pat. No. 6,092,101 entitled “Method for Filtering Mail Messages for a Plurality of Client Computers Connected to a Mail Service System,” U.S. Pat. No. 6,161,130 entitled “Technique Which Utilizes a Probabilistic Classifier to Detect Junk E-Mail by Automatically Updating A Training and Re-Training the Classifier Based on the Updated Training List,” U.S. Pat. No. 6,167,434 entitled “Computer Code for Removing Junk E-Mail Messages,” U.S. Pat. No. 6,199,102 entitled “Method and System for Filtering Electronic Messages,” U.S. Pat. No. 6,249,805 entitled “Method and System for Filtering Unauthorized Electronic Mail Messages,” U.S. Pat. No. 6,266,692 entitled “Method for Blocking All Unwanted E-Mail (Spam) Using a Header-Based Password,” U.S. Pat. No. 6,324,569 entitled “Self-Removing E-mail Verified or Designated as Such by a Message Distributor for the Convenience of a Recipient,” U.S. Pat. No. 6,330,590 entitled “Preventing Delivery of Unwanted Bulk E-Mail,” U.S. Pat. No. 6,421,709 entitled “E-Mail Filter and Method Thereof,” U.S. Pat. No. 6,484,197 entitled “Filtering Incoming E-Mail,” U.S. Pat. No. 6,487,586 entitled “Self-Removing E-mail Verified or Designated as Such by a Message Distributor for the Convenience of a Recipient,” U.S. Pat. No. 6,493,007 entitled “Method and Device for Removing Junk E-Mail Messages,” and U.S. Pat. No. 6,654,787 entitled “Method and Apparatus for Filtering E-Mail.”
One known method for eliminating spam employs similarity detection. In a typical similarity-based implementation, a large number of “decoy” or “honey pot” e-mail accounts associated with fictitious users are deployed, and their e-mail addresses are publicized to attract spammers. Because no legitimate correspondent has reason to write to these fictitious users, any e-mail received by these accounts is deemed, by definition, to be unsolicited e-mail, or spam. These spam e-mails are aggregated into a spam e-mail corpus. Alternatively, the spam corpus can be formed by aggregating e-mails that users have voted as spam. A similarity detection method compares each incoming e-mail with each spam e-mail in the corpus. If there is a sufficient degree of match with one or more e-mails in the spam corpus, the incoming e-mail is deemed to be spam and handled accordingly; otherwise it is not deemed to be spam and is treated normally.
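The corpus comparison described above can be illustrated with a minimal Python sketch. It assumes that each message is reduced to a set of word 3-grams (“shingles”), that two messages are compared by Jaccard similarity, and that a score of 0.5 or higher counts as a “sufficient degree of match.” The shingle size, the threshold, and the names `SpamCorpus`, `shingles`, and `jaccard` are hypothetical choices for illustration, not details taken from any particular filter. Shingling is used here because adding, deleting, or editing a few words changes only a few shingles, so a lightly altered copy of a corpus message can still score high.

```python
import re
from typing import List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """Return the set of k-word shingles (contiguous word k-grams) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short message: treat the whole word sequence as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class SpamCorpus:
    """Hypothetical spam corpus filled from honey-pot or user-voted e-mails."""

    def __init__(self, k: int = 3, threshold: float = 0.5) -> None:
        self.k = k                  # assumed shingle size
        self.threshold = threshold  # assumed "sufficient degree of match"
        self._entries: List[Set[Shingle]] = []

    def add(self, message: str) -> None:
        """Store a known spam message as its shingle set."""
        self._entries.append(shingles(message, self.k))

    def best_match(self, message: str) -> float:
        """Highest similarity between message and any corpus entry."""
        probe = shingles(message, self.k)
        return max((jaccard(probe, e) for e in self._entries), default=0.0)

    def is_spam(self, message: str) -> bool:
        """Deem the message spam if it matches some corpus entry closely enough."""
        return self.best_match(message) >= self.threshold


if __name__ == "__main__":
    corpus = SpamCorpus()
    corpus.add("Buy cheap watches now limited offer click here today")
    print(corpus.is_spam("Buy cheap watches now, limited offer! Click here"))
    print(corpus.is_spam("Meeting moved to three pm tomorrow in room four"))
```

In the demo, the first probe drops one word and adds punctuation relative to the stored spam, yet six of its six shingles still appear in the stored message's seven, so it is flagged; the second shares no shingles and is treated normally.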
Unfortunately, spammers frequently invent new twists designed to circumvent commonly used similarity detectors, including adding, deleting, or modifying content of e-mails to make them superficially different. This forces the authors of similarity-based filters to respond in kind with enhancements designed to capture the underlying similarity of the spammer's e-mail messages, and the arms race cycle begins anew.
Other known methods for eliminating spam include rule-based methods that use information in the e-mail header and body, of which whitelists and blacklists are a simple example. Still others include Bayesian classifiers and other statistical methods, such as those based on support vector machines and decision trees. However, just as with similarity-based detection, spammers can usually find ways to elude any of these techniques, at least temporarily, until the anti-spam methods adapt to the spammers' new innovations. This opens a time window during which users can be inundated with spam e-mail. Since different spammers are continually finding innovative techniques that temporarily weaken the effectiveness of anti-spam filtration, users can receive an unacceptably high volume of spam in their inboxes.
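Two of the simplest techniques named above can be sketched in Python. `list_rule` is a hypothetical sender-address check against a whitelist and a blacklist, and `NaiveBayes` is a textbook multinomial naive Bayes classifier with add-one (Laplace) smoothing over lower-cased word tokens. Neither is drawn from any specific product or from the patents listed earlier; the tokenizer, the `"spam"`/`"ham"` labels, and the choice to ignore words unseen in training are illustrative assumptions.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Set


def list_rule(sender: str, whitelist: Set[str], blacklist: Set[str]) -> Optional[bool]:
    """Header rule: False = not spam (whitelisted), True = spam (blacklisted), None = undecided."""
    addr = sender.strip().lower()
    if addr in whitelist:
        return False
    if addr in blacklist:
        return True
    return None


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Assumes at least one training message per class before classifying;
    otherwise the class prior would be log(0).
    """

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Add one labeled message ("spam" or "ham") to the model."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text: str, label: str) -> float:
        """log P(label) plus the sum of log P(word | label) over known words."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denom)
        return score

    def is_spam(self, text: str) -> bool:
        """Classify as spam when the spam log-score exceeds the ham log-score."""
        return self.log_score(text, "spam") > self.log_score(text, "ham")


if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train("cheap pills buy now", "spam")
    nb.train("win money now click here", "spam")
    nb.train("meeting agenda for monday", "ham")
    nb.train("lunch with the project team on monday", "ham")
    print(nb.is_spam("buy cheap pills"), nb.is_spam("agenda for the monday meeting"))
```

Both are easy to sidestep in the way the text describes: a spammer can send from a fresh address that appears on neither list, or substitute words the classifier has not yet learned, which this sketch simply ignores until it is retrained.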
In short, no single anti-spam technique can withstand determined attack by spammers for long, and the overall rate of spam reaching users is higher as a result. Therefore, a need exists to overcome the problems with the prior art as discussed above, and particularly for a way to improve both the effectiveness of spam filtration and its robustness against continued innovation by spammers.