In the field of network communications, more particularly Internet network communications, spam messaging in the form of unsolicited email has become increasingly prevalent, targeting both commercial and private consumers. Spamming, generally defined, is the practice of sending mass unsolicited messages to network users in the form of e-mail or other text messaging.
There are a variety of known technologies that have rather recently been developed to fight spam messaging, and these are collectively known in the art as spam filtering. Typical prior-art spam filtering techniques may rely on the presence of some common and/or unusual traits in spam messages and may attempt to classify messages as spam messages according to detection and sometimes analysis of those traits.
Arguably the most prevalent existing spam filtering systems may be software applications that use word detection of pre-compiled key words or in some cases phrases that might be known to appear in spam messages. These structured text-based filters may look for keywords or phrases that appear in email headers, subject lines, and message bodies. There are Bayesian filters, statistical filters, white and blacklist filters, and heuristic filters that may perform a number of tests on messages and compare weighted values against a pre-defined weight threshold. Many of these filters may be trained by fine-tuning. For example, manually selecting a message that has escaped filtering and marking that message as spam may cause addition of new parameters to the filter criteria so that in the future similar messages may be detected and identified as spam.
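The weighted-test approach described above might be sketched as follows; the keyword list, weights, and threshold here are illustrative assumptions only, not a description of any particular filtering product:

```python
# Hypothetical sketch of a weighted heuristic filter: each keyword test
# that fires contributes a weight, and the total weight is compared
# against a pre-defined threshold. All values are illustrative.

SPAM_KEYWORDS = {"free money": 2.0, "act now": 1.5, "viagra": 3.0}
SPAM_THRESHOLD = 3.0

def score_message(subject: str, body: str) -> float:
    """Sum the weights of every known keyword found in the message."""
    text = (subject + " " + body).lower()
    return sum(w for kw, w in SPAM_KEYWORDS.items() if kw in text)

def is_spam(subject: str, body: str) -> bool:
    """Classify as spam when the accumulated weight meets the threshold."""
    return score_message(subject, body) >= SPAM_THRESHOLD

print(is_spam("Act now for FREE MONEY", ""))        # True  (1.5 + 2.0 >= 3.0)
print(is_spam("Meeting agenda", "See attached"))    # False (no keywords match)
```

"Training by fine-tuning," as described above, would then amount to adding entries to a table like `SPAM_KEYWORDS` (or adjusting weights) whenever a user marks an escaped message as spam.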
Spam filtering may be performed locally (typically at a user's station) by software installed thereon, or in many cases as a service at the server side of a user's connection by server-based software, as may be the case with most Web-based e-mail servers. Often there may be software components at both sides of a communications link. There may also be private and public databases (blacklists) containing identification information of known spam senders. Blacklisting might occur when spam is discovered and may involve listing parameters about the spammer, such as IP address, company name and address, or URL addresses that may be known to be spam related.
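A blacklist consultation of the kind just described might, in its simplest form, look like the following sketch; the entries shown use documentation-reserved addresses and are purely hypothetical:

```python
# Illustrative blacklist lookup against known-spammer parameters.
# The stored IPs and domains here are reserved example values.

BLACKLIST = {
    "ips": {"203.0.113.5", "198.51.100.7"},   # documentation-range IPs
    "domains": {"spam-sender.invalid"},       # reserved example domain
}

def sender_blacklisted(ip: str, domain: str) -> bool:
    """Return True when either the connecting IP or the sender
    domain appears in the blacklist database."""
    return ip in BLACKLIST["ips"] or domain in BLACKLIST["domains"]

print(sender_blacklisted("203.0.113.5", "mail.example.com"))   # True
print(sender_blacklisted("192.0.2.10", "mail.example.com"))    # False
```

Real deployments would consult shared public databases rather than an in-memory table, but the lookup logic is conceptually similar.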
As time goes by, spammers become aware of efforts to defeat their purposes, and the spam senders develop new techniques to avoid the tools and processes developed to thwart them. For example, keywords and phrases that might be subject to filtering by text-based parsing and comparison to known words or phrases may be masked using hidden characters that are machine readable but do not appear to a human recipient. Keywords may often be intentionally misspelled as well as rearranged with respect to phrasing. Spammers may also insert characters into message headers, message bodies, or URL strings in an attempt to hide from conventional filtering systems. Filtering for phrases and phrase variations may also be time consuming and process intensive, and therefore may not be completely practical for most applications.
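A filter attempting to counter such masking might apply a normalization pass before keyword matching, stripping invisible characters and undoing simple letter substitutions. The following is a minimal sketch; the substitution table is illustrative and far from exhaustive:

```python
import re
import unicodedata

# Sketch of a normalization pass applied before keyword matching.
# Zero-width and other "format" characters (Unicode category "Cf")
# render invisibly to a human reader but defeat naive string
# comparison; common character substitutions are also reversed.

LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "@": "a", "$": "s"})

def normalize(text: str) -> str:
    # Drop invisible formatting characters such as zero-width spaces.
    text = "".join(c for c in text if unicodedata.category(c) != "Cf")
    # Undo simple substitutions and collapse whitespace.
    text = text.lower().translate(LEET)
    return re.sub(r"\s+", " ", text).strip()

# "VIAGR@" broken up with zero-width spaces normalizes to a plain keyword.
print(normalize("V\u200bI\u200bA\u200bGR@"))   # viagra
```

As the paragraph above notes, applying such transformations across every message and every phrase variation can be process intensive, which limits how aggressively it can be used in practice.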
Spammers may also use well-known spoofing methods to hijack trusted machines, universal resource locators or domain names of trusted sources, and sometimes set up fraudulent (counterfeit) Web sites for interaction, the Web sites perhaps emulating those sources. Real contact information may often be masked to foil automated location attempts, but may be left intact enough for facilitating a receipt of user monies, or user participation with respect to the goal of the spammer. One thing that may be common to essentially all spam messages is some parameter that directs a recipient's participation, whether it's a postal address for sending money, a URL for directing recipients to a Web site, a telephone number to call, or some combination of the above.
Some state-of-the-art spam filters may remove an impressive percentage of spam mail before the mail is deposited into a user inbox, up to 90% or more in some cases. However, spammers, knowing that a good percentage of their mails might be intercepted before reaching a user, may simply increase the number of messages originally sent to ensure that the portion that makes it through remains adequate for their purposes. In a given spam campaign, the actual messages themselves may often be altered slightly from message to message so that there appear to be differences among messages in a same batch. In this way spammers may increase the percentage of mails that ultimately escape the filtering process.
In some systems known to the inventor and identified in the cross-reference section of this specification, spam filtering may be initiated on email messages before they are stored in a message store for client access. These systems may leverage external information and internal information or evidence for use in weighting to provide trust metrics that are associated with email data such as return path information parsed from email messages to help with a more accurate spam classification. Return path information may also be utilized in one of the mentioned systems to associate multiple email messages that might be related to a bulk email campaign that may be classified as a spam campaign.
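The return-path association mentioned above might be sketched as a simple grouping step; the message structure and the campaign threshold below are illustrative assumptions rather than details of the referenced systems:

```python
from collections import defaultdict

# Hypothetical sketch of associating multiple messages by shared
# return-path information, as a system might do to recognize a bulk
# email campaign that may then be classified as a spam campaign.

CAMPAIGN_THRESHOLD = 3  # messages sharing a return path before flagging

def group_by_return_path(messages):
    """messages: iterable of dicts carrying a parsed 'return_path'."""
    groups = defaultdict(list)
    for msg in messages:
        groups[msg["return_path"]].append(msg)
    return groups

def likely_campaigns(messages):
    """Return the return paths whose message counts suggest a bulk campaign."""
    groups = group_by_return_path(messages)
    return [rp for rp, batch in groups.items()
            if len(batch) >= CAMPAIGN_THRESHOLD]

msgs = ([{"return_path": "bulk@spam-sender.invalid"}] * 3
        + [{"return_path": "alice@mail.example.com"}])
print(likely_campaigns(msgs))   # ['bulk@spam-sender.invalid']
```

Evidence gathered this way (e.g. the size of a return-path group) could then feed into the trust-metric weighting described above.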
One drawback to most email spam-filtering systems may be the likelihood for incorrect classification of some filtered email messages. For example, a percentage of email messages that are spam might be missed by a given filtering system and therefore classified as trusted email by default. Likewise, some trusted email might inadvertently be classified as spam based on incorrect or weak evidence.
Many filtering systems may simply determine a positive or negative classification for spam based on the amount of internal information that can be leveraged in a given time period for classification processing using any of the various filtering techniques. With respect to time, spam-filtering systems may process emails after they are received and stored for access by clients of a store-and-forward email server system for example. In these cases the filtering process may begin only after a client logs in to access email, leaving a small time window for correct classification.
In the case of incorrect classification by a spam filter, a client may have to browse a spam email folder, for example, to determine if any email classified as spam is actually trusted email. Likewise, a user may feel compelled to browse email in an inbox reserved for trusted email for any email that may actually be spam email. Such activity may, depending on volume of messages, be time consuming and wasteful.
Therefore what is clearly needed is a method for reclassifying or validating the original classification of spam-filtered email messages using newly found evidence.