Electronic messages have become an indispensable part of modern communication. Electronic messages such as email or instant messages are popular because they are fast, easy, and have essentially no incremental cost. Unfortunately, these advantages of electronic messages are also exploited by marketers who regularly send out unsolicited junk messages. The junk messages are referred to as “spam”, and spam senders are referred to as “spammers”. Spam messages are a nuisance for users. They clog people's inboxes, waste system resources, often promote distasteful subjects, and sometimes sponsor outright scams.
There are a number of commonly used techniques for classifying messages and identifying spam. For example, blacklists are sometimes used for tracking known spammers. The sender address of an incoming message is compared to the addresses in the blacklist. A match indicates that the message is spam and prevents the message from being delivered. Other techniques, such as rule matching and content filtering, analyze the message and determine its classification according to the analysis. Some systems have multiple categories for message classification. For example, a system may classify a message into one of the following categories: spam, likely to be spam, likely to be good email, and good email, where only good email messages are allowed through and the rest are either further processed or discarded.
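The blacklist check and the four-category classification described above can be illustrated with a minimal sketch. The blacklist entry, the numeric content-analysis score, and the score thresholds are hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
from enum import Enum


class Category(Enum):
    SPAM = "spam"
    LIKELY_SPAM = "likely to be spam"
    LIKELY_GOOD = "likely to be good email"
    GOOD = "good email"


# Hypothetical blacklist of known spammer addresses (assumption).
BLACKLIST = {"offers@spam.example"}


def classify(sender: str, spam_score: float) -> Category:
    """Classify a message from its sender address and a content score in [0, 1].

    The score stands in for the output of rule matching or content
    filtering; the thresholds below are illustrative assumptions.
    """
    # A blacklist match marks the message as spam outright.
    if sender.lower() in BLACKLIST:
        return Category.SPAM
    if spam_score >= 0.9:
        return Category.SPAM
    if spam_score >= 0.5:
        return Category.LIKELY_SPAM
    if spam_score >= 0.1:
        return Category.LIKELY_GOOD
    return Category.GOOD


def is_delivered(category: Category) -> bool:
    """Only good email is allowed through; the rest is processed further or discarded."""
    return category is Category.GOOD
```

In this sketch a blacklisted sender is classified as spam regardless of content, and a message in either of the two intermediate categories is withheld from delivery pending further processing.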
Spam-blocking systems sometimes misidentify non-spam messages. For example, a system that performs content filtering may be configured to identify any message that includes certain word patterns, such as “savings on airline tickets”, as spam. However, a legitimate electronic ticket confirmation message that happens to include such word patterns may then be misidentified as spam or possibly spam. Misidentification of good messages is undesirable, since it wastes system resources and, in the worst case, causes good messages to be classified as spam and lost.
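The misidentification described above can be reproduced with a minimal content-filter sketch. The pattern is the phrase used in the example; the function name and the two sample message bodies are hypothetical.

```python
import re

# Word pattern from the example above; a real filter would hold many patterns.
SPAM_PATTERNS = [re.compile(r"savings on airline tickets", re.IGNORECASE)]


def matches_spam_pattern(body: str) -> bool:
    """Return True if any configured word pattern occurs in the message body."""
    return any(pattern.search(body) for pattern in SPAM_PATTERNS)


# Hypothetical message bodies (assumptions for illustration).
advert = "Huge SAVINGS ON AIRLINE TICKETS, click now!"
confirmation = (
    "Your electronic ticket is confirmed. "
    "Thank you for booking and enjoying savings on airline tickets."
)

# Both bodies match, so the legitimate confirmation is flagged as well.
flagged = [matches_spam_pattern(advert), matches_spam_pattern(confirmation)]
```

Because the filter inspects only the wording, it cannot distinguish the unsolicited advertisement from the confirmation that the recipient actually requested.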
It would be useful to have a technique that would more accurately identify non-spam messages. Such a technique would not be effective if spammers could easily alter parts of the spam messages they send so that the messages would be identified as non-spam. Thus, it would also be desirable if non-spam messages identified by such a technique were not easily spoofed.