The advent of global communications networks such as the Internet has presented commercial opportunities for reaching vast numbers of potential customers. Electronic messaging, and particularly electronic mail (“email”), is becoming increasingly pervasive as a means for disseminating unwanted advertisements and promotions (also referred to as “spam”) to network users.
The Radicati Group, Inc., a consulting and market research firm, estimated that as of August 2002, two billion junk email messages were being sent each day, a number expected to triple every two years. Individuals and entities (e.g., businesses, government agencies) are becoming increasingly inconvenienced and oftentimes offended by junk messages. As such, spam is now, or soon will become, a major threat to trustworthy computing.
Common techniques for thwarting spam employ filtering systems and methodologies. One proven filtering technique is based upon a machine learning approach. Machine learning filters assign to an incoming message a probability that the message is spam. In this approach, features typically are extracted from two classes of example messages (i.e., spam and non-spam messages), and a learning filter is applied to discriminate probabilistically between the two classes. Since many message features are related to content (e.g., whole words and phrases in the subject and/or body of the message), such filters are commonly referred to as “content-based filters”. These machine learning filters usually employ exact-match techniques to detect spam messages and distinguish them from good messages.
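By way of illustration only, the content-based approach described above can be sketched as a small classifier. This is a minimal sketch assuming a multinomial naive Bayes model with Laplace smoothing, which is one common choice; the text above does not name a particular learning algorithm. The names `tokenize`, `NaiveBayesSpamFilter`, `train`, `spam_probability`, and the smoothing parameter `alpha` are hypothetical and introduced here for illustration. Features are whole words matched exactly, so a word spelled differently from the training examples contributes no learned evidence.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Hypothetical feature extractor: lowercased whole words, matched exactly."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Sketch of a content-based filter: multinomial naive Bayes over word features.

    Assumes both classes ("spam" and "ham") receive at least one training message.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        # Accumulate per-class message counts and word counts from an example.
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior: fraction of training messages carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                # Exact match: a word never seen in training receives only the
                # smoothing mass, i.e. no class-specific evidence.
                score += math.log((counts[word] + self.alpha)
                                  / (total_words + self.alpha * len(vocab)))
            log_scores[label] = score
        # Normalize the two log scores into P(spam | message) without overflow.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)


# Usage: train on a few labeled examples, then score new messages.
f = NaiveBayesSpamFilter()
f.train("buy cheap pills now", "spam")
f.train("limited offer buy now", "spam")
f.train("meeting agenda for monday", "ham")
f.train("lunch with the team on monday", "ham")
print(f.spam_probability("buy pills now"))
print(f.spam_probability("b-u-y p1lls n0w"))
```

In this toy example, the obfuscated message “b-u-y p1lls n0w” shares no exact tokens with the training data and therefore scores noticeably lower than “buy pills now”, which hints at why exact-match features leave room for the evasion tactics discussed next.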
Unfortunately, spammers are constantly finding ways around conventional spam filters, including those that employ machine learning systems. For example, they may utilize mathematical processing and sequential email modification to test and predict spam filter performance. In addition, information explaining how common spam filters operate is widely available to the public. Some Internet services even offer to run messages through specific filters and to return the respective verdicts of those filters. Thus, spammers have the opportunity to run their spam through various known spam filters and/or modify their messages until the messages successfully pass through the filters. In view of the foregoing, such conventional filters provide limited protection against spam.