1. Field of the Invention
This invention relates to electronic message analysis and filtering. More particularly, the invention relates to a system and method for improving a spam filtering feature set.
2. Description of the Related Art
“Spam” is commonly defined as unsolicited bulk e-mail, i.e., e-mail that was not requested (unsolicited) and that was sent to multiple recipients (bulk). Although spam has existed for quite some time, the amount of spam transmitted over the Internet and corporate local area networks (LANs) has increased significantly in recent years. In addition, the techniques used by “spammers” (those who generate spam) have become more advanced in order to circumvent existing spam filtering products.
Spam represents more than a nuisance to corporate America. Significant costs are associated with spam, including lost productivity and the additional hardware, software, and personnel required to combat the problem. In addition, many users are bothered by spam because it takes time away from reading legitimate e-mail. Moreover, because spammers send spam indiscriminately, pornographic messages may appear in the e-mail inboxes of workplaces and of children; the latter is a crime in some jurisdictions.
Spam filters attempt to remove spam from incoming traffic without removing valid e-mail messages. For example, spam filters scan e-mail message headers, metatag data, and/or message bodies for words that are predominantly used in spam, such as “Viagra” or “Enlargement.” Current e-mail filters may also search for images that are known to be used in spam messages. Hashing algorithms such as MD5 are used to generate image “fingerprints” that uniquely identify known spam images.
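The keyword and fingerprint checks described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of any particular product or of this invention: the keyword list, the in-memory fingerprint set, and the names `SPAM_KEYWORDS`, `KNOWN_SPAM_IMAGE`, `KNOWN_SPAM_FINGERPRINTS`, `md5_fingerprint`, and `classify` are hypothetical, and the image bytes are placeholders rather than real image data.

```python
import hashlib
import re

# Hypothetical keyword list; production filters use much larger, weighted lists.
SPAM_KEYWORDS = {"viagra", "enlargement"}

# Hypothetical placeholder bytes standing in for a previously identified spam image.
KNOWN_SPAM_IMAGE = b"placeholder bytes standing in for a known spam image"


def md5_fingerprint(image_bytes: bytes) -> str:
    """Return the hex MD5 digest of the raw image bytes (the image 'fingerprint')."""
    return hashlib.md5(image_bytes).hexdigest()


# Hypothetical database of fingerprints of known spam images.
KNOWN_SPAM_FINGERPRINTS = {md5_fingerprint(KNOWN_SPAM_IMAGE)}


def contains_spam_keyword(text: str) -> bool:
    """Exact-match scan: split lowercased text into alphabetic words and look each up."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in SPAM_KEYWORDS for word in words)


def is_known_spam_image(image_bytes: bytes) -> bool:
    """Flag an image only if its MD5 fingerprint is already in the database."""
    return md5_fingerprint(image_bytes) in KNOWN_SPAM_FINGERPRINTS


def classify(text: str, images: list[bytes]) -> bool:
    """Return True if the message text or any attached image matches a spam signature."""
    return contains_spam_keyword(text) or any(is_known_spam_image(img) for img in images)


if __name__ == "__main__":
    print(classify("Limited offer on Viagra", []))        # True: keyword match
    print(classify("Lunch at noon?", [KNOWN_SPAM_IMAGE]))  # True: fingerprint match
    print(classify("Lunch at noon?", [b"holiday photo"]))  # False: no match
```

Because both checks rely on exact matches, a single changed character in a word or a single changed byte in an image is enough to produce a miss, which is the weakness exploited by the obfuscation techniques discussed next.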
Over the years, spammers have become more creative in disguising their messages, e-mails, or advertisements as legitimate incoming traffic to avoid detection by spam filters. Specifically, spammers typically obfuscate words that would normally be identified by spam filters. For example, “Viagra” may be spelled “V!agra,” or “Enlargement” may be spelled “En!@rgement.” With respect to images, spammers often embed random data within spam images to modify the image fingerprint and thereby avoid detection.
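The effect of both evasion techniques on a naive exact-match filter can be demonstrated with a short Python sketch. It assumes a simple exact keyword match and an MD5 lookup; the helper names (`contains_spam_keyword`, `md5_fingerprint`, `embed_random_data`) are hypothetical, and appending random bytes to the end of the data is only a simplified stand-in for the many ways random data can be embedded in an image.

```python
import hashlib
import os
import re

SPAM_KEYWORDS = {"viagra", "enlargement"}  # hypothetical keyword list


def contains_spam_keyword(text: str) -> bool:
    """Naive exact-match keyword scan over alphabetic runs of the lowercased text."""
    return any(word in SPAM_KEYWORDS for word in re.findall(r"[a-z]+", text.lower()))


def md5_fingerprint(image_bytes: bytes) -> str:
    """Return the hex MD5 digest of the raw image bytes."""
    return hashlib.md5(image_bytes).hexdigest()


def embed_random_data(image_bytes: bytes, n: int = 8) -> bytes:
    """Hypothetical mutation: append n random bytes to the image data.

    Common image decoders often ignore bytes after the end-of-image marker,
    so the picture may look unchanged while its hash changes.
    """
    return image_bytes + os.urandom(n)


# Placeholder bytes standing in for a known spam image (hypothetical).
original_image = b"placeholder bytes standing in for a known spam image"
known_fingerprints = {md5_fingerprint(original_image)}
mutated_image = embed_random_data(original_image)

if __name__ == "__main__":
    print(contains_spam_keyword("Buy Viagra"))   # True: exact match
    print(contains_spam_keyword("Buy V!agra"))   # False: splits into "v" and "agra"
    print(contains_spam_keyword("En!@rgement"))  # False: splits into "en" and "rgement"
    # False with overwhelming probability: the appended bytes change the digest.
    print(md5_fingerprint(mutated_image) in known_fingerprints)
```

The sketch shows why exact hashing alone is brittle: the mutated image differs from the original only by a few trailing bytes, yet its fingerprint no longer appears in the database.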
Thus, improved mechanisms for detecting obfuscated images within e-mail messages are needed.