1. Field
The subject matter disclosed herein relates to a method and system for detecting image spam.
2. Information
Spam is a type of abuse perpetrated on electronic messaging systems, such as e-mail systems, to indiscriminately send unsolicited bulk messages. Such bulk messages are often sent with the ultimate goal of defrauding unsuspecting users to a spammer's benefit. Because of its low costs and high returns, spamming has become a flourishing sub-legal Internet industry. According to conservative estimates, spam may comprise 80 to 85% of all e-mail in the world.
Spamming and the fight against it constitute two of the most significant economic activities on the Internet. Some anti-spam techniques have been developed to counter or limit the number of spam messages that users receive. Anti-spam techniques often analyze the words in the text of a message and categorize the message as spam based on those words. For example, the presence of certain words or regular expressions may indicate that a message constitutes spam. If a message advertises a certain product, such as “herbal Viagra,” the message might be determined to be spam if it is known that spam messages often contain the phrase “herbal Viagra” within their text.
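The word-based and regular-expression-based categorization described above can be illustrated with a minimal sketch. The pattern list and the function name `looks_like_spam` are hypothetical and chosen only for illustration; they do not represent any particular anti-spam product, and a practical filter would rely on a much larger, maintained set of patterns.

```python
import re

# Hypothetical pattern list, assumed for illustration only.
SPAM_PATTERNS = [
    re.compile(r"\bherbal\s+viagra\b", re.IGNORECASE),
]


def looks_like_spam(message_text: str) -> bool:
    """Return True if any known spam pattern occurs in the message text."""
    return any(p.search(message_text) for p in SPAM_PATTERNS)


if __name__ == "__main__":
    print(looks_like_spam("Buy Herbal  Viagra now"))
    print(looks_like_spam("Meeting moved to noon"))
```

A filter of this kind inspects only the characters in the text of a message, so it can match a phrase only when that phrase is actually present as text.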
There is currently an “arms race” between spammers and anti-spammers, i.e., the better the anti-spammers get at filtering out particular classes of spam messages, the more sophisticated the spammers become in order to negate that advantage. The spammers do so by introducing new classes of spam messages that are, at least for a while, unidentifiable by the anti-spammers.
A new addition to the repertoire of spammers is image spam. Image spam is an obfuscation method in which the text of a spam message is stored as an image that is displayed, for example, in a user's e-mail. This may prevent current text-based spam filters from detecting and blocking image spam messages. Some early image spam filtering methods worked by attempting to automatically segregate and recognize the text present in spam images. However, spammers have circumvented such early filters by introducing noise of various forms into the spam images. Added noise tends to interfere heavily with the automated character recognition engines in image spam filters, allowing many noisy image spam messages to pass through such filters to end users.