The problem of spam in established communication technologies, such as electronic mail, is well-recognized. Spam may include unsolicited messages sent by a computer over a network to a large number of recipients. Spam includes unsolicited commercial messages, but spam has come to be understood more broadly to additionally include unsolicited messages sent to a large number of recipients, and/or to a targeted user or targeted domain, for malicious, disruptive, or abusive purposes, regardless of commercial content. For example, a spammer might send messages in bulk to a particular domain to exhaust its resources.
One type of spam message is image spam. Image spam employs a technique in which the sender (typically a spammer) might include the spam as part of an embedded file attachment rather than in the body of the message. Image spam may include an image file, such as a GIF file or the like; typically a quantity of random words, sometimes known as word salad; and possibly even a link to a website. An image spammer may use a combination and/or variation of these components to bypass traditional anti-spam technologies.
These images are often automatically displayed to a recipient of the message. Unfortunately, much of such image spam goes undetected by today's spam filters. The increase in more complex image spam within messages has caused spam capture rates across the messaging security industry to decline, often resulting in wasted productivity and end-user frustration as more spam gets delivered.
FIG. 1 illustrates examples of typical image spam 102-103. To an end-user recipient, the content of a message containing image spam 102-103 might appear to be a text-based message. Many spammers may use such image spam with links (e.g., URL links) embedded within the message or directly in the image spam, such as illustrated within image spam 103. One of the goals of the spammer is to have an end-user ‘click’ on the link, so that the end-user may be directed to a website that may be trying to sell something, phish for personal information, or even install spyware, malware, or the like, on the end-user's computer.
Moreover, images can be gathered from remote locations using, for example, HTML IMG tags to display images loaded from a website when a message is viewed. Other variations of image spam may have embedded images that direct end-users to enter a URL address into a browser. Where there is no communication with any external source, this type of spam may evade URL block list type filters.
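As a rough illustration of the remotely loaded images described above, the following sketch lists the remote image sources referenced by HTML IMG tags in a message body. It is only one way a filter might enumerate such references, not a method taken from this document; the class name `RemoteImageCollector`, the function `remote_image_sources`, and the example URL are hypothetical.

```python
from html.parser import HTMLParser


class RemoteImageCollector(HTMLParser):
    """Collects the src of every IMG tag that points at a remote (http/https) location."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this method.
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value and value.lower().startswith(("http://", "https://")):
                self.sources.append(value)


def remote_image_sources(html_body):
    """Return the remote image URLs referenced by IMG tags in an HTML message body."""
    collector = RemoteImageCollector()
    collector.feed(html_body)
    collector.close()
    return collector.sources


# Hypothetical message body: one remotely hosted image and one embedded (cid:) image.
body = (
    '<p>Hello</p>'
    '<IMG SRC="http://images.example.com/offer.gif">'
    '<img src="cid:part1" />'
)
found = remote_image_sources(body)
print(found)
```

Only the first image is reported, because the second is carried inside the message itself. An image that merely shows a URL as pixels produces no such reference at all, which is consistent with the evasion of URL block list type filters noted above.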
In addition, spammers often automatically generate image spam that may include virtually the same text, but appear as completely different images. For example, the spammers might change dimensions, spacing, or coloring of an image so that the image appears unique to traditional spam analysis. Spammers may employ a variety of image generation tools, for example, to randomize such characteristics while keeping substantially the same text.
To further confuse many of the traditional spam filters today, spammers may insert random characters and speckles, and even reuse an image to create a large number of slightly different images. Speckling allows the spammers, for example, to reuse a base image and add what looks like random bits of lint or speckles to the image, so that the resulting images often may appear to filters as unique images, effectively evading fingerprinting or other detection approaches. Another kind of image spam technique uses several colors, making the text more difficult to recognize when using, for example, optical character recognition (OCR) techniques. Varying font colors may further hide spam-type words within an image. More recently, image spammers have used animated images and strip mining to further evade traditional spam filters. Animated images with transparent frames are even sometimes used to build up spam images. Sometimes, an image spammer may even build an image spam from a plurality of distinct layers that may effectively evade traditional spam detection filters.
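The effect of speckling on fingerprinting can be sketched as follows. The example assumes, purely for illustration, that a fingerprint is an exact-match SHA-256 hash over raw pixel bytes; actual filters may fingerprint images differently. The helper names `fingerprint` and `speckle` and the 64x64 all-white base image are hypothetical.

```python
import hashlib
import random


def fingerprint(pixels):
    """Exact-match fingerprint (an assumption): SHA-256 over the raw pixel bytes."""
    return hashlib.sha256(pixels).hexdigest()


def speckle(pixels, count, seed):
    """Return a copy of the image with `count` distinct, randomly chosen pixels inverted."""
    rng = random.Random(seed)
    noisy = bytearray(pixels)
    for index in rng.sample(range(len(noisy)), count):
        noisy[index] ^= 0xFF  # invert one 8-bit grayscale pixel
    return bytes(noisy)


# Hypothetical base image: 64x64 grayscale, every pixel white.
base = bytes([255]) * (64 * 64)
variant = speckle(base, count=5, seed=1)

# Count how many pixels the speckling actually changed.
changed = sum(1 for original, altered in zip(base, variant) if original != altered)

print(changed, "of", len(base), "pixels differ")
print("fingerprints match:", fingerprint(base) == fingerprint(variant))
```

Although only 5 of 4096 pixels differ and the image would look essentially the same to a recipient, the two fingerprints no longer match, so a filter that compares exact fingerprints would treat the speckled copy as a new image.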