The growth in the use of the Internet over the past several years has produced a corresponding growth in the number and types of illegitimate practices undertaken over the Internet. From the annoying, but relatively innocuous, invasion of spam email to more insidious practices such as identity theft (including without limitation, phishing, pharming and the like), online fraud, sales of counterfeit and/or unauthorized goods (including via the many reputable online auction sites), trademark misuse, and the like, the Internet has provided numerous opportunities for enterprising fraudsters and con artists to exploit the unwary.
A variety of solutions have been proposed to deal with various types of illegitimate online practices. Merely by way of example, various systems for identifying and responding to online fraud are described in detail in the following commonly-owned, co-pending applications, each of which is hereby incorporated by reference, and which are referred to collectively herein as the “Anti-Fraud Applications”: U.S. patent application Ser. No. 10/709,938 (filed by Shraim et al. on May 2, 2004); and U.S. patent application Ser. Nos. 10/996,566, 10/996,567, 10/996,568, 10/996,646, 10/996,990, 10/996,991, 10/996,993, and 10/997,626 (all filed by Shraim, Shull, et al. on Nov. 23, 2004). As another example, systems for identifying, and/or establishing the trustworthiness of, various online entities are described in detail in the following commonly-owned, co-pending applications, each of which is hereby incorporated by reference, and which are referred to collectively herein as the “Trust Applications”: U.S. patent application Ser. Nos. 11/368,255, 11/368,329 and 11/368,372 (all filed on Mar. 2, 2006 by Shull et al.) and U.S. patent application Ser. No. 11/339,985, filed Jan. 25, 2006 by Shull et al.
Such systems often seek to identify illegitimate online practices through the analysis of email messages (including without limitation spam messages and/or phish messages), web sites and other data sources. For instance, the Anti-Fraud Applications describe systems that can analyze text in an email message and/or on a web site, and based at least in part on that analysis, determine whether the email message and/or web site is part of an online scam (e.g., an attempted fraud, identity theft operation, trademark misuse, sale of counterfeit goods, etc.). In an attempt to avoid detection, however, many scammers have begun using images to convey information to targets of their scams.
Merely by way of example, knowing that many email clients and/or servers have spam filters that analyze email text for “toxic” terms (such as common pharmaceuticals, promises of bodily enhancement, etc.), some spammers have begun sending messages that begin with an image (such as a GIF or JPEG image) comprising an advertisement, and include seemingly-innocuous text (such as an apparent message from one friend to another) at the bottom of the message. The email system's spam filter generally is unable to analyze the image, and the text of the message includes nothing that would trigger the spam filter. Consequently, the spam filter allows the message into the user's inbox. When read by the user, however, the message is clearly spam.
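Merely by way of illustration, the evasion described above can be sketched with a simplified, text-only filter. The term list, the function name `is_flagged`, and the sample messages below are hypothetical and are not drawn from any particular spam filter; the sketch assumes only that the filter inspects the textual body of a message and never the pixels of an attached image.

```python
import re

# Hypothetical list of "toxic" terms; a real filter would use a far larger set.
TOXIC_TERMS = {"pharmacy", "enhancement"}


def is_flagged(message_text: str) -> bool:
    """Return True if any toxic term appears as a word in the message text."""
    words = set(re.findall(r"[a-z]+", message_text.lower()))
    return bool(words & TOXIC_TERMS)


# A conventional text-based spam message trips the filter.
text_spam = "Cheap pharmacy deals today"

# An image-based spam message carries its advertisement in an attached image;
# the only text the filter can see is the innocuous note at the bottom.
image_spam_text = "Hi Sam, see you at lunch on Friday."

print(is_flagged(text_spam))        # the advertisement text is visible
print(is_flagged(image_spam_text))  # the advertisement is hidden in the image
```

Because the filter in this sketch sees only the textual portion of the message, the second message passes even though the attached image may contain exactly the terms the filter is looking for.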
As another example, a seller offering counterfeit products (or genuine products without authorization) often will misuse the logo and/or other trademarks of a reputable brand in order to entice buyers. Such offers are commonplace on a variety of web sites (including in particular auction sites such as eBay®). Often, the misuse of a trademark will occur in an image displayed on the web page. Accordingly, automated tools designed to police trademark infringement and/or counterfeit sales, which generally search for a textual representation of the trademark, brand name, etc., will be unable to detect the misuse.
Optical character recognition (“OCR”) tools have been developed to analyze an image for any text contained therein. In many cases, however, those tools require relatively high resolution (e.g., 300 dots per inch (“DPI”) or better) images to resolve text in common (e.g., 12 point) font sizes. In online use (such as email and the web), however, much lower resolutions (e.g., 72 DPI) are common. Further, most OCR tools perform optimally only when analyzing high-contrast (e.g., black text on a white background) images. In many cases, however, online images are much lower in contrast and/or comprise a variety of colors, rendering many OCR tools useless. In addition, even in situations where an OCR tool might be able to resolve text from an online image, traditional OCR tools are computationally expensive. In a high volume application (such as analyzing email messages and/or web sites in bulk), these tools deliver unacceptable performance.
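To put the resolution figures in concrete terms, the following sketch estimates how many pixels are available to represent a line of text at a given font size and resolution. It assumes the common convention of 72 typographic points per inch and treats the nominal font size as the height of the character box; actual glyphs are somewhat smaller. The function name `glyph_height_px` is merely illustrative.

```python
POINTS_PER_INCH = 72  # assumed convention: 72 typographic points per inch


def glyph_height_px(font_pt: float, dpi: float) -> float:
    """Approximate height, in pixels, of the character box for a font of the
    given point size rendered at the given resolution."""
    return font_pt * dpi / POINTS_PER_INCH


print(glyph_height_px(12, 300))  # 50.0 pixels at a typical scanning resolution
print(glyph_height_px(12, 72))   # 12.0 pixels at a typical online resolution
```

Under these assumptions, 12 point text occupies roughly 50 pixels of height at 300 DPI but only about 12 pixels at 72 DPI, which leaves an OCR tool with far less detail from which to distinguish one character from another.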
Accordingly, there is a need for solutions that allow the analysis of online images (e.g., images in email messages and/or on web pages) to determine whether a particular image is part of an effort to undertake illegitimate online activities.