Phishing is the attempt to obtain confidential information from users, such as user names, passwords, account numbers and the like, by pretending to be a legitimate online entity. One form of phishing is website forgery. A site forger duplicates or “spoofs” a target website, typically by copying the target website's home page, and typically hosts that page in another domain with a very similar name. A forged website is also known as a spoofed or phished website. For example, a target site for a social network may be hosted at “www.thesocialnetwork.com”, with a home page that includes fields for users to log in to their accounts by providing their user name and password. A site forger would copy the code of the social network's home page and host it, for example, at “www.the-social-network.com”. A user may be directed to the forged website from hyperlinks found in emails, blogs, and so forth. A user clicking on such a hyperlink may not realize they are being directed to the forged site, and may thereby inadvertently provide their user credentials (e.g., user name and account password) to the site forger. With the users' credentials, the site forger can then access these users' accounts on the target website, for example to send advertisements, steal personal information, or undertake other malicious activities.
Existing methods for dealing with website forgery generally rely on monitoring incoming requests for images and other resources, by reviewing the web server log files for such requests. When a website forger copies the home page of the target site, the images in that home page remain sourced at the target site. Thus, when the forged site is loaded by the user's browser, the browser requests those images from the target site. The log files indicate the domain from which these images are being requested. Since images can be requested from other domains for legitimate reasons, a web beacon image, e.g., an image of a single pixel, is included in the home page of the target site to identify a forged site. Only a site forger is likely to copy the web beacon image (as a result of copying the entirety of the home page). Thus, requests from other domains for the web beacon image may be indicative of a forged site. This method of identifying a forged site is easily defeated, however: the site forger simply copies and hosts the entire target page, including all of the image files, on their own server. In this situation there is no evidence in the target site's log files of requests from the forged site.
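The log-review approach described above can be sketched as follows. This is a minimal illustration only, assuming the web server writes access logs in the common "combined" format (request line, status, size, then a quoted referrer field); the beacon path, the domain constant, and the function names are hypothetical and do not come from any particular system.

```python
import re
from urllib.parse import urlparse

# Hypothetical values chosen for illustration only.
TARGET_DOMAIN = "thesocialnetwork.com"
BEACON_PATH = "/img/beacon.gif"

# Captures the requested path and the referrer field of a
# combined-format access log line (assumed log layout).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)


def is_target_host(host):
    """True if host is the target domain or one of its subdomains."""
    return host == TARGET_DOMAIN or host.endswith("." + TARGET_DOMAIN)


def suspicious_referrers(log_lines):
    """Return foreign hosts whose pages requested the web beacon image."""
    hosts = set()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # line is not in the assumed format
        path = match.group("path").split("?", 1)[0]
        if path != BEACON_PATH:
            continue  # ordinary images are legitimately hot-linked
        host = urlparse(match.group("referer")).hostname
        if host and not is_target_host(host):
            hosts.add(host)
    return hosts
```

As the paragraph above notes, a check of this kind reports nothing once the forger serves a local copy of the beacon image, because the forged page then sends no requests to the target site at all.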
This and other methods of identifying forged sites all depend on identifying a pattern of unauthorized accesses to the target site. By the time the forged site is identified, significant damage may have already been inflicted by the site forger on the users' accounts and the target site. In particular, existing methods are incapable of identifying and preventing a forged site's first attempted attack on a target site.