1. Field of the Invention
The present invention relates to reducing or preventing legitimate web site content from triggering matches to anti-phishing black lists.
2. Description of the Related Art
Phishing attacks are computerized ploys that are intended to steal consumers' personal identity data and financial account credentials. Examples of phishing schemes include using ‘spoofed’ e-mails to lead consumers to counterfeit websites designed to trick recipients into divulging financial data such as credit card numbers, account usernames, passwords and social security numbers. Hijacked brand names of banks, e-retailers and credit card companies are often used to convince recipients to respond.
Most phishing attacks today involve the phisher placing fake website content on a legitimate website. The phisher does this by hacking into the site, creating an area on it, and uploading files to that area. The phisher then sends out the URL of the uploaded web content in a fraudulent e-mail. A victim who clicks on the URL in the e-mail is taken to the area of the website created by the phisher.
As an example, take a legitimate site such as http://www.myfamilyphotos.com. The phisher will hack into the web server hosting www.myfamilyphotos.com and place the illegitimate content in a location such that it can be referenced by a URL such as http://www.myfamilyphotos.com/subdir/hackedpage.asp. To a user, this URL may appear legitimate, as the illegitimate content is in fact present on the legitimate website server. Of course, the illegitimate content may be any sort of web application, file or content, not just an .asp page.
Many anti-phishing solutions block access to illegitimate sites by providing a black list. This black list includes a list of domains known to have illegitimate content on them. For example, myfamilyphotos.com would be one such domain. Also included in the black list is a list of search patterns or signatures for each listed domain, as there may be more than one illegitimate site on a domain. These search patterns are commonly in regular expression format, but need not be. A regular expression is a string that is used to describe or match a set of strings, according to certain syntax rules. To determine if a URL points to illegitimate phishing content, the domain part of the URL is extracted and then the list of search patterns for that domain is retrieved from the black list. These search patterns are then evaluated against the full URL. If the full URL matches one or more of the search patterns, the URL is determined to point to a phishing site. These black lists are built by automatically and manually processing phishing samples.
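The lookup described above can be illustrated with a minimal sketch. It is only one possible arrangement and is not taken from any particular anti-phishing product. The names BLACK_LIST, domain_of and is_phishing_url are hypothetical, the single pattern shown is an assumed entry for the example domain, and the domain extraction is simplified to stripping a leading "www." from the host name.

```python
import re
from urllib.parse import urlsplit

# Hypothetical black list: each listed domain maps to its search patterns.
# The entry below is an assumed example, not data from a real black list.
BLACK_LIST = {
    "myfamilyphotos.com": [r"/subdir/hackedpage\.asp"],
}


def domain_of(url):
    """Extract the domain part of a URL (simplified: drops a leading 'www.')."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_phishing_url(url, black_list=BLACK_LIST):
    """Return True if the full URL matches any search pattern for its domain."""
    patterns = black_list.get(domain_of(url), [])
    return any(re.search(pattern, url) for pattern in patterns)
```

Under these assumptions, is_phishing_url("http://www.myfamilyphotos.com/subdir/hackedpage.asp") evaluates the patterns listed for myfamilyphotos.com against the full URL, while a URL on an unlisted domain has no patterns to evaluate and is not flagged.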
A problem arises in trying to ensure that the search patterns do not match a legitimate part of a web site. Such a match is known as a false positive. False positives in phishing black lists are increasingly common as phishing attacks increase in number. False positives are undesirable because they block access to legitimate web content. In order to reduce false positives, the black lists may be manually reviewed. However, due to the vast number of web sites in existence, black lists tend to be very large. Manual review of such black lists is very time consuming and expensive. A manual review step will also delay the availability of the new black list entry, increasing the window of time during which the solution does not protect against the illegitimate content.
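As an illustration of how a false positive can arise, the following sketch compares two search patterns for the example domain. The legitimate URL, both patterns and the helper name matches are hypothetical and are chosen only to show the effect of an overly broad pattern.

```python
import re

# Hypothetical URLs on the example domain; only the first is phishing content.
phishing_url = "http://www.myfamilyphotos.com/subdir/hackedpage.asp"
legitimate_url = "http://www.myfamilyphotos.com/subdir/vacation.html"

# Two assumed candidate search patterns for the same black list entry.
broad_pattern = r"myfamilyphotos\.com/subdir/"
narrow_pattern = r"myfamilyphotos\.com/subdir/hackedpage\.asp$"


def matches(pattern, url):
    """Return True if the search pattern matches anywhere in the full URL."""
    return re.search(pattern, url) is not None


# Both patterns catch the phishing URL, but only the broad pattern also
# matches the legitimate page, which is a false positive.
broad_false_positive = matches(broad_pattern, legitimate_url)
narrow_false_positive = matches(narrow_pattern, legitimate_url)
```

Here the broad pattern would block the legitimate page along with the phishing page, whereas the narrow pattern blocks only the phishing page.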
A need arises for a technique that prevents false positives from occurring by reducing or preventing legitimate web site content from triggering matches to phishing black lists, but which provides time and cost savings over manual review of black lists.