The present application relates to technologies for providing Internet security and, in particular, to methods for detecting phishing websites. The disclosed methods make use of characteristics common to phishing websites and of public Internet resources.
Phishing is a network crime in which users are lured to a website that closely resembles the target website they intended to visit; the phishing website then collects the personal information the users enter there. With the popularity and development of electronic commerce and Internet applications, phishing has caused increasingly serious losses to Internet users. According to the “Chinese Network Security Report in the first half of 2011” issued by 360 Safe™, the largest security company in China, phishing fraud has become the biggest threat to Internet security. The International Anti-phishing Alliance has also reported a significant increase in the number of phishing attacks in recent years. Finding effective phishing detection methods has therefore become particularly urgent.
As a network crime, phishing resembles itinerant crime in the real world: a phishing website may disappear only a few days, or even a few hours, after it is set up. Because of their short lifetimes, phishing websites are rarely indexed or evaluated by Internet resources such as search engines and ranking services.
In another aspect, the nature of phishing requires a phishing website to masquerade as its target website: it must look very similar to the target so that users mistake it for the genuine site, allowing the attacker to obtain illegal benefits. The primary similarity lies in the web pages. To match against the web content of target websites, a phishing detector would need to collect web content from all target websites, a complex and never-ending task because new target websites continually appear. On the other hand, the masquerade can also be reflected mainly in the similarity between the title of a phishing page and the title of its target website. This type of similarity can be assessed through comparisons using public search engines, which avoids the work of collecting content from target websites.
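The title-based comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed method: the search step itself is omitted, and its results are assumed to arrive as (URL, title) pairs obtained by querying a public search engine with the suspect page's title. The helper names (`bare_host`, `title_similarity`, `looks_like_title_decoy`), the use of Python's `difflib.SequenceMatcher` as the similarity measure, and the 0.8 threshold are all hypothetical choices.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse


def bare_host(url: str) -> str:
    """Lowercase host name without a leading 'www.'.

    Simplification (assumption): a real system would reduce the host to its
    registered domain, e.g. using the Public Suffix List.
    """
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def title_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] between two page titles."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def looks_like_title_decoy(page_url, page_title, search_results, threshold=0.8):
    """Return the URL of the site the page appears to imitate, or None.

    search_results: hypothetical list of (url, title) pairs returned by a
    public search engine queried with page_title.
    """
    own_host = bare_host(page_url)
    # If the search engine already associates this title with the page's own
    # host, treat the page as the genuine site rather than an imitation.
    if own_host in {bare_host(url) for url, _ in search_results}:
        return None
    # Otherwise, a closely matching title hosted elsewhere suggests imitation.
    for url, title in search_results:
        if title_similarity(page_title, title) >= threshold:
            return url
    return None
```

For example, a page at `http://paypal-secure-login.example.net/` titled "PayPal: Log in to your account" would be flagged if the search results contain a page with that title on another host, whereas the genuine site is not flagged because its own host appears among the results.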
The current phishing detection field includes three main detection methods: blacklist detection, URL-based detection, and web-page-content-based detection. Blacklist detection maintains and constantly updates a list of phishing sites, based on user evaluations or reports, to prevent additional users from visiting phishing websites that have already been discovered. URL-based detection analyzes the structure and elements of a URL; it also uses registration information and related analysis to determine whether a website is a phishing website. URL-based detection is often used as a preliminary check, while the final determination is usually based on web content. Finally, web-page-content-based detection analyzes and determines the similarity between a target web page and the web pages of a suspected phishing website.
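A preliminary check combining a blacklist lookup with URL-structure analysis might look like the sketch below. The features (an IP-address host, an `@` in the authority part, many dots or a hyphen in the host name, an unusually long URL, no HTTPS) are common illustrative heuristics; the thresholds, the unweighted score, and the function names `url_features` and `preliminary_check` are hypothetical, and registration-information lookups are not shown.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Extract a few structural URL features often used as phishing hints."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    try:
        ipaddress.ip_address(host)  # raises ValueError for ordinary host names
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "host_is_ip": host_is_ip,
        "has_at_sign": "@" in parsed.netloc,  # e.g. http://user@host/ hides the real host
        "dot_count_in_host": host.count("."),
        "hyphen_in_host": "-" in host,
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
    }


def preliminary_check(url, blacklist, max_dots=3, max_length=75):
    """Return (blacklisted, score); a higher score means more suspicious.

    blacklist: set of known phishing host names (blacklist detection).
    max_dots / max_length: hypothetical illustrative thresholds.
    """
    host = (urlparse(url).hostname or "").lower()
    if host in blacklist:
        # Already-discovered phishing site: no structural analysis needed.
        return True, None
    f = url_features(url)
    # Each triggered heuristic adds 1 to the score (bools sum as 0/1).
    score = sum([
        f["host_is_ip"],
        f["has_at_sign"],
        f["dot_count_in_host"] > max_dots,
        f["hyphen_in_host"],
        f["url_length"] > max_length,
        not f["uses_https"],
    ])
    return False, score
```

A non-zero score would only mark a URL for closer, content-based inspection, consistent with URL analysis serving as a preliminary step rather than a final determination.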
Among the three detection techniques described above, the biggest drawback of blacklist detection is its time lag. URL-based detection requires prior collection of phishing website URLs and cannot detect phishing against new targets. Moreover, web-page-content-based detection requires prior knowledge of the target web page and a large collection of phishing samples; it, too, cannot thwart phishing attacks against new targets.
In view of the above, there is a need for more accurate and more effective methods for detecting phishing fraud on the Internet.