With the development of network technologies, people are increasingly concerned about network security, especially the protection of users' private information, such as, but not limited to, accounts, usernames, credit card numbers, and passwords. Phishing websites use fraudulent web pages to trick a user into disclosing private information, such as accounts, usernames, passwords, social security numbers, answers to security questions, and the like. Therefore, detecting a potential phishing webpage is vital to network security.
Currently, a common approach is to install client security software on a user's system to check the websites that the user visits. For example, the websites are first filtered against a “whitelist”, i.e., a list of approved or authorized websites. Websites not on the whitelist are then sent to a server, which checks them against a blacklist and a whitelist and returns the results to the client security software. The client security software then determines whether to block the webpages based on the returned results. Webpages that are in neither the blacklist nor the whitelist are called “unknown webpages”. The server downloads these webpages and checks whether they belong to phishing websites by detecting whether they contain certain keywords, such as “XX login,” “sign in,” and/or “password” and the like, or whether the webpage contains a specific input box for an account or password. If such keywords are detected, the server then checks whether the website is certified or trustworthy to determine whether it is a phishing website.
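The conventional flow described above can be sketched roughly as follows. This is a minimal illustration, not any particular product's implementation: the host names, the list contents, the keyword tuple, and the function names `looks_like_login_page` and `classify` are all hypothetical, and the client-side and server-side checks are collapsed into one function for brevity. Treating a password input box as equivalent to a keyword hit is also an assumption of this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical example lists; a real system would query a server.
WHITELIST = {"bank.example.com"}
BLACKLIST = {"phish.example.net"}
LOGIN_KEYWORDS = ("login", "sign in", "password")


class LoginSignalParser(HTMLParser):
    """Collects text nodes and notes password-type input boxes."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.has_password_input = False

    def handle_starttag(self, tag, attrs):
        # The attribute value may be None when written without a value.
        input_type = (dict(attrs).get("type") or "").lower()
        if tag == "input" and input_type == "password":
            self.has_password_input = True

    def handle_data(self, data):
        self.text_parts.append(data)


def looks_like_login_page(html):
    """Return True if the page text has a login keyword or a password box."""
    parser = LoginSignalParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts).lower()
    return parser.has_password_input or any(k in text for k in LOGIN_KEYWORDS)


def classify(url, html):
    """Apply the whitelist, then the blacklist, then the content check."""
    host = urlparse(url).hostname or ""
    if host in WHITELIST:
        return "allowed"
    if host in BLACKLIST:
        return "blocked"
    # Unknown webpage: inspect the downloaded content.
    if looks_like_login_page(html):
        return "needs_certification_check"
    return "no_login_signals"
```

A page that reaches the `needs_certification_check` result would then go through the certification or trustworthiness check mentioned above, which this sketch does not attempt to model.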
However, the current approach has many problems. Many phishing websites modify their webpage contents by replacing keywords such as “XX login”, “sign in” and/or “password” and the like with images, and can thus bypass detection based on text content. In addition, many phishing websites display the login interface using Flash, which can also bypass detection of the login keywords. Accordingly, it would be advantageous to provide a method for detecting phishing websites that cannot be detected by these common approaches.
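The evasion described above can be illustrated with a small, self-contained sketch. The keyword tuple, the sample pages, and the function `text_keywords_found` are hypothetical, and the regular-expression tag stripping is a deliberately crude stand-in for whatever text extraction a real detector would use.

```python
import re

# Hypothetical keyword list for illustration only.
LOGIN_KEYWORDS = ("login", "sign in", "password")


def text_keywords_found(html):
    """Return the login keywords present in the page's text content."""
    # Crude tag stripping: everything between '<' and '>' is discarded,
    # so attribute values such as image file names are not searched.
    text = re.sub(r"<[^>]+>", " ", html).lower()
    return [k for k in LOGIN_KEYWORDS if k in text]


# A page that states its purpose in plain text.
text_page = "<h1>XX Login</h1><p>Password</p>"

# The same interface rendered as images or as an embedded Flash object.
image_page = '<img src="title.png"><img src="field_label.png">'
flash_page = '<object data="form.swf" type="application/x-shockwave-flash"></object>'

for name, page in [("text", text_page), ("image", image_page), ("flash", flash_page)]:
    print(name, text_keywords_found(page))
```

Only the plain-text page yields keyword matches; the image-based and Flash-based pages expose no text for a keyword check to find, which is the gap the passage above identifies.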