The Internet is a collection of interconnected computer networks that has long been employed for data management, communication, purchasing goods, sharing, searching, etc. As the Internet continues to grow, users are becoming increasingly comfortable using the Internet for critical, sensitive applications such as banking, shopping, or confidential data access. To ensure that such applications remain secure, some type of authentication (e.g., using userids or passwords) is often employed. However, such authentication schemes may be vulnerable to phishing attacks by phishers.
Phishing is a type of attack designed to deceive a user into unwittingly surrendering sensitive credentials, such as usernames, passwords, or credit card, bank account, or social security numbers. Phishers accomplish their phishing attacks by disguising themselves as trustworthy and authentic entities, most commonly through electronic communication, such as e-mail or instant messages.
An example of a conventional phishing technique is link manipulation. Phishers may employ a cleverly designed Uniform Resource Locator (URL) link that appears to belong to a valid website but that, when visited, actually directs the user to a malicious website where the user may be tricked into entering their personal credentials. Once the phisher has acquired these sensitive credentials, he or she can use them in many ways that may cause an immense amount of damage to the victim, such as withdrawing funds from financial accounts. Therefore, detecting these malicious websites more efficiently and intelligently is a critical task for individuals and organizations alike.
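As a minimal illustration of link manipulation, the following Python sketch flags anchors whose visible text looks like a web address for one host while the underlying link points to a different host. The class and function names (`LinkCollector`, `mismatched_links`) and the sample host names are assumptions made for illustration only; they are not drawn from any particular product, and a real detector would need to handle many more disguises.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects (href, visible text) pairs for each anchor in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def _host(url):
    """Return the lowercase host of a URL, tolerating a missing scheme."""
    if "//" not in url:
        url = "//" + url
    return (urlparse(url).hostname or "").lower()


def mismatched_links(html):
    """Return anchors whose visible text names a different host than the href."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    suspicious = []
    for href, text in parser.links:
        # Only compare when the visible text itself looks like a web address.
        if " " in text or "." not in text:
            continue
        if _host(text) != _host(href):
            suspicious.append((href, text))
    return suspicious
```

For example, an anchor displaying `www.examplebank.test` while linking to `http://203.0.113.7/login` would be returned by `mismatched_links`, whereas an anchor labeled "Help center" would be ignored because its text is not a web address.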
One technique for detecting these malicious websites involves employing a signature-based content filtering solution. Signature-based content filtering solutions utilize tools to identify the URLs associated with the phishing websites. For example, spam traps may be set up to attract as many undesirable e-mails as possible. Once the undesirable e-mails are collected, analysis may be conducted to extract a pattern from the URL, which pattern would serve as a signature of the phishing website for future identification.
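The signature extraction described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the method of any specific filtering product: the signature is assumed to be a regular expression built from the host and path of a trapped URL, with runs of digits generalized, and the function names (`extract_signature`, `build_signatures`, `is_known_phish`) and sample URLs are hypothetical.

```python
import re
from urllib.parse import urlparse


def extract_signature(url):
    """Turn one trapped URL into a regex: literal host and path, digit runs generalized."""
    parts = urlparse(url)
    literal = (parts.hostname or "") + parts.path
    # Escape regex metacharacters, then let any digit run match any other digit run.
    pattern = re.sub(r"\d+", r"\\d+", re.escape(literal))
    return re.compile(r"^https?://" + pattern, re.IGNORECASE)


def build_signatures(trapped_urls):
    """Build one signature per URL harvested from the spam trap."""
    return [extract_signature(url) for url in trapped_urls]


def is_known_phish(url, signatures):
    """Return True if the URL matches any previously extracted signature."""
    return any(sig.match(url) for sig in signatures)


# Usage: URLs collected from a (hypothetical) spam trap become signatures
# that are later checked against URLs seen by customers.
signatures = build_signatures(
    ["http://login-examplebank.test/session123/verify.php"]
)
```

Note that a signature can exist only after a matching e-mail has already reached the spam trap, so a URL that differs in its host or fixed path segments goes undetected until a new signature is extracted.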
Signature-based content filtering is, however, a reactive solution wherein the pattern is not extracted until after the phishing attack has been initiated. Unfortunately, because the life cycle of a phishing website is usually very short, around 52 hours for example, by the time the pattern is identified and distributed to the customers, the phishing website will more than likely already be out of date.
Another solution for detecting these malicious websites employs a heuristic rule-based approach. In a heuristic rule-based approach, instead of extracting the pattern for phishing detection from the URLs, the legitimate website's content is actually analyzed to create a set of heuristic rules. This set of heuristic rules is then employed to determine whether a suspect website is likely to be a phishing website imitating the particular legitimate website. However, because the number of websites on the Internet that are targeted by phishers may be quite large, a great deal of effort and time is needed in order to generate and fine-tune the heuristic rules for all the legitimate websites. Further, the legitimate websites themselves are updated from time to time, necessitating frequent, time-consuming updating of the heuristic rules. The heuristic rule-based approach also tends to suffer from a large number of false positives.
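A heuristic rule set of this kind might be sketched as follows. The sketch assumes, purely for illustration, that the rules for one legitimate website consist of its known hosts plus a few marker phrases taken from its pages, and that a suspect page is flagged when it shows enough markers while being served from an unknown host. The `SiteRules` type, the `looks_like_phish` function, the threshold, and the sample phrases are all hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SiteRules:
    """Hypothetical heuristic rules derived from one legitimate website's content."""
    legitimate_hosts: frozenset
    markers: tuple          # phrases taken from the legitimate pages
    min_hits: int = 2       # how many markers must appear before flagging


def looks_like_phish(url, page_text, rules):
    """Flag a page that imitates the legitimate content from an unknown host."""
    host = (urlparse(url).hostname or "").lower()
    if host in rules.legitimate_hosts:
        return False
    text = page_text.lower()
    hits = sum(1 for marker in rules.markers if marker.lower() in text)
    return hits >= rules.min_hits


# Usage: one rule set must be written, and kept current, per protected website.
bank_rules = SiteRules(
    legitimate_hosts=frozenset({"www.examplebank.test"}),
    markers=("example bank", "online banking sign in", "member services"),
)
```

Even in this small sketch the drawbacks noted above are visible: the marker phrases must be revised whenever the legitimate website changes, and any unrelated page that happens to quote two of the phrases would be reported as a false positive.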
In view of the foregoing, improved solutions for detecting phishing are desired.