Credential stealing is a growing problem because an increasing number of web pages require a user to provide sensitive or confidential information in order to access products or services. An attacker may exploit this use of web pages to steal a user's credentials, including sensitive or confidential user information, by creating look-alike pages that match or closely resemble an original legitimate web page associated with a brand or trustworthy entity. For example, a hacker seeking to steal a user's credentials may create a fake or impostor page that matches or resembles a Sign-in, Sign-up, Password, or Recovery page of a known brand such as Google, Yahoo, or Microsoft. The hacker may then send the user an email or instant message that includes a link to this fake page. When the user reads the email or message and selects the link, the fake page is displayed to the user. A user who fails to notice discrepancies in the URL or security certificate of the page may trust that the fake page is legitimate and may proceed to enter confidential information into the fake page, resulting in real-time transfer of the user's sensitive information to the attacker.
The credential stealing attacks described above are designed to exploit a human tendency to give much higher weight to the visual design and language of a web page than to reliable indicators that may validate the identity of the page, such as the URL or security certificate. A user opening a web page will typically focus on the visual appearance of the page and then on the language written on that page, and will often ignore the URL, domain, and certificate information that is usually visible at the top of a browser window. The visual appearance and text on the page are vital to a user's perception and understanding of the origin and purpose of a particular web page. The downside is that this natural tendency to rely on visual and textual similarities for identifying legitimate web pages associated with known brands may be exploited by a hacker or an attacker who creates a visual replica or fake page of the legitimate web page to gain a user's trust. That is, many users will assume that the replica or fake page is a legitimate web page because of its visual and textual similarity to the pages of a known brand, and will not hesitate to enter their confidential information into the replica or fake page.
There are two categories of credential stealing attacks considered in this application: (1) brand-based credential stealing; and (2) custom credential stealing. In the case of brand-based credential stealing, an attacker creates an exact replica (at least in terms of visual and textual content) of a brand page that requests the same information as is required by the original legitimate page associated with the brand. In contrast, in the case of custom credential stealing, the fake page created by the attacker is not an exact replica, in terms of visual and textual content, of a known brand page, but uses certain elements of known brands, such as brand logos, brand names, and other brand elements, to make victims believe that the page belongs to the trusted brand. An advantage of the custom credential stealing page for the attacker is that multiple brand names can be used on a single page to acquire a variety of information. Additionally, these pages may also ask for information that is usually not required by the original legitimate brand pages. For instance, a banking Sign-in page does not typically request a user's social security number or ATM PIN, but a custom credential stealing page with a bank logo may have a web form asking for all of this information.
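The distinction between the two categories can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions and is not the detection method of this application: the `PageObservation` fields, the `LEGITIMATE_HOSTS` table, and the `classify` function are hypothetical names, and the sketch assumes that some earlier step has already determined which brand elements appear on a page and whether the page is a near-exact replica of a known brand page.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Hypothetical, non-exhaustive table of hosts on which each brand serves
# its genuine sign-in pages. Entries are illustrative only.
LEGITIMATE_HOSTS = {
    "google": {"accounts.google.com"},
    "microsoft": {"login.live.com", "login.microsoftonline.com"},
}


@dataclass
class PageObservation:
    """Facts about a page, assumed to be produced by an earlier analysis step."""
    url: str
    brands_seen: set           # brand names or logos found on the page
    is_replica: bool           # near-exact visual/textual copy of a brand page
    has_credential_form: bool  # page asks the user for sensitive information


def host_is_legitimate(url: str, brand: str) -> bool:
    """Return True if the page's hostname is a known host of the brand."""
    host = (urlsplit(url).hostname or "").lower()
    return host in LEGITIMATE_HOSTS.get(brand, set())


def classify(page: PageObservation) -> str:
    """Map a page onto the categories described above."""
    if not page.has_credential_form or not page.brands_seen:
        return "no_brand_signal"
    if any(host_is_legitimate(page.url, b) for b in page.brands_seen):
        return "legitimate"
    # Brand elements on a host that does not belong to the brand.
    if page.is_replica and len(page.brands_seen) == 1:
        return "brand_based"  # exact replica of a single brand page
    return "custom"           # brand elements reused on a non-replica page
```

In this sketch, a replica of a single brand's sign-in page served from an unrelated host is labeled brand-based, while a page that merely borrows logos or names from one or more brands is labeled custom. The hostname comparison reflects the kind of reliable indicator that users tend to overlook.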
Accordingly, it would be desirable to provide a method and system that can automatically analyze a web page to detect both brand-based and custom credential stealing attacks, in order to address this specific technical problem of replica or fake web pages being used to steal sensitive or confidential user information.