Due to the rapid development of computer technologies and networks, the problem of detecting phishing services and resources is becoming increasingly important. Such resources include web pages, Java applets, Flash applications, etc., along with their constituent parts (e.g. scripts, executable code and parts thereof, images and their parts, etc.), as well as software and applications for computing devices, including, but not limited to, mobile devices (cell phones, smartphones, tablets, laptops), PCs, servers, data storages, including, but not limited to, network-based data storages, such as SAN, NAS, etc. systems, LAN switches, Wi-Fi routers, hubs, etc., and other devices capable of processing and/or transferring data.
Phishing is a special form of Internet-based fraud, in particular an attempt to obtain sensitive user data, such as logins, passwords, credit card details (numbers, card holder names, expiry dates, PINs, CVV2/CVC2 codes), phone numbers, code words, confirmation codes (e.g. for money transfers or payments), etc.
Phishing web resources are fake resources, i.e. fake web sites or pages, etc., that mimic the appearance of a known/original resource (a targeted web resource, i.e. a resource that may be targeted by various phishing activities). For instance, phishing (fake) web pages (Internet pages) may imitate the appearance of an original web page of a bank, an e-payment system, or a login page, etc., especially one that requires the user to input confidential or sensitive data, or some other information that is valuable for the user (their clients, friends, relatives, etc.) and/or offenders. These phishing web pages (or phishing web resources, in general) are created by the offenders (fraudsters) to obtain sensitive data of web site users/visitors.
The data collected through phishing (phishing attacks) may then be used by the offenders, e.g. to steal money from a credit card (specifically, through illegal withdrawal), to extort money for restoring stolen logins and passwords (fraudsters change either one, so that the user is unable to access some web resource), or to extort money for keeping some valuable information secret, etc.
Links (web links, URLs, URIs, etc.) to phishing resources may be sent in text messages, a practice that is also known as “SMiShing”. Also, links to phishing web resources may be contained in mailing lists, on various web sites, including social network pages, or in computer programs, such as office applications, including mobile (Android, iOS, etc.) applications.
Currently, links, in particular web links (or any other data and information that can be analyzed for phishing), to be checked for possible phishing attacks may be obtained in different ways, e.g.:
through analysis of mail traffic (e.g. complete links in the message body);
through analysis of web traffic using various services, software, etc. that are capable of registering the URL addresses (web links) to which Internet HTTP/HTTPS queries are directed;
through analysis of SMS/MMS traffic or messenger traffic. For instance, fraudsters can send messages telling the user to pay for fake services. Such a phishing offer may be contained in a message (or in a link to that message sent from other devices, e.g. by contacts that are known or unknown to the user) telling the user that their account has been blocked and that they have to follow the web link provided in the message (SMS/MMS); or
through analysis of domain names (e.g. addresses of web resources, web addresses, web sites, URIs, URLs, etc.) that are similar to those of popular web resources, brands, etc., so that users may mistake a fake resource for the real one. For instance, the original domain name “original-bankname.ru” may be substituted with a fake one, like “original-bankname.tk”, “original-bankname.fake-domainname.es”, “original-banknameonline.pe.hu”, “original-bankname.online.hol.es”, etc. Alternatively, veiled links may be used, including links to similar web resources.
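The domain-name analysis in the last item can be illustrated with a minimal Python sketch. The helper names `normalize` and `looks_like_brand` and the substring heuristic are assumptions made for illustration only; they are not taken from any particular detection system, and the sketch does not attempt to handle misspellings or visually similar characters.

```python
import re


def normalize(text: str) -> str:
    """Lowercase the text and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def looks_like_brand(host: str, official_domain: str, brand_token: str) -> bool:
    """Return True when `host` mentions the brand token but is neither the
    official domain nor one of its subdomains (hypothetical heuristic)."""
    host = host.lower().rstrip(".")
    official = official_domain.lower()
    if host == official or host.endswith("." + official):
        return False  # the genuine resource
    return normalize(brand_token) in normalize(host)


# Example domain names taken from the text above.
candidates = [
    "original-bankname.ru",
    "original-bankname.tk",
    "original-bankname.fake-domainname.es",
    "original-banknameonline.pe.hu",
    "original-bankname.online.hol.es",
]
suspicious = [
    h for h in candidates
    if looks_like_brand(h, "original-bankname.ru", "original-bankname")
]
```

With the example names from the text, every candidate except the official domain ends up in `suspicious`.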
The methods described above make it possible to detect, identify and/or otherwise determine phishing resources directed against various brands, companies, web resources, etc. A difficulty that may arise when trying to detect a phishing resource or a phishing attack is that the capability to view a given type of traffic (e.g. HTTP traffic, email traffic, etc., i.e. information/data that has been sent and/or received) is limited. This limitation is usually caused by a lack of access to the traffic. For instance, when there is no access to mobile traffic (data transferred over mobile/cellular networks), the phishing analysis system will not be able to see (or will only partially see) the phishing links, including web links, that are distributed over a mobile network. Also, traffic from some countries may be inaccessible. For instance, it may be possible to analyze Russian traffic from Russia, but not U.S. traffic, which makes it difficult to detect phishing against U.S. brands.
Moreover, the methods described above may only detect phishing web resources, in particular web sites, that target a variety of brands, rather than phishing web resources directed against a specific brand. For example, suppose it is necessary to detect/identify phishing activities against Brand 1. In this case, the phishing detection system has to detect phishing activities against the given brand, in particular against web resources of that brand. The phishing detection system is capable of analyzing web traffic, SMS traffic and email traffic for Brand 2, Brand 3, Brand 4, etc., but this analysis is usually unable to detect phishing for Brand 1, or it may detect only traces of phishing activities, because of the limitations described above.
In other words, when there is a need to detect phishing for Brand 1, various link sources (described above) may be analyzed, phishing resources for various brands may be detected, e.g. for Brand 2, Brand 3, Brand 4, etc., but not for Brand 1.
To facilitate the creation of phishing pages (in particular, web pages), Internet-based fraudsters use so-called phishing kits, i.e. ready-made sets of pages, scripts, configuration files, etc. that are customizable depending on what the offender intends to do with the information they obtain. Such phishing kits, like phishing web pages, may be created in such a way (and contain such information/data) that they allow web elements of web resources (e.g. phishing web resources, phishing scripts, phishing files, etc.) to be loaded from official web resources (e.g. of payment systems, banks, terminals, network devices, etc.). Such elements include images containing graphic elements, such as official logos and icons, animated elements (e.g. GIF animations, Flash animations, scripted or coded animations, etc.), scripts, script parts, executable code or parts thereof, etc.
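As an illustration of how a page that loads elements from an official resource might be recognized, the following Python sketch lists the images, scripts and linked files that a page hosted elsewhere pulls from an official domain. The names `host_belongs_to`, `ResourceCollector` and `official_elements_on_foreign_page`, as well as the sample URLs, are hypothetical; the sketch only inspects static HTML and does not cover elements loaded by scripts at run time.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def host_belongs_to(host, domain):
    """True when `host` equals `domain` or is one of its subdomains."""
    host = (host or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


class ResourceCollector(HTMLParser):
    """Collects absolute URLs of images, scripts and linked files on a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("img", "script", "link"):
            return
        for name, value in attrs:
            if name in ("src", "href") and value:
                # Resolve relative references against the page address.
                self.resources.append(urljoin(self.page_url, value))


def official_elements_on_foreign_page(page_url, html, official_domain):
    """Return elements loaded from the official domain by a page that is
    itself hosted somewhere else (hypothetical check)."""
    if host_belongs_to(urlparse(page_url).hostname, official_domain):
        return []  # the official site loading its own elements
    collector = ResourceCollector(page_url)
    collector.feed(html)
    collector.close()
    return [
        url for url in collector.resources
        if host_belongs_to(urlparse(url).hostname, official_domain)
    ]


sample_html = (
    '<html><img src="https://original-bankname.ru/logo.png">'
    '<script src="/js/form.js"></script>'
    '<link rel="stylesheet" href="https://www.original-bankname.ru/style.css">'
    "</html>"
)
found = official_elements_on_foreign_page(
    "http://compromised.example/login/index.html",
    sample_html,
    "original-bankname.ru",
)
```

A non-empty result is only a hint that the page may imitate the official resource, since legitimate third-party pages may also embed official logos.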
There are conventional methods and technologies for detecting phishing web resources by analyzing URL addresses through URL masks (URL being Uniform Resource Locator/Universal Resource Locator), by analyzing domain names for key words, by checking whether there are contents that are loaded from official web sites, by checking whether there are images characteristic of a certain brand, or by taking the web resource reputation into account. Such methods and technologies for countering phishing, Internet-based fraud and illegal access to sensitive information of users (visitors of web pages or users of applications, including mobile applications), and, in particular, methods and technologies for detecting phishing web pages, may further comprise determining the domain name registration date (as well as its expiration date), or calculating hash values of web pages and comparing them to hash values that have been calculated earlier. A hash value (hash code, or simply hash) is the result of processing data with a hash function. A hash function is a function that translates an input array into a bit string of a fixed length using a given algorithm.
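The hash-based comparison can be sketched as follows. The choice of SHA-256 and the helper names `page_hash` and `is_known_phishing` are assumptions made for illustration; the text does not prescribe a particular hash function or storage format.

```python
import hashlib


def page_hash(content: bytes) -> str:
    """Return the SHA-256 hash of the page content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def is_known_phishing(content: bytes, known_hashes: set) -> bool:
    """Compare the hash of a page with hashes calculated earlier."""
    return page_hash(content) in known_hashes


# Hash values calculated earlier for pages already identified as phishing
# (the page content here is a made-up placeholder).
known_hashes = {page_hash(b"<html>fake login form</html>")}

same_page = is_known_phishing(b"<html>fake login form</html>", known_hashes)
changed_page = is_known_phishing(
    b"<html>fake login form</html><!-- 42 -->", known_hashes
)
```

Here `same_page` is true and `changed_page` is false: the hash has a fixed length regardless of the input size, and any change to the input, however small, yields a different value.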
In order to evade detection by the methods described above, Internet-based fraudsters do the following:
they place their phishing web pages on compromised web sites with good reputation and domain history, so that they are able to sidestep the phishing detection methods that are based on checking the web resource reputation and history;
they create URI paths (URI being Uniform Resource Identifier/Universal Resource Identifier) to phishing pages that do not mention the brand or the name of the company/system, which allows them to sidestep the phishing detection methods that are based on the analysis of URL addresses by masks;
they create phishing web pages with dynamic contents, so that these cannot be detected by calculating and comparing hash values.
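The mask-based evasion described in the second item can be illustrated with a short Python sketch. The mask list, the helper name `matches_mask` and the sample URLs are hypothetical, and shell-style wildcards from the standard `fnmatch` module stand in for whatever mask syntax a real system would use.

```python
from fnmatch import fnmatchcase

# Hypothetical masks that mention the brand name.
BRAND_MASKS = ["*original-bankname*", "*/bankname/login*"]


def matches_mask(url: str, masks) -> bool:
    """Return True when the lowercased URL matches at least one mask."""
    url = url.lower()
    return any(fnmatchcase(url, mask) for mask in masks)


# A URL that mentions the brand is caught by the masks.
caught = matches_mask("http://original-bankname.tk/login", BRAND_MASKS)

# A page placed on a compromised site under a path that does not
# mention the brand is not caught.
missed = matches_mask(
    "http://compromised.example/wp-content/x7/index.php", BRAND_MASKS
)
```

In this sketch `caught` is true and `missed` is false, which shows why a path that omits the brand name is enough to sidestep mask-based analysis.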
Therefore, based on the analysis of the related art and technical capabilities, there is a need in the field for a method and system for detecting phishing resources in order to prevent phishing attacks.