1. Field of the Invention
The present invention is related to methods of protection against malware located on web resources and, in particular, to malware scans of web resources and identification of malware components on web resources.
2. Description of the Related Art
Detection of viruses and malware has been a concern throughout the era of the personal computer. With the growth of communication networks such as the Internet and increasing interchange of data, including the rapid growth in the use of e-mail for communications, the infection of computers and networks through communications or file exchanges is an increasingly significant consideration. Infections take various forms, but are typically related to computer viruses, Trojan programs, or other forms of malicious code (i.e., malware).
Recent incidents of e-mail mediated virus attacks have been dramatic both for the speed of propagation and for the extent of damage, with Internet service providers (ISPs) and companies suffering service problems and a loss of e-mail and networking capability. In many instances, attempts to adequately prevent file exchange or e-mail mediated infections significantly inconvenience computer users.
Popularization of web services increases malware-related threats to web clients. With the development of web technologies, such as AJAX, JAVA, PHP, FLASH, etc., web sites have become more accessible and offer more functionality and richer media content, which enhances their appeal and popularity among users. With the introduction of the methods of designing web-systems known as Web 2.0, users are able not only to receive information from the Internet, but also to populate web sites with their own data.
Many Internet users have their own blogs or pages within social networks, where they can share messages or data with other users. In addition to text messaging, modern technologies support transfer of media files, such as photos, video files, interactive documents, animations and applications.
Active participation of users in the creation and modification of a web site leads to rapid development and change of the site content. A typical example of such a site is a news ticker, where new articles with links to news reports, photos or videos appear at intervals of a minute and sometimes even more often. Another example is a web forum, where the number of users can exceed several thousand, and where new messages appear every second.
Increasing popularity of web resources makes this environment more attractive to hackers and virus writers, who spread malicious programs over the Internet. Malicious code, such as an iframe, an exploit, etc., can be added to the files uploaded to a site. Thus, file exchanges, forums, blogs, web-interfaces of mail servers and any other resources can be infected.
The infection can be perpetrated through any interface, such as a usual web site form to be filled out by a user (in this case, the infection is performed mostly manually by an insider or an intruder who stole passwords). A perpetrator can also exploit a vulnerability in the content management system or access the site files via FTP. The list of resources is not limited to those using the http transfer protocol but also includes ftp-resources and other servers.
Typically, ftp-servers are used as tools for remote administration of sites, including editing and uploading scripts. However, a connection to the ftp-server also provides the ability to test all scripts of a site, and not just those that are executed on the user's computer. Accessing the site files via FTP allows a security system to analyze the source scripts and original pages, while access via http allows examining only the result of the script and/or the result of processing of the web site files by a web server.
Unauthorized access to a user account, followed by uploading a malicious program on behalf of that user, can lead to a rapid spread of malware because of the credibility of that person among other users. When dealing with familiar sites, users' vigilance may fade over time as the sites become routine. Because of this, incidents of exploitation of social networks and phishing are increasing.
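The difference between the two access paths can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the script text, the `visitor_agent` variable and the `malicious.example` host are assumptions, and `exec` merely stands in for the web server processing a site script. The sketch shows that access to the source exposes every branch of the script, while access to the processed result exposes only the branch taken for a given visitor.

```python
# Hypothetical server-side script, as it would be seen through file access (e.g. via FTP).
SCRIPT_SOURCE = """
if visitor_agent == "scanner":
    output = "<p>clean page</p>"
else:
    output = "<script src='http://malicious.example/x.js'></script>"
"""


def render(visitor_agent):
    """Stand-in for the web server: run the script and return only its result."""
    scope = {"visitor_agent": visitor_agent}
    exec(SCRIPT_SOURCE, scope)  # illustration only; never exec untrusted code
    return scope["output"]


def contains_marker(text):
    """Toy check for a reference to the hypothetical malicious host."""
    return "malicious.example" in text


# Source analysis sees the malicious branch regardless of who is visiting.
print(contains_marker(SCRIPT_SOURCE))
# Result analysis sees only what the script chose to return to this visitor.
print(contains_marker(render("scanner")))
```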
In addition to public Internet services, there are some local sites (corporate, local area networks, user group sites, etc.) that are not accessible from outside of their network. This imposes significant problems in malware scanning of a given resource by anti-virus services or laboratories.
A typical scheme of interaction between a user computer system and a web resource is illustrated in FIG. 1. A web-server is an application that performs the functions of the server (i.e., a computer system) on which the application is implemented. In addition to web-servers 110 or ftp servers 120, other server applications can be installed, for example, mail servers, proxy servers, IRC-servers, etc. In order to interface with these servers, client (i.e., user) computers 140 must have special applications installed, such as a browser 130, a file manager 150, etc.
Client 140 accesses the web server 100 using a URL address of a desired web page or other resource. Each server application interacts with a client using a different protocol. The primary protocols for data transfer between the client 140 and server 100 are HTTP (HTTPS), (S)FTP, POP3, SMTP, IMAP4, etc.
Most servers support authentication in the following manner: authorization data is transferred from the client 140 to the server 100, where the authorization takes place. Then, the data is transmitted from the server 100 to the client 140 based on the rights granted to the client 140.
Personalization makes it possible to make a web resource unique to each user. Authorization is used to distinguish users, their rights and available data. User authorization is carried out by the user via a client application, for example, by filling out forms in the browser. Very often these applications allow preserving the identity of the user. Thus, each user can customize his own interface, restrict access to his personal page or email, and identify the displayed pages by sections or topics.
In order to provide security while using the Internet or a local network, downloadable content needs to be checked. A downloaded page may contain viruses, trojans, adware, spam, or exploits (i.e., HTML code or links to an image or “.pdf” file with special modifications that result in browser errors and execution of some potentially harmful code) for applications (such as, for example, a PDF-reader, web browser, media player, flash-player, etc.).
A system of protection of personal computers can include a file anti-virus (AV), a network screen or firewall, a special protection against network attacks, a web anti-virus, as well as remote security means of AV vendor companies. Currently, AV technologies are heavily developed and include many different methods and systems that implement both heuristic and signature analysis.
The signature type web analysis includes:
assembling a blacklist of pages (URL-blacklist);
assembling white lists of trusted (i.e., clean) applications/components; and
storing a collection of malware components.
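The three signature-type checks listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of any particular AV product: the host name, the sample byte strings, and the use of SHA-256 digests as signatures are all hypothetical choices made for the example.

```python
import hashlib
from urllib.parse import urlsplit


def digest(data):
    """Signature used in this sketch: the SHA-256 digest of the content."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical databases; a real product would load these from AV updates.
URL_BLACKLIST = {"malicious.example"}
TRUSTED_SIGNATURES = {digest(b"trusted component")}
MALWARE_SIGNATURES = {digest(b"known malicious payload")}


def classify(url, content):
    """Apply the URL blacklist, then the white list, then the malware collection."""
    host = urlsplit(url).hostname or ""
    if host in URL_BLACKLIST:
        return "blocked-url"
    signature = digest(content)
    if signature in TRUSTED_SIGNATURES:
        return "trusted"
    if signature in MALWARE_SIGNATURES:
        return "malware"
    # Content unknown to every list would be passed on to heuristic analysis.
    return "unknown"
```

For example, `classify("http://site.example/a", b"known malicious payload")` returns `"malware"`, while any content fetched from `malicious.example` is rejected by URL alone.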
The heuristic analysis typically includes any of:
emulation of executable programs/components;
emulation of executable scripts;
virtualization of execution environment; and
control and analysis of application activity.
A firewall or a network screen is necessary for monitoring and filtering network packets at different levels of the OSI model in accordance with assigned rules for network connections. The filtering can be based on static rules, or it can track the executed applications and control the logic and algorithms of the relevant protocols.
A system for protection against network attacks is typically launched at system startup and monitors incoming traffic activity for patterns typical of network attacks. If an attempt to attack a computer is detected, the system blocks any network activity from the attacking computer directed at the protected computer.
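The monitor-then-block behavior can be sketched in Python as follows. The attack pattern used here (one source probing many distinct ports), the class name `AttackMonitor` and the threshold value are assumptions made for illustration only; real systems recognize many more patterns.

```python
from collections import defaultdict


class AttackMonitor:
    """Toy monitor: blocks a source that probes too many distinct ports."""

    def __init__(self, port_threshold=3):
        self.port_threshold = port_threshold  # hypothetical tuning parameter
        self.ports_seen = defaultdict(set)    # source address -> ports contacted
        self.blocked = set()                  # sources whose traffic is dropped

    def handle_packet(self, source_ip, dest_port):
        """Return True if the packet is allowed, False if it is dropped."""
        if source_ip in self.blocked:
            return False
        self.ports_seen[source_ip].add(dest_port)
        if len(self.ports_seen[source_ip]) > self.port_threshold:
            # Pattern detected: block all further activity from this source.
            self.blocked.add(source_ip)
            return False
        return True
```

Once a source exceeds the threshold, every later packet from it is dropped, while traffic from other sources continues to be evaluated normally.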
A conventional web anti-virus (web AV) intercepts and blocks execution of a malicious script on a web site if it poses a threat. Strict control is also imposed over all HTTP-traffic. Web AV also analyzes web resources for phishing scams and filters banners and pop-ups.
Another line of defense can be a local security server that analyzes the situation within a local network, scans the local traffic and analyzes the network activity of computers. There are also web services that allow the verification of an Internet resource or of files. The user uploads a file or enters a URL of the resource, and the entire malware test takes place on the web-service of an AV company.
In the case of local AV remedies, a web resource is checked when it is downloaded, in other words, when the client application goes to the corresponding URL-address. It is important to note that in this case, the user is authenticated on the web resource and downloads content, as defined for his user account. The content can be in the form of scripts, links, articles, messages, reports, letters from trusted users, etc.
FIG. 2 illustrates a conventional system for checking a web resource 200 for presence of malware or links to infected resources. A connection with the resource 200 is established by an application 220 of a client 210. The client application 220 in this case can be a regular browser, a file manager or another application that interacts with a server 200 via a data transfer protocol 230.
The client application 220 transmits to the server 200 user identifiers 250. Depending on authentication scheme 260, the identifiers 250 can represent logins, passwords, session keys, cookie-files, special protocol headers, network or physical addresses of the computer, biometric data, certificates, etc. A request to the server 200 by the client-application 220 is processed by a server application 240.
An authorization 260 is performed, and depending on its results, the server 200 opens a document or generates a web page, an ftp-page or other data representation based on the data type provided by the server 200. Security of data transmitted between the client 210 and the server 200 is provided by a security module 270. This conventional system can be implemented on a personal computer as well as on a web server of an AV company. Typically, the relevance of the AV databases and the effectiveness of heuristic analysis on the server side can be higher than on a personal computer.
However, the principal difference in this case is that the data presented by the web resource 200 being tested for malware presence depends on the user identification data 250. When the AV check takes place on the server of an AV company, the result of the authorization is either denial of access or a grant of the rights of the guest account. Thus, the outcome of an AV check, even at equal technological capabilities on the server and the user computer, will be significantly different.
Yet another shortcoming of conventional AV systems is the restrictions imposed upon AV web-based scanners by routing rules. If a web resource is part of a local network and has no external network address, it is not available outside the network, and it can only be verified by using other security tools installed on a computer system connected to this network.
Many malware creators and hackers are aware of online scanners and knowingly block access of the scanner applications to a web resource where they have planted malware components. This makes comprehensive malware scans difficult. Furthermore, in the conventional systems, a web page cannot be scanned until it is downloaded onto a user computer. A typical web resource can contain several thousand pages. AV checking of all of the web pages using a conventional system (as illustrated in FIG. 2) requires a lot of time and resources.
The situation is further complicated when several different web resources need to be periodically scanned, since each server often uses different identification parameters. Statistics indicate that the majority of malware components and links to infected pages are located on the main (i.e., home) pages of web resources or on the first pages of sub-sections. In order to check a list of web resources, an AV application needs to have access authorization for each of them, which also complicates scanning web sites for malware.
Accordingly, there is a need in the art for a comprehensive malware scanning system that can effectively check web resources with minimal overhead and cost.
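The effect of the identification data on the scan outcome can be illustrated with a minimal Python sketch. The session key, the page contents, the `malicious.example` host and the substring-based `scan` function are all hypothetical and stand in for the server application 240, the identifiers 250 and an AV check. The same check is applied to the same resource, first with valid identifiers and then without them.

```python
# Hypothetical marker that the toy scanner looks for.
MALICIOUS_MARKER = '<iframe src="http://malicious.example/"></iframe>'

# Hypothetical content served per account; the guest page carries no user messages.
PAGES = {
    "guest": "<p>Welcome, please log in.</p>",
    "alice": "<p>Message from a trusted user</p>" + MALICIOUS_MARKER,
}

# Hypothetical session keys playing the role of the user identifiers.
VALID_SESSIONS = {"session-alice": "alice"}


def serve(session_key=None):
    """Return the page for the authorized user, or the guest page otherwise."""
    user = VALID_SESSIONS.get(session_key, "guest")
    return PAGES[user]


def scan(content):
    """Toy AV check: report whether the marker is present in the content."""
    return MALICIOUS_MARKER in content


print(scan(serve("session-alice")))  # check performed with the user's identifiers
print(scan(serve()))                 # check performed without identifiers (guest)
```

With identical detection logic, the check made under the user account finds the planted marker, while the check made with guest rights receives a page that does not contain it.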