Field of the Invention
This application relates to data and application security. In particular, this application discloses systems and methods of collecting and mining data to predict the content or nature of a web site based on its web address.
Description of the Related Technology
Traditionally, computer viruses and other malicious content were most often delivered to client computers by insertion of an infected diskette or some other physical medium into the computer. As the use of e-mail and the Internet increased, e-mail attachments became a prevalent method for distributing virus code to computers. For these types of viruses to infect a computer, some affirmative action by the user was typically required, such as opening an infected file attachment or downloading an infected file from a web site and launching it on the computer. Over time, antivirus software makers developed increasingly effective programs designed to scan files and disinfect them before they had the opportunity to infect client computers. Thus, computer hackers were forced to create more clever and innovative ways to infect computers with their malicious code.
In today's increasingly networked digital world, distributed applications are being developed to provide more and more functionality to users in an open, collaborative networking environment. While these applications are more powerful and sophisticated, their increased functionality requires that network servers interact with client computers in a more integrated manner. For example, where previous web applications primarily served HTML content to client browsers and received data back from the client via HTTP POST commands, many new web applications are configured to send various forms of targeted content, such as active content, to the client computer, which causes applications to be launched within the enhanced features of newer web browsers. For example, many web-based applications now utilize ActiveX controls, which must be downloaded to the client computer so they may be effectively utilized. Java applets, JavaScript, and VBScript commands also have the capability of modifying client computer files in certain instances.
The convenience that has arrived with these increases in functionality has not come without cost. Newer web applications and content are significantly more powerful than previous application environments. As a result, they also provide opportunities for malicious code to be downloaded to client computers. In addition, as the complexity of operating systems and web browsing applications increases, it becomes more difficult to identify security vulnerabilities which may allow hackers to transfer malicious code to client computers. Although browser and operating system vendors generally issue software updates to remedy these vulnerabilities, many users have not configured their computers to download these updates. Thus, hackers have begun to write malicious code and applications which utilize these vulnerabilities to download themselves to users' machines without relying on any particular activity of the user, such as launching an infected file. One example of such an attack is the use of malicious code embedded into an active content object on a web site. If the malicious code has been configured to exploit a vulnerability in the web browser, a user may be infected or harmed by the malicious code as a result of a mere visit to that page, as the targeted content in the page will be executed on the user's computer.
One attempt to address the problem of malicious code being embedded in active content is to utilize heightened security settings on the web browser. However, in many corporate environments, intranet or extranet applications are configured to send executable content to client computers. Setting the browser to a high security level tends to impede or obstruct the effective use of these types of “safe” applications. Another attempt to address the issue is to block all executable content using a network firewall application. This brute force approach is also ineffective in many environments, because selective access to certain types of content is necessary for software to function correctly.