1. Field of the Invention
The present invention is related to assessing risk profiles of Internet resources. More specifically, the present invention is related to a system and method for developing a risk profile for an Internet resource by generating a reputation index, based on attributes of the resource collectively referred to as the reputation vector of the resource.
2. Description of the Related Art
Management of Internet access, particularly to Web sites, has in the past been accomplished using “Content Filtering”, in which Web sites are organized into categories and each request for Web content is matched against per-category policies and either allowed or blocked. This type of management focuses on the subject matter of a Web site and provides visibility into, for example, how employees spend their time during the workday and how they use the company's network bandwidth. These solutions also allow companies to enforce an established Internet usage policy (IUP) by blocking Web sites whose subject matter violates that policy.
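The category-and-policy matching described above can be sketched as follows. This is a minimal illustration only, assuming a hypothetical in-memory category table (`CATEGORY_DB`) keyed by hostname and a hypothetical `CategoryPolicy` class; commercial content filters rely on much larger, externally maintained category databases.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit

# Hypothetical category database: hostname -> category label.
CATEGORY_DB = {
    "news.example.com": "news",
    "casino.example.net": "gambling",
    "video.example.org": "streaming-media",
}

@dataclass
class CategoryPolicy:
    """Per-category actions; categories not listed fall back to the default."""
    actions: dict = field(default_factory=dict)
    default_action: str = "allow"

    def decide(self, url: str) -> str:
        # Categorize the request by hostname, then look up the category's action.
        host = urlsplit(url).hostname or ""
        category = CATEGORY_DB.get(host, "uncategorized")
        return self.actions.get(category, self.default_action)

# Example IUP: block gambling and streaming media, allow everything else.
policy = CategoryPolicy(actions={"gambling": "block", "streaming-media": "block"})
```

With this policy, a request for `http://casino.example.net/slots` is blocked because its category is `gambling`, while a request to an uncategorized host falls through to the default action. Note that the decision depends only on subject matter, not on whether the site is currently safe.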
Security solutions, such as anti-virus products, examine file or Web page content for known patterns, or signatures, that represent security threats to users, computers, or corporate networks. Rather than examining the subject matter of a site, these solutions look for viruses and other ‘malware’ currently infecting the site. However, current solutions for managing access to Internet resources fail to measure the security risk of accessing an Internet resource predictively, that is, before an infection has been isolated and its signature identified and distributed.
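In its simplest form, the signature matching described above reduces to searching content for known byte patterns. The sketch below assumes a hypothetical two-entry signature table; the signature names and patterns are invented for illustration and are far simpler than those used by real anti-virus products.

```python
# Hypothetical signature table: signature name -> byte pattern.
SIGNATURES = {
    "demo.hidden-iframe": b'<iframe width="0" height="0"',
    "demo.dropper-marker": b"DEMO-MALWARE-MARKER-0001",
}

def scan(content: bytes) -> list:
    """Return, in sorted order, the names of signatures whose pattern occurs in content."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in content)
```

The sketch also shows the limitation noted above: content is flagged only if its exact pattern is already in the table. A trivially modified payload, such as one using single quotes around the attribute values, is not matched, which is why a signature-based approach is reactive rather than predictive.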
A possible analogy to the reputation of an Internet resource is the credit score of an individual. A Web user would want to be informed of the reputation of a Web site before visiting it, just as a lender would want to know the reputation, the financial reputation at least, of a borrower of the lender's money.
A credit score is based on a variety of fairly tightly related factors, such as existing debt, available credit lines, on-time payments, existing credit balances, etc.
In the United States, a credit score is a number based on a statistical analysis of a person's credit files that represents the creditworthiness of that person, which is the likelihood that the person will pay their bills. A credit score is primarily based on credit information, typically from one of the three major credit agencies.
There are different methods of calculating credit scores. The best known one, FICO, is a credit score developed by the Fair Isaac Corporation. FICO is used by many mortgage lenders that use a risk-based system to determine the possibility that the borrower may default on financial obligations to the mortgage lender.
FICO® scores are provided to lenders by the three major credit reporting agencies: Equifax, Experian and TransUnion. When a lender orders a borrower's credit report, it can also buy a FICO® score based on the information in that report. The FICO® score is calculated by a mathematical equation that evaluates many types of information from the borrower's credit report at that agency. For a FICO® score to be calculated, the borrower's credit report must contain sufficient information, and sufficient recent information, on which to base a score. Generally, this means the borrower must have at least one account that has been open for six months or longer, and at least one account that has been reported to the credit reporting agency within the last six months.
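The two eligibility conditions above can be expressed as a simple check. This is a sketch under stated assumptions: the `Account` record and the helper names are hypothetical, months are counted as whole calendar months, and "within the last six months" is read as fewer than six full months having elapsed. The actual agency rules may differ in detail.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class Account:
    opened: date         # date the account was opened
    last_reported: date  # date of the most recent report to the agency

def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from earlier to later."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months

def is_scorable(accounts: List[Account], today: date) -> bool:
    """Assumed reading of the 'sufficient information' rule:
    some account open >= 6 months AND some account reported < 6 months ago."""
    has_seasoned = any(months_between(a.opened, today) >= 6 for a in accounts)
    has_recent = any(months_between(a.last_reported, today) < 6 for a in accounts)
    return has_seasoned and has_recent
```

The two conditions may be satisfied by different accounts: an old account that is no longer reported, together with a recently reported new account, passes the check even though neither account would pass alone.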
FICO® scores provide a reliable guide to future risk based solely on credit report data. FICO® scores range from 300 to 850; the higher the score, the lower the risk. However, no score says whether a specific individual will be a “good” or “bad” customer, and while many lenders use FICO® scores to help them make lending decisions, each lender has its own strategy for determining whether a potential borrower is a good customer. Although FICO does not reveal exactly how it determines a credit score, it considers the following factors: payment history (35%); outstanding debt (30%); length of credit history (15%); types of credit (10%); and new credit (10%).
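Because FICO does not publish its formula, the following is only a hypothetical weighted-sum model showing how the category weights listed above could combine per-factor sub-scores into the 300 to 850 range. The sub-score normalization (0.0 to 1.0, where 1.0 means lowest risk) and the linear mapping are assumptions for illustration, not FICO's method; the sketch merely makes concrete the idea of folding several attributes into a single index.

```python
# Category weights as listed in the text; they sum to 100%.
WEIGHTS = {
    "payment_history": 0.35,
    "outstanding_debt": 0.30,
    "credit_history_length": 0.15,
    "credit_mix": 0.10,
    "new_credit": 0.10,
}
SCORE_MIN, SCORE_MAX = 300, 850

def illustrative_score(factors: dict) -> int:
    """HYPOTHETICAL model: weighted sum of sub-scores in [0, 1]
    (1.0 = lowest risk), mapped linearly onto 300-850."""
    for name in WEIGHTS:
        if not 0.0 <= factors[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    weighted = sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
    return round(SCORE_MIN + weighted * (SCORE_MAX - SCORE_MIN))
```

Because payment history carries the largest weight, improving that sub-score moves the result more than an equal improvement in any other factor.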
Returning to Internet resources, attackers have been using the Internet to attack the computers and other devices of users of the Internet. Attackers continue to take advantage of flaws in traditional security measures and bypass reputation-based systems to increase attack effectiveness.
In 2008, massive attacks compromised hundreds of thousands of legitimate Web sites with good reputations worldwide, infecting them with data-stealing malicious code. The affected sites included those of MSNBC, ZDNet, Wired, the United Nations, a large UK government site, and others. When a user's browser opened one of the compromised sites, a carefully crafted iframe HTML tag redirected the user to a malicious site rife with exploits. As a result, malicious code designed to steal confidential information was launched on vulnerable machines. In addition to Web exploits, email spammers also take advantage of the reputations of popular email services such as Yahoo! and Gmail to bypass anti-spam systems.
Spammers also use sophisticated tools and bots to break the “CAPTCHA” systems that were developed to protect email and other services from spammers and other malicious activity. MICROSOFT Live Mail, GOOGLE's popular Gmail service, and Yahoo! Mail were all compromised by this method. As a result, spammers have been able to sign up for free email accounts en masse and send spam from accounts with good reputations. With a free signup process and access to a wide portfolio of services and domains that are unlikely to be blacklisted because of their reputations, spammers have been able to launch attacks on millions of users worldwide while remaining anonymous.
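The iframe redirection described above suggests a simple static heuristic: flag iframes that load content from a different host and are sized or styled to be invisible. The sketch below uses Python's standard `html.parser` module; the `IframeAuditor` class, the `find_suspicious_iframes` helper, and the zero- or one-pixel threshold are hypothetical choices for illustration, and a real attack can evade such checks, for example by injecting the iframe with script at run time.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class IframeAuditor(HTMLParser):
    """Collect iframes that point to a foreign host and appear hidden
    (hypothetical heuristic, not any particular product's detector)."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlsplit(page_url).hostname
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        a = dict(attrs)  # attribute names arrive lowercased; values may be None
        src = urljoin(self.page_url, a.get("src") or "")
        foreign = urlsplit(src).hostname not in (None, self.page_host)
        style = (a.get("style") or "").replace(" ", "").lower()
        hidden = (a.get("width") in ("0", "1") or a.get("height") in ("0", "1")
                  or "display:none" in style)
        if foreign and hidden:
            self.suspicious.append(src)

def find_suspicious_iframes(html: str, page_url: str) -> list:
    auditor = IframeAuditor(page_url)
    auditor.feed(html)
    auditor.close()
    return auditor.suspicious
```

Relative `src` values are resolved against the page URL, so a hidden iframe that loads from the page's own host is not reported; only hidden, cross-host iframes like the one used in the attacks described above are flagged.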
Thus, prior art solutions have focused on security when accessing known infected sites on the Internet from a network such as a local area network or a wide area network.
Hegli et al., U.S. Pat. No. 7,483,982 for Filtering Techniques For Managing Access To Internet Sites Or Other Software Applications discloses a system and method for controlling an end user's access to the Internet by blocking certain categorized sites or limiting access based on bandwidth usage.
Hegli et al., U.S. Pat. No. 6,606,659 for a System And Method For Controlling Access To Internet Sites discloses a system and method for controlling an end user's access to the Internet by blocking certain categorized sites or limiting the number of times the end user can access an Internet site.
Yavatkar et al., U.S. Pat. No. 6,973,488 for Providing Policy Information To A Remote Device discloses a method for distributing high level policy information to remote network devices using a low-level configuration.
Turley et al., U.S. Patent Publication Number 2005/0204050 for a Method And System For Controlling Network Access discloses a system and method for controlling access to a specific site by using a gateway that assigns incoming traffic to specific sections of the site.
Shull et al., U.S. Pat. No. 7,493,403 for Domain Name Validation discloses accessing domain name registries to determine the ownership of a domain and monitoring the domain and registry.
Roy et al., U.S. Pat. No. 7,406,466 for a Reputation Based Search discloses using a search engine to present search results associated with measures of reputation to overcome the problem of META tags skewing the search results.
Hailpern et al., U.S. Pat. No. 7,383,299 for a System And Method For Providing Service For Searching Web Site Addresses discloses
Moore et al., U.S. Pat. No. 7,467,206, for a Reputation System For Web Services discloses a system and method for selecting a Web service from a search engine list which is ranked based on reputation information for each Web service.
Definitions for various terms are set forth below.
FTP or File Transfer Protocol is a protocol for moving files over the Internet from one computer to another.
HyperText Markup Language (HTML) is a method of mixing text and other content with layout and appearance commands in a text file, so that a browser can generate a displayed image from the file.
Hypertext Transfer Protocol (HTTP) is a set of conventions for controlling the transfer of information via the Internet from a Web server computer to a client computer, and also from a client computer to a Web server.
Internet is the worldwide, decentralized totality of server computers and data-transmission paths which can supply information to a connected and browser-equipped client computer, and can receive and forward information entered from the client computer.
JavaScript is an object-based programming language. JavaScript is an interpreted language, not a compiled language. JavaScript is generally designed for writing software routines that operate within a client computer on the Internet. Generally, the software routines are downloaded to the client computer at the beginning of the interactive session, if they are not already cached on the client computer. JavaScript is discussed in greater detail below.
Parser is a component of a compiler that analyzes a sequence of tokens to determine its grammatical structure with respect to a given formal grammar. Parsing transforms input text into a data structure, usually a tree, which is suitable for later processing and which captures the implied hierarchy of the input. XML Parsers ensure that an XML document follows the rules of XML markup syntax correctly.
URL or Uniform Resource Locator is an address on the World Wide Web.
Web-Browser is a complex software program, resident in a client computer, that is capable of loading and displaying text and images and exhibiting behaviors as encoded in HTML (HyperText Markup Language) from the Internet, and also from the client computer's memory. Major browsers include MICROSOFT INTERNET EXPLORER, NETSCAPE, APPLE SAFARI, MOZILLA FIREFOX, and OPERA.
Web-Server is a computer able to manage many Internet information-exchange processes simultaneously. Normally, server computers are more powerful than client computers, and are administratively and/or geographically centralized. An interactive-form information-collection process generally is controlled from a server computer, to which the sponsor of the process has access. Servers usually contain one or more processors (CPUs), memories, storage devices and network interface cards. Servers typically store the HTML documents and/or execute code that generates Web pages that are sent to clients upon request.
World Wide Web Consortium (W3C) is an unofficial standards body which creates and oversees the development of web technologies and the application of those technologies.
XHTML (Extensible Hypertext Markup Language) is a language for describing the content of hypertext documents intended to be viewed or read in a browser.
XML (Extensible Markup Language) is a W3C standard for text document markup; it is not a language itself but a set of rules for creating other markup languages.
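Two of the definitions above, URL and parser, can be illustrated with Python's standard library. The sketch below splits a URL into its components with `urllib.parse.urlsplit` and uses `xml.etree.ElementTree` to parse an XML document into a tree, returning `None` when the text is not well-formed. The example URL, document, and helper names (`url_parts`, `parse_xml`) are hypothetical; `ElementTree` checks well-formedness only and does not validate against a DTD or schema.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

def url_parts(url: str) -> dict:
    """Break a URL into the components used to locate a resource."""
    p = urlsplit(url)
    return {
        "scheme": p.scheme,
        "host": p.hostname,
        "port": p.port,
        "path": p.path,
        "query": p.query,
        "fragment": p.fragment,
    }

def parse_xml(document: str):
    """Parse XML text into an element tree; return None if not well-formed."""
    try:
        return ET.fromstring(document)
    except ET.ParseError:
        return None

parts = url_parts("https://www.example.com:8443/docs/index.html?lang=en#intro")
root = parse_xml("<site><url>http://example.com/</url><category>news</category></site>")
bad = parse_xml("<site><url></site>")  # mismatched closing tag
```

Here `parts` separates the host `www.example.com` from the port, path, query, and fragment, `root` holds a tree whose children are the `url` and `category` elements, and `bad` is `None` because the closing tags do not match.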
The prior art thus fails to provide a predictive measure of the security risk of accessing an Internet resource.