Understanding user activity on the Internet is becoming both more important and more difficult as the Internet continues to expand. Commercial use of the internet is one area that has expanded dramatically in the last decade, and one that has a particular interest in understanding, monitoring and predicting user activity. One significant aspect of the commercial use of the internet is advertising. Advertisers may use factors such as traffic rankings in determining an appropriate web site or domain on which to advertise particular content. Likewise, web site owners may use traffic rankings to establish an appropriate fee for advertising on their web sites. As known by those of skill in the art, internet advertising has also taken many different forms that may directly demonstrate the effectiveness of a particular advertisement, such as pay per click (PPC) applications. Nevertheless, traffic rankings remain an important aspect of internet advertising, as well as of other aspects of internet infrastructure management. For example, as the number of active web sites on the internet grows, there is an increased demand for accurate traffic ratings to inform decisions regarding server management, web development, advertising focus and advertising rates. However, there are limitations on the capabilities of conventional traffic monitoring services, which typically monitor the traffic of users or web sites to calculate traffic scores.
A description of the ways in which the Internet is intrinsically organized can be helpful in understanding the challenges related to efficiently monitoring and rating the traffic for particular web sites on the internet.
The process of establishing a web site on the internet typically begins with a registrant registering a specific domain name through a registrar. The registrant is typically an individual or organization that identifies a domain name, such as “example.com”. The registrant contacts a registrar to process the name registration. The registrar sends the necessary domain name system (DNS) information to a registry. A registrar may maintain a database containing additional customer information beyond that which is sent to the registry.
The registry receives DNS information from registrars, inserts that information into a centralized database and propagates the information on the internet so that domain names can be found by users around the world.
In general, the DNS is the part of the Internet infrastructure that translates human-readable domain names into the Internet Protocol (IP) numbers needed to establish TCP/IP communication over the Internet. That is, DNS allows users to refer to web sites, and other resources, using easier-to-remember domain names, such as “www.example.com”, rather than the numeric IP addresses, such as “123.4.56.78”, assigned to computers on the Internet. Each domain name is made up of a series of character strings (labels) separated by dots. The right-most label in a domain name is known as the “top-level domain” (TLD). Examples of well-known TLDs are “.com”, “.net”, “.org”, etc. Each TLD supports second-level domains, listed immediately to the left of the TLD, e.g., the “example” level in “www.example.com”. Each second-level domain can include a number of third-level domains located immediately to the left of the second-level domain, e.g., the “www” level in “www.example.com”. There can be additional domain levels as well, limited in practice only by the DNS length limits (at most 63 octets per label and 255 octets per name). For example, a domain with additional domain levels could be “www.photos.example.com”.
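By way of illustration only, the numbering of domain levels described above can be sketched in Python. The function name domain_levels is hypothetical, and the sketch is deliberately naive: it treats only the right-most label as the TLD, so it does not recognize multi-label registry suffixes such as “.co.uk”. The 63-character label limit it checks follows the DNS specification (RFC 1035) for ASCII names.

```python
def domain_levels(name: str) -> list[tuple[int, str]]:
    """Return (level, label) pairs, numbered from the right-most label (level 1 = TLD).

    Naive sketch: multi-label suffixes such as "co.uk" are not recognized.
    """
    labels = name.rstrip(".").lower().split(".")  # drop an optional trailing root dot
    for label in labels:
        # RFC 1035 limits each label to 63 octets; empty labels are malformed.
        if not label or len(label) > 63:
            raise ValueError(f"invalid label {label!r} in {name!r}")
    # Reverse so that level 1 is the TLD, level 2 the second-level domain, etc.
    return list(enumerate(reversed(labels), start=1))
```

For example, calling domain_levels("www.photos.example.com") returns [(1, "com"), (2, "example"), (3, "photos"), (4, "www")], matching the top-level, second-level, third-level and additional levels described above.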
Additional non-domain information may be included in a Uniform Resource Identifier (“URI”) structure that includes the domain name. For example, a “path” part is a sequence of segments (conceptually similar to directories, though not necessarily representing them) separated by a forward slash (“/”). This information may be included immediately to the right of the domain name, such as the “blog” in “www.example.com/blog”, and may be used by a server or other receiving device to identify and deliver specific content or run particular code. Other examples of non-domain information may include queries and fragments, the specifics of which are understood by those of ordinary skill in the art and are not discussed in detail herein. Combinations of this information may be included in web page hyperlinks that navigate a user to another section of the same page or to another web page that may be part of the same, or a different, domain.
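As an illustrative sketch only, Python's standard urllib.parse.urlsplit function can separate the domain name from the non-domain parts of a URI. The UriParts container and the uri_parts helper are hypothetical names, and the sketch assumes the URI includes a scheme (such as “https://”); without one, urlsplit treats “www.example.com/blog” as a path rather than as a host.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class UriParts:
    host: str             # the domain name, e.g. "www.example.com"
    segments: list[str]   # "/"-separated path segments, e.g. ["blog"]
    query: str            # text after "?", without the "?"
    fragment: str         # text after "#", without the "#"


def uri_parts(uri: str) -> UriParts:
    """Split a URI into its domain name and its non-domain parts."""
    parts = urlsplit(uri)
    return UriParts(
        host=parts.hostname or "",  # hostname is lower-cased and excludes any port
        segments=[s for s in parts.path.split("/") if s],  # skip empty segments
        query=parts.query,
        fragment=parts.fragment,
    )
```

For “https://www.example.com/blog/2020?tag=dns#comments”, the sketch yields the host “www.example.com”, the path segments “blog” and “2020”, the query “tag=dns” and the fragment “comments”.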
Related domain names, and content, may be organized in a hierarchical, or nested, manner, such as “www.example.com”, “www.blog.example.com”, “www.example.com/blog”, or “blog.example.com”, each with a different significance. Such related domain names need not resolve to similar IP addresses. In this regard, part of the domain name may signify a particular desired server; for example, “mail.example.com” and “www.example.com” may resolve to different servers, with different functions, for the same second-level domain.
The above registration and structural aspects of the internet are then used by end-user applications to find specific resources on the internet by using the DNS resolution process. Aspects of the DNS resolution process are discussed below to aid in an understanding of the subject matter of the present application.
The responsibility for operating each TLD (including maintaining a registry of the second-level domains within the TLD) is delegated to a particular domain name registry. The registry is responsible for operating its top-level domain and for converting domain names to IP addresses (“resolving”) through DNS servers that maintain such information in large databases. The DNS stores IP addresses and domain names, facilitating service to addresses in TLDs such as .com, .net, .edu, and .tv. Resolving is the process by which domain names are matched with corresponding IP numbers. Resolving is accomplished by a combination of computers and software, referred to as name servers, that use the data in the DNS to determine which IP numbers correspond to a particular domain name. The following general definitions will be used herein.
Resolve: To translate a domain name to an IP address.
Resolver: A computer issuing a query in order to resolve a domain name.
Name server: A computer receiving queries and answering them either directly or by resolving them against other name servers.
Subnet: A group of IP addresses sharing the same leading bits (e.g., the same leading octets) of the IP address.
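To make the subnet definition concrete, the following Python sketch uses the standard ipaddress module to test whether two IPv4 addresses fall within the same subnet. The helper name same_subnet and the default /24 prefix (addresses sharing their first three octets) are illustrative assumptions, not part of the definitions above.

```python
import ipaddress


def same_subnet(a: str, b: str, prefix_len: int = 24) -> bool:
    """Return True if addresses a and b share their leading prefix_len bits."""
    # strict=False allows building the network from a host address such as
    # "123.4.56.78/24"; the host bits are masked off, giving 123.4.56.0/24.
    network = ipaddress.ip_network(f"{a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(b) in network
```

With the default prefix, “123.4.56.78” and “123.4.56.200” are in the same subnet, while “123.4.56.78” and “123.4.57.1” are not; with prefix_len=16 (first two octets), the latter pair is also grouped together.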
Internet domains can be divided into groups according to their TLD suffix (e.g., .com, .net, .co.uk, etc.), with different registries responsible for each of them. A single registry may be responsible for several of these groups; for example, the VeriSign registry is responsible for the .com and .net domains.
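Grouping domains by registry suffix requires matching the longest known suffix, since both “.co.uk” and “.uk” end a name such as “shop.example.co.uk”. The Python sketch below illustrates this under stated assumptions: KNOWN_SUFFIXES is a hypothetical, deliberately tiny table (a practical system would rely on a complete list, such as the Public Suffix List), and the helper names are illustrative.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny suffix table for illustration only.
KNOWN_SUFFIXES = {"com", "net", "org", "uk", "co.uk"}


def registry_suffix(domain: str) -> str:
    """Return the longest known suffix of domain (e.g. "co.uk" rather than "uk")."""
    labels = domain.rstrip(".").lower().split(".")
    # Candidates are tried from longest to shortest, so the first match wins.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in KNOWN_SUFFIXES:
            return candidate
    return labels[-1]  # unknown suffix: fall back to the right-most label


def group_by_suffix(domains: list[str]) -> dict[str, list[str]]:
    """Group domain names by the registry suffix they fall under."""
    groups = defaultdict(list)
    for domain in domains:
        groups[registry_suffix(domain)].append(domain)
    return dict(groups)
```

For example, “www.example.com”, “example.net” and “shop.example.co.uk” fall into the “com”, “net” and “co.uk” groups respectively, even though .com and .net may be operated by the same registry.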
The DNS is maintained by a distributed database system, which uses the client-server model. The nodes of this database are the name servers. Each domain or subdomain has one or more authoritative DNS servers that publish information about that domain and the name servers of any domains subordinate to it. The top of the hierarchy is served by the root name servers, the servers to query when looking up (resolving) a TLD.
The DNS distributes the responsibility of assigning domain names and mapping those names to IP addresses by designating authoritative name servers for each domain. Authoritative name servers are assigned to be responsible for their particular domain.
In theory, a fully qualified domain name may have several name segments (e.g., “www.one.type.example.com.”). For querying purposes, the name is typically interpreted segment by segment, from right to left. At each step along the way, a corresponding DNS server is queried to provide a pointer to the next server that the resolver should consult.
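The right-to-left, server-by-server process described above can be illustrated with a small in-memory simulation. This is a sketch under simplifying assumptions: the server names (root, com-tld, example-auth) and the SERVERS table are hypothetical, each server is modeled as either answering with an address or referring the resolver to the server for a more specific zone, and real DNS message formats, glue records and retries are omitted.

```python
# Hypothetical in-memory name servers. "A" maps names this server answers
# authoritatively to IP addresses; "NS" maps delegated zones to the server
# responsible for them (a referral).
SERVERS = {
    "root":         {"NS": {"com.": "com-tld"}, "A": {}},
    "com-tld":      {"NS": {"example.com.": "example-auth"}, "A": {}},
    "example-auth": {"NS": {}, "A": {"www.example.com.": "123.4.56.78"}},
}


def query(server: str, fqdn: str) -> tuple[str, str]:
    """Ask one server about fqdn; return ("answer", ip) or ("referral", next_server)."""
    data = SERVERS[server]
    if fqdn in data["A"]:
        return "answer", data["A"][fqdn]
    for zone, next_server in data["NS"].items():
        if fqdn == zone or fqdn.endswith("." + zone):
            return "referral", next_server
    raise LookupError(f"{server} has no data for {fqdn}")


def resolve_iteratively(name: str, start: str = "root", max_steps: int = 10):
    """Follow referrals from start until a server answers; return (ip, servers consulted)."""
    fqdn = name if name.endswith(".") else name + "."
    server, consulted = start, [start]
    for _ in range(max_steps):  # guard against referral loops
        kind, value = query(server, fqdn)
        if kind == "answer":
            return value, consulted
        server = value
        consulted.append(server)
    raise LookupError(f"too many referrals while resolving {fqdn}")
```

Resolving “www.example.com” in this sketch consults the root, then the .com server, then the authoritative server for example.com, and returns “123.4.56.78”.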
Because of the huge volume of requests generated by DNS, the resolution process also allows for caching (i.e., the local recording and subsequent consultation of the results of a DNS query) for a given period of time after a successful answer. How long a resolver caches a DNS response (i.e., how long a DNS response is considered valid) is determined by a value called the time to live (TTL). The TTL is generally set by the administrator of the authoritative DNS server for the record in question. The period of validity may vary from just seconds to days or even weeks.
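The TTL-based validity period can be sketched as a small cache keyed by domain name. This Python sketch is illustrative only: the TTLCache class is hypothetical, and its clock is injectable so that expiry can be demonstrated without waiting in real time.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Minimal resolver cache in which each answer is valid for its TTL (seconds)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expiry time)

    def put(self, name: str, ip: str, ttl: float) -> None:
        """Record an answer together with the time at which it stops being valid."""
        self._entries[name] = (ip, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        """Return the cached IP if its TTL has not expired, otherwise None."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: the name must be resolved again
            return None
        return ip
```

If an answer for “www.example.com” is stored with a TTL of 300 seconds at time 0, a lookup at time 299 returns the cached address, while a lookup at time 300 returns None and the name must be resolved again.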
Based on the DNS structure, as well as the caching function, two classifications are typically applied to name servers: authoritative and recursive (caching). An authoritative name server is a name server that gives original, definitive answers (“authoritative” answers) to DNS queries. Every domain name must be assigned a set of authoritative name servers that are responsible for resolving the domain name.
As indicated above, the DNS also uses recursive caching servers, which store DNS query results for a period of time determined by the TTL of the domain name record in question. Typically, such caching DNS servers also implement the recursive algorithm necessary to resolve a given name, starting with the DNS root and continuing through to the authoritative name servers of the queried domain. Internet service providers (ISPs) typically provide recursive caching name servers for their customers. In addition, many home networking routers implement DNS caches and recursors to improve efficiency in the local network.
DNS “stub” resolvers are also known, which essentially operate as cache-less applications that resolve DNS names into IP addresses. The DNS stub resolver forwards DNS queries to the DNS server configured for the workstation (or server) and returns the DNS server's response to the requesting software. If a stub resolver queries a caching name server for a record that the caching server already holds and whose TTL has not yet expired, the caching server will reply with the cached resource record rather than retrieve it from the authoritative name server again.
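The interaction between a cache-less stub resolver and a caching name server can be sketched as follows. The class names are hypothetical, the upstream callable stands in for the full recursive walk to the authoritative name servers, and the clock is injectable for demonstration; the sketch counts how often the authoritative side is actually contacted.

```python
import time
from typing import Callable, Dict, Tuple


class CachingServer:
    """Recursive caching name server sketch; upstream(name) returns (ip, ttl)."""

    def __init__(self, upstream: Callable[[str], Tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._upstream = upstream
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expiry time)
        self.upstream_queries = 0

    def query(self, name: str) -> str:
        hit = self._cache.get(name)
        if hit is not None and self._clock() < hit[1]:
            return hit[0]  # cached record whose TTL has not expired
        ip, ttl = self._upstream(name)  # miss or stale: ask the authoritative side
        self.upstream_queries += 1
        self._cache[name] = (ip, self._clock() + ttl)
        return ip


class StubResolver:
    """Cache-less resolver: forwards every query to its configured server."""

    def __init__(self, server: CachingServer) -> None:
        self._server = server

    def resolve(self, name: str) -> str:
        return self._server.query(name)
```

With a 60-second TTL, two stub queries for the same name within that period cause only one upstream query; a query after the TTL has expired causes a second one.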