A description of the ways in which the Internet is intrinsically organized can be helpful in understanding the challenges related to efficiently monitoring and rating the traffic for particular web sites on the internet.
The process of establishing a web site on the internet typically begins with a registrant registering a specific domain name through a registrar. The registrant is typically an individual or organization that identifies a domain name, such as “example.com”. The registrant contacts a registrar to process the name registration. The registrar then sends the necessary DNS information to a registry. A registrar may maintain a database containing additional customer information beyond that which is sent to the registry.
The registry receives DNS information from registrars, inserts that information into a database and propagates the information on the internet so that domain names can be found by users around the world.
In general, the DNS is the part of the Internet infrastructure that translates human-readable domain names into the Internet Protocol (IP) numbers needed to establish TCP/IP communication over the Internet. That is, DNS allows users to refer to web sites, and other resources, using easier-to-remember domain names, such as “www.example.com”, rather than the numeric IP addresses, such as “123.4.56.78”, assigned to computers on the Internet. Each domain name is made up of a series of character strings (labels) separated by dots. The rightmost label in a domain name is known as the “top-level domain” (TLD). Examples of well-known TLDs are “.com”, “.net”, “.org”, etc. Each TLD supports second-level domains, listed immediately to the left of the TLD, e.g. the “example” level in “www.example.com”. Each second-level domain can include a number of third-level domains located immediately to the left of the second-level domain, e.g. the “www” level in “www.example.com”. There can be additional level domains as well, with virtually no limitation. For example, a domain with additional domain levels could be “www.photos.example.com”.
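As a minimal illustration of the label structure described above, the following Python sketch splits a domain name into its levels. The function name `split_domain` and its return format are assumptions made for this example only, and the sketch treats just the rightmost label as the TLD.

```python
def split_domain(name: str) -> dict:
    """Split a domain name into labels, reading the levels from right to left."""
    labels = name.rstrip(".").lower().split(".")
    # The rightmost label is the TLD; each label to its left is one level deeper.
    levels = list(reversed(labels))
    return {
        "tld": levels[0],
        "second_level": levels[1] if len(levels) > 1 else None,
        "lower_levels": levels[2:],  # third level first, then fourth, and so on
    }


info = split_domain("www.photos.example.com")
print(info)
# {'tld': 'com', 'second_level': 'example', 'lower_levels': ['photos', 'www']}
```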
Additional non-domain information may be included in a Uniform Resource Identifier (“URI”) structure that includes the domain name. For example, a “path” part is a sequence of segments (conceptually similar to directories, though not necessarily representing them) separated by a forward slash (“/”). This information may be included immediately to the right of the domain name, such as the “blog” in “www.example.com/blog”, and may be used by a server or other receiving device to identify and deliver specific content or run particular code. Other examples of non-domain information may include queries and fragments, the specifics of which are understood by those of ordinary skill in the art and are not discussed in detail herein. Combinations of this information may be included in web page hyperlinks that navigate a user to another section of the same page or to another web page that may be part of the same, or a different, domain.
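The separation between the domain name and the non-domain parts of a URI can be sketched with the `urlsplit` function from the Python standard library. The helper name `describe_uri`, the example URI, and its “https” scheme are assumptions added for illustration; the scheme is included because `urlsplit` needs the “//” prefix to recognize the domain portion.

```python
from urllib.parse import urlsplit


def describe_uri(uri: str) -> dict:
    """Separate the domain name from the path, query, and fragment of a URI."""
    parts = urlsplit(uri)
    return {
        "domain": parts.hostname,
        # Path segments are the non-empty pieces between forward slashes.
        "path_segments": [seg for seg in parts.path.split("/") if seg],
        "query": parts.query,
        "fragment": parts.fragment,
    }


print(describe_uri("https://www.example.com/blog?page=2#comments"))
# {'domain': 'www.example.com', 'path_segments': ['blog'],
#  'query': 'page=2', 'fragment': 'comments'}
```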
Related domain names, and content, may be organized in a hierarchical, or nested, manner, such as “www.example.com”; “www.blog.example.com”; “www.example.com/blog”; or “blog.example.com” etc., each with a different significance. Such related domains need not share similarities in the actual IP addresses to which the various domain names resolve. In this regard, part of the domain name may signify a particular server which is desired. For example, “mail.example.com” and “www.example.com” may resolve to different servers, with different functions, for the same second-level domain.
The above registration and structural aspects of the internet are then used by end-user applications to find specific resources on the internet by using the DNS resolution process. Aspects of the DNS resolution process are discussed below to aid in an understanding of the subject matter of the present application.
The responsibility for operating each TLD (including maintaining a registry of the second-level domains within the TLD) is delegated to a particular domain name registry. The registry is responsible for converting domain names to IP addresses (“resolving”) through DNS servers that maintain such information in large databases, and operating its top-level domain. The DNS stores IP addresses and domain names, facilitating service to addresses in TLDs, such as .com, .net, .edu, and .tv. Resolving is the process by which domain names are matched with corresponding IP numbers. Resolving is accomplished by a combination of computers and software, referred to as name servers, which use the data in the DNS to determine which IP numbers correspond to a particular domain name. The following general definitions will be used herein.
Resolve: To translate a domain name to an IP address.
Resolver: A computer that can respond to a query in order to resolve a domain name.
Name server: A computer receiving queries and answering them directly or via a resolver against other name servers.
Subnet: A group of IP addresses sharing an initial sequence of octets of the IP address.
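The subnet definition above can be illustrated with the `ipaddress` module from the Python standard library. This is a sketch for IPv4 only; the helper name `same_subnet` and the default of three shared octets are assumptions chosen for the example.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, prefix_octets: int = 3) -> bool:
    """Return True if two IPv4 addresses share their first `prefix_octets` octets."""
    prefix_len = 8 * prefix_octets
    # strict=False masks off the host bits of addr_a to obtain the network.
    network = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(addr_b) in network


print(same_subnet("123.4.56.78", "123.4.56.200"))  # True: both begin with 123.4.56
print(same_subnet("123.4.56.78", "123.4.57.1"))    # False: third octet differs
```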
Internet domains can be divided into groups according to their TLD suffix (e.g., .com, .net, .co.uk, ...) with different registries responsible for each of them. A single registry may be responsible for several of these groups, such as the VeriSign registry which is responsible for .com and .net domains.
The DNS is implemented as a distributed database system, which uses the client-server model. The nodes of this database are the name servers. Each domain or subdomain has one or more authoritative DNS servers that publish information about that domain and the name servers of any domains subordinate to it. The top of the hierarchy is served by the root name servers, the servers to query when looking up (resolving) a TLD.
The DNS distributes the responsibility of assigning domain names and mapping those names to IP addresses by designating authoritative name servers for each domain. Authoritative name servers are assigned to be responsible for their particular domain.
In theory a fully qualified domain name may have several name segments (e.g. www.one.type.example.com). For querying purposes, the name is typically interpreted segment by segment, from right to left. At each step along the way, a corresponding DNS server is queried to provide a pointer to the next server which it should consult.
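The right-to-left walk described above can be sketched with a small in-memory simulation. No network queries are made; the `DELEGATIONS` table, the server names in it, and the function `resolve_iteratively` are hypothetical and exist only for this example. The sketch also assumes, as a simplification, that a separate server is consulted at every label.

```python
from typing import List, Tuple

# Hypothetical delegation data: for each server, the names it knows about and
# the next server (or final address) it points to. All values are made up.
DELEGATIONS = {
    "root": {"com": "tld-server"},
    "tld-server": {"example.com": "example-auth-server"},
    "example-auth-server": {"www.example.com": "123.4.56.78"},
}


def resolve_iteratively(name: str) -> Tuple[str, List[str]]:
    """Walk the name from right to left, following one pointer per segment."""
    labels = name.rstrip(".").split(".")
    server = "root"
    visited = []
    for i in range(len(labels) - 1, -1, -1):
        suffix = ".".join(labels[i:])  # "com", then "example.com", ...
        visited.append(server)
        server = DELEGATIONS[server][suffix]  # pointer to the next server
    return server, visited


address, path = resolve_iteratively("www.example.com")
print(address)  # 123.4.56.78
print(path)     # ['root', 'tld-server', 'example-auth-server']
```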
Because of the huge volume of requests generated by DNS, the resolution process also allows for caching (i.e. the local recording and subsequent consultation of the results of a DNS query) for a given period of time after a successful answer. How long a resolver caches a DNS response (i.e. how long a DNS response is considered valid) is determined by a value called the time to live (TTL). The TTL is generally set by the administrator of the DNS server handling the response. The period of validity may vary from just seconds to days or even weeks.
Based on the DNS structure, as well as the caching function, there are two classifications typically applied to the name servers, authoritative and recursive (caching). An authoritative name server is a name server that gives original, definitive answers (“authoritative” answers) to DNS queries. Every domain name must be assigned a set of authoritative name servers that are responsible for resolving the domain name.
As indicated above, the DNS also uses recursive cache servers, which store DNS query results for a period of time determined by the TTL of the domain name record in question. Typically, such caching DNS servers also implement a recursive algorithm to resolve a given name starting with the DNS root through to the authoritative name servers of the queried domain. Internet service providers (ISPs) typically provide recursive and caching name servers for their customers. In addition, many home networking routers implement DNS caches and recursors to improve efficiency in the local network.
DNS “stub” resolvers are also known that essentially operate as a cache-less application to resolve DNS names into IP addresses. The DNS stub resolver forwards DNS queries to the DNS server configured for the workstation (or server) and returns the DNS server's response to the requesting software. If a stub resolver queries a caching nameserver for a record that is being held by the caching server before the TTL has expired, the caching server will reply with the cached resource record rather than retrieve it from the authoritative name server again.
DNS resolvers can cache results for a period of time up to the TTL of the DNS record that was retrieved. In order to handle invalid records (due to errors or malicious activity), most resolvers offer a mechanism for cache flushing to evict invalid records from their cache and therefore retrieve updated records. Operationally, it is often difficult for organizations to systematically flush the caches of all of their internal resolvers. In addition, many organizations or individuals may not be aware of an issue with an invalid record and will not know to execute a cache flush. As a result, the invalid record can last for some time, possibly exposing them to malicious servers.
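The TTL-based caching and cache flushing discussed above can be sketched as a small in-memory cache. The class name `TtlCache`, its methods, and the injectable `clock` parameter are assumptions made for illustration and do not describe any particular resolver implementation.

```python
import time


class TtlCache:
    """Minimal TTL-bounded cache of name-to-address answers (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (address, expiry time)

    def put(self, name, address, ttl_seconds):
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # TTL elapsed: the answer is no longer valid
            return None
        return address

    def flush(self, name=None):
        """Evict one record, or every record when no name is given."""
        if name is None:
            self._entries.clear()
        else:
            self._entries.pop(name, None)


# Usage with a hand-driven clock so the behavior is deterministic.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("www.example.com", "123.4.56.78", ttl_seconds=300)
print(cache.get("www.example.com"))  # 123.4.56.78 (served from the cache)
now[0] = 301.0
print(cache.get("www.example.com"))  # None (TTL expired, must be re-resolved)
```

In this sketch an invalid record that is never flushed keeps being served until its TTL runs out, which is the exposure window described above.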
TLD attacks are relatively easy to perpetrate due to the nature of DNS communications. That is, DNS communications are typically sent via the User Datagram Protocol (UDP). UDP is a simple communication protocol for transmitting small data packets without a connection handshake, acknowledgment, ordering, or error correction. The low processing overhead of UDP makes it useful for streaming media applications such as video and Voice over IP, and for answering small queries from many sources, such as in DNS resolution. Unfortunately, these same properties allow attackers to use DNS resolution for nefarious purposes. Because UDP is connectionless, an attacker can “spoof” the source address (that is, forge a false source IP address in the IP packet such that the DNS server sends the query response to a third party) without having to worry about completing a connection handshake, resulting in the DNS server sending responses to a machine that never sent a query. Moreover, the query message can be relatively small (under 512 bytes) while the resulting response can be substantially larger due to large numbers of resource records in the response. This allows an attacker to hijack a DNS server to magnify an attack. DNS queries and responses may also be sent over stateful Transmission Control Protocol (TCP), which exhibits similar vulnerabilities that can also be managed using embodiments of the invention disclosed herein.
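To make the size asymmetry concrete, the following sketch encodes a standard DNS query message for an address record and computes the ratio of a response size to the query size. Nothing is sent over the network. The function names, the query identifier, and the 3,300-byte response size are hypothetical values chosen only for the arithmetic.

```python
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Encode a DNS query message asking for the A record of `name`."""
    # Header: ID, flags (0x0100 = recursion desired), one question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Each label is prefixed with its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE = A, QCLASS = IN
    return header + question


def amplification_factor(query_size: int, response_size: int) -> float:
    """Ratio of response bytes to query bytes."""
    return response_size / query_size


query = build_query("www.example.com")
print(len(query))  # 33 bytes, far below the 512-byte figure mentioned above
# Hypothetical response size, used only to illustrate the ratio.
print(amplification_factor(len(query), 3300))  # 100.0
```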