The Internet enables a user of a client computer system to identify and communicate with millions of other computer systems located around the world. A client computer system can identify each of these other computer systems using a unique numeric identifier for that computer called an “IP address.” When a communication is sent from a client computer system to a destination computer system, the client computer system typically specifies the IP address of the destination computer system in order to facilitate the routing of the communication to the destination computer system. For example, when a request for a World Wide Web page (“Web page”) is sent from a client computer system to a Web server computer system (“Web server”) from which that Web page can be obtained, the client computer system typically includes the IP address of the Web server.
In order to make the identification of destination computer systems more mnemonic, a Domain Name System (DNS) has been developed that translates a unique alphanumeric name for a destination computer system into the IP address for that computer. The alphanumeric name is called a “domain name.” For example, referring to FIG. 10, the domain name for a hypothetical computer system operated by IBM Corporation may be “comp23.IBM.com”. Using domain names, a user attempting to communicate with this computer system could specify a destination of “comp23.IBM.com” rather than the particular IP address of the computer system (e.g., 198.81.209.25).
A user can also request a particular resource (e.g., a Web page or a file) that is available from a server computer by specifying a unique Uniform Resource Identifier (“URI”), such as a Uniform Resource Locator (“URL”), for that resource. A URL includes a protocol to be used in accessing the resource (e.g., “http:” for the HyperText Transfer Protocol (“HTTP”)), the domain name or IP address of the server that provides the resource (e.g., “comp23.IBM.com”), and optionally a path to the resource (e.g., “/help/HelpPage.html”). Thus “http://comp23.IBM.com/help/HelpPage.html” is one example of a URL. In response to a user specifying such a URL, the comp23.IBM.com server would typically return a copy of the “HelpPage.html” file to the user.
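By way of illustration only, the decomposition of a URL into its protocol, server, and path components may be sketched as follows using the Python standard library. The function name `describe_url` and the dictionary keys are assumptions introduced for this example and are not part of the disclosure.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the three components discussed above.

    Illustrative helper only; the name and return shape are assumptions.
    """
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,  # e.g. "http"
        "server": parts.netloc,    # domain name or IP address, as written
        "path": parts.path,        # optional path to the resource
    }


info = describe_url("http://comp23.IBM.com/help/HelpPage.html")
print(info)
```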
In addition to making the identification of destination computer systems more mnemonic, domain names introduce a useful layer of indirection between the name used to identify a destination computer system and the IP address of that computer system. Using this layer of indirection, the operator of a particular computer system can initially associate a particular domain name with a first computer system by specifying that the domain name corresponds to the IP address of the first computer system. At a later time (e.g., if the first computer system breaks or must be replaced), its operator can “transfer” the domain name to a second computer system by then specifying that the domain name corresponds to the IP address of the second computer system.
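The layer of indirection described above may be pictured, in a greatly simplified form, as a lookup table whose entries can be repointed. The following is a minimal sketch under stated assumptions: a single in-memory dictionary stands in for the distributed DNS database, the helper names `resolve` and `transfer` are hypothetical, and the second address (192.0.2.10) is a reserved documentation address chosen only for the example.

```python
# A toy name table standing in for DNS; real DNS is distributed and cached.
records = {"comp23.ibm.com": "198.81.209.25"}


def resolve(name: str) -> str:
    """Return the IP address currently associated with a domain name."""
    return records[name.lower()]  # domain names are case-insensitive


def transfer(name: str, new_ip: str) -> None:
    """Re-associate a domain name with a different computer system."""
    records[name.lower()] = new_ip


before = resolve("comp23.IBM.com")
transfer("comp23.IBM.com", "192.0.2.10")  # hypothetical replacement system
after = resolve("comp23.IBM.com")
print(before, "->", after)
```

Users who address the system by name are unaffected by the change, since only the table entry differs.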
The domain names in DNS are structured in a hierarchical, distributed database that facilitates grouping related domain names and computers and ensuring the uniqueness of different domain names. In particular, as mentioned above, a particular domain name such as “IBM.com” may identify a specific host computer. However, the hierarchical nature of DNS also allows a domain name such as “IBM.com” to represent a domain including multiple other domain names each identifying computers (also referred to as “hosts”), either in addition to or instead of identifying a specific computer.
FIG. 10 illustrates a hypothetical portion of the DNS database 1000 in which the node representing the IBM.com domain name 1010 is the root node in an IBM.com domain 1050 that includes 7 other nodes each representing other domain names. Each of these domain names in the IBM.com domain can be, but does not have to be, under the control of a single entity (e.g., IBM Corporation). FIG. 10 also includes a WebHostingCompany.com domain 1055 that includes a single domain name.
As illustrated, the DNS database can be represented with a hierarchical tree structure, and the full domain name for a given node in the tree can be determined by concatenating the name of each node along the path from the given node to the root node 1001, with the names separated by periods. Thus, the 8 nodes in the IBM.com domain represent the domain names IBM.com 1010, foo.IBM.com 1012, foo.foo.IBM.com 1018, bar.foo.IBM.com 1020, bar.IBM.com 1014, comp23.IBM.com 1016, abc.comp23.IBM.com 1022, and cde.comp23.IBM.com 1024. Other “.com” domain names outside the IBM.com domain are also illustrated in FIG. 10, including the second-level domain names BCD-Corp.com 1032, WebHostingCompany.com 1034, 1-800-555-1212.com 1042 and 123456.com 1044, and the lower-level domain names 123.123456.com 1046 and 456.123456.com 1048. In addition to the “.com” top-level domain (“TLD”), other TLDs are also illustrated, including the “.cc” geographical TLD and the “.gov”, “.edu” and “.mil” organizational TLDs. Illustrated domain names under these other TLDs include Stanford.edu 1036, Berkeley.edu 1038, and RegistrarCompany.cc 1040.
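The concatenation rule described above can be sketched as a walk from a node up to the root of the tree. This is an illustrative model only: the `Node` class and the `full_domain_name` function are names assumed for the example, and the root node is modeled with an empty label so that it contributes nothing to the result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of the hierarchical tree; the root has an empty label."""
    label: str
    parent: Optional["Node"] = None


def full_domain_name(node: Optional[Node]) -> str:
    """Join the labels on the path from the node to the root with periods."""
    labels = []
    while node is not None and node.label:  # stop at the (unnamed) root
        labels.append(node.label)
        node = node.parent
    return ".".join(labels)


# A small fragment of the hypothetical tree of FIG. 10.
root = Node("")
com = Node("com", root)
ibm = Node("IBM", com)
comp23 = Node("comp23", ibm)
abc = Node("abc", comp23)

print(full_domain_name(abc))
```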
New domain names can be defined (or “registered”) by various domain name registrars. In particular, a company that serves as a registrar for a TLD can assist customers in registering new domain names for that TLD and can perform the necessary actions so that the technical DNS information for those domain names is stored in a manner accessible to name servers for that TLD. Registrars often maintain a second-level domain name within the TLD (e.g., a hypothetical Registrar Company that acts as a registrar for the “.cc” TLD could maintain the RegistrarCompany.cc domain name 1040), and provide an interactive Website at their domain name from which customers can register new domain names. A registrar will typically charge a customer a fee for registering a new domain name.
For the “.com”, “.net” and “.org” TLDs, a large number of registrars currently exist, and a single shared registry (“the Registry”) under the control of a third-party administrator stores information identifying the authoritative name servers for the second-level domain names in those TLDs. Other TLDs may have only a single registrar, and if so that registrar could maintain a registry for all the second-level domains in that TLD by merely storing the appropriate DNS information for each domain name that the registrar registers. In other situations, multiple registrars may exist for a TLD, but one of the registrars may serve as a primary registrar that maintains a registry for each of the second-level domains in that TLD; if so, the secondary or affiliate registrars for that TLD supply the appropriate DNS information for the domain names that they register to the primary registrar. Thus, the manner in which the DNS information for a TLD is obtained and stored is affected by the registrars for that TLD.
Currently, there are a limited number of TLDs, and many domain names in the most popular TLDs (e.g., “.com”) have already been taken. Thus, users will often have difficulty identifying available, non-registered domain names, i.e., domain names that do not exist in the authoritative registry for the TLD. Such names are “non-existent domains,” also termed NXDomains or NXDs. Instead, a user may often attempt to register domain names that are already registered. In such a situation, the user will be prevented from registering the domain name, but may receive little or no assistance in determining other domain names that are available. However, utilizing concepts and techniques described in applicant's copending U.S. patent application Ser. No. 12/763,349 filed Apr. 20, 2010, incorporated herein by reference in its entirety, capabilities are provided to track and organize NXDomains and to support searching of those domain names, so as to minimize or eliminate the burden of searching for an available domain name.
Because domain name resolution provided by DNS is essential to operation of the Internet and email, continual availability, operation and functioning of the system is critical. Unfortunately, not all network traffic is legitimate and, as a matter of fact, a large amount of malicious traffic passes through the Internet all the time. Such malicious DNS traffic can lead to various crimes and possibly exhaust a considerable amount of network bandwidth and resources. Therefore, consideration must be given to possible scenarios that might impair DNS. Threats to the operation of the network may come in several forms, including Internet bots as disclosed in U.S. Patent Publication No. US 2008/0025328 of Alberts (“Alberts”), the disclosure of which is incorporated herein by reference in its entirety. Alberts discloses enabling an end-user using an IP based network to select and communicate on-line with another end-user without revealing their identity. The selection of an end-user is performed by an Internet bot that is capable of accessing a profile list such that, during a phase in which information is transferred between both end-users, the identity of at least one end-user is not known to the other end-user because information is first transferred to the Internet bot and then from the Internet bot to the other end-user. Another scenario is described in U.S. Patent Publication No. US 2008/0155694 of Kwon et al. (“Kwon”), the disclosure of which is also incorporated herein by reference in its entirety. Kwon discloses a method for dealing with attacks of malicious BOTs, i.e., software for performing or controlling a predetermined operation in response to a specific event or a specific command, implemented as script code having various functions including a remote function for specific objects. When a malicious BOT attacks a specific network or system, it generates more data than the capacity of the target network or system so as to disable the normal service.
Kwon discloses addressing malicious BOTs by detecting and analyzing a domain name receiving excessive DNS queries to judge the infection of a malicious BOT, registering the corresponding domain name as normal or abnormal management target, and redirecting an abnormal DNS query for the abnormal management target to a redirection processing and response system.
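The general idea of flagging a domain name that receives excessive DNS queries and redirecting further queries for it can be sketched as follows. This is a simplified illustration under stated assumptions and is not Kwon's actual method: the function names, the fixed count threshold, and the placeholder redirection target `redirect.invalid` are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def flag_excessive(queries: Iterable[str], threshold: int) -> Set[str]:
    """Return the domain names queried more than `threshold` times."""
    counts = Counter(name.lower() for name in queries)
    return {name for name, n in counts.items() if n > threshold}


def route(name: str, abnormal: Set[str],
          redirect_target: str = "redirect.invalid") -> str:
    """Send queries for abnormal names to a redirection target instead."""
    return redirect_target if name.lower() in abnormal else name


# Hypothetical query log: one name is queried far more often than the other.
log = ["a.example"] * 5 + ["b.example"] * 2
abnormal = flag_excessive(log, threshold=3)
print(abnormal, route("a.example", abnormal), route("b.example", abnormal))
```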
Notwithstanding the benefits of DNS, the system is commonly utilized for other purposes, such as by Internet bots. These Internet bots consist of software applications that perform repetitive and automated tasks on the Internet that would otherwise be infeasible for a human to perform. Internet bots that utilize DNS inherently pollute legitimate human-generated DNS traffic with machine-generated requests. These requests, if unfiltered, will affect and skew systems designed to search NXDomains. By identifying and removing machine-generated textual identifiers, systems utilizing the domain search mechanism can perform more effectively.
Additionally, the detection of machine-generated textual identifiers can aid in the discovery and mitigation of malicious programs such as computer viruses. Viruses such as Conficker, also known as Downup, Downadup and Kido, have exploited DNS as a mechanism to control infected computers. The virus would generate a list of random domain names and then attempt to connect to each of the domain names and deliver its payload message. The identification of these randomly generated domain names will help identify sources that are infected with some form of requesting agent that generates machine-generated textual identifiers. See, e.g., U.S. Patent Application Publication No. US 2009/0083411 of Takano et al.
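One crude signal sometimes used to distinguish randomly generated labels from human-chosen ones is the Shannon entropy of the characters in the label. The sketch below is offered only as an example of such a signal and is not the detection technique of any cited reference; the function names, the minimum length, and the threshold value are assumptions chosen for illustration. Entropy alone will misclassify some long human-chosen names, so a practical system would combine it with other evidence.

```python
from collections import Counter
from math import log2


def label_entropy(label: str) -> float:
    """Shannon entropy, in bits per character, of a single label."""
    if not label:
        return 0.0
    total = len(label)
    counts = Counter(label)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def is_suspicious(domain: str, min_length: int = 8,
                  threshold: float = 3.0) -> bool:
    """Flag a domain whose leftmost label is long and high in entropy.

    The length and threshold defaults are illustrative assumptions.
    """
    label = domain.lower().split(".")[0]
    return len(label) >= min_length and label_entropy(label) > threshold


print(label_entropy("abcd"), is_suspicious("xq7zk2vw9pjd.com"))
```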