The growth of computer networking has brought with it domains used for unscrupulous activities. Such domains may be used for scams, phishing, spamming, botnet command-and-control activities, etc. The ability to perpetrate such malicious activity depends on coordinating a large collection of hosts to perform a particular activity. Operators of these large-scale operations typically use the Domain Name System (DNS) to direct hosts to the appropriate location on the network, because the primary task of the DNS is to maintain the mapping between host names and the Internet Protocol (IP) addresses of the hosts behind them, such as Web servers. In spam and phishing attacks, these domains may be used to direct victims to a Web site (or through a proxy) that hosts malicious content. In botnet command-and-control, bots may locate the “controller” machine by its domain name.
For these reasons and others, the ability to identify domain names that correspond to unscrupulous activity or otherwise unwanted traffic may be extremely valuable. Characterizing the behavior of these domains may not only help identify domains used for malicious behavior, but may also help identify individual attacking hosts and victim hosts.
Top-level domain (TLD) servers are responsible for maintaining zone information (usually about second-level domains) and for answering queries directed to registered domains. For example, VeriSign, Inc. operates the generic top-level domains (gTLDs) .com and .net. TLD servers generally maintain two kinds of information about the dynamics of second-level domains (2LDs). The first is the Domain Name Zone Alert (DNZA) data: DNZA files track changes in the zone, for example, whether a domain name was newly registered or a name server's IP address was modified.
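To illustrate how DNZA entries might be consumed, the sketch below parses entries laid out like the Table 1 example (command, domain, record type, record value) and collects the domains created by “add-new” commands. It is a minimal sketch assuming exactly four whitespace-separated fields per entry; the names `DnzaEntry`, `parse_dnza_line`, and `newly_registered` are hypothetical, and real DNZA files may use other commands and layouts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnzaEntry:
    """One zone-change entry, e.g. 'add-new example.com NS ns1.example.com'."""
    command: str   # change type, e.g. "add-new"
    domain: str    # second-level domain the change applies to
    rr_type: str   # resource record type, e.g. "NS"
    value: str     # record data, e.g. the name server host name


def parse_dnza_line(line: str) -> DnzaEntry:
    # Assumption: four whitespace-separated fields, as in the Table 1 example.
    parts = line.split()
    if len(parts) != 4:
        raise ValueError(f"unexpected DNZA entry: {line!r}")
    command, domain, rr_type, value = parts
    return DnzaEntry(command, domain.lower(), rr_type.upper(), value.lower())


def newly_registered(lines):
    """Return the set of domains created by 'add-new' entries, skipping blank lines."""
    entries = (parse_dnza_line(l) for l in lines if l.strip())
    return {e.domain for e in entries if e.command == "add-new"}
```

For example, `newly_registered(["add-new example.com NS ns1.example.com"])` yields `{"example.com"}`.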
The second type of information concerns the DNS queries submitted by recursive servers. A recursive server receives a DNS request from a user's computer and, in turn, sends a request to a TLD server to resolve it. Each recursive server may represent a different geographical region, and recursive servers may be grouped into sub-network blocks by region using the prefix of their IP addresses. The prefix may be the /24 prefix (the first 24 bits, i.e., the first three octets, of an IPv4 address), and recursive servers sharing the same prefix may be referred to as a sub-network, or a /24 sub-network.
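The /24 grouping can be computed directly with Python's standard `ipaddress` module. The sketch below is illustrative only; `slash24` and `group_by_slash24` are hypothetical helper names, and it assumes IPv4 resolver addresses (an IPv6 address raises an error, since a /24 prefix has a different meaning there).

```python
import ipaddress
from collections import defaultdict


def slash24(ip: str) -> str:
    """Return the /24 network prefix of an IPv4 address, e.g. '111.111.111.0'."""
    # strict=False lets host bits be set; the network address zeroes them out.
    net = ipaddress.IPv4Network(f"{ip}/24", strict=False)
    return str(net.network_address)


def group_by_slash24(resolver_ips):
    """Group recursive-server IPv4 addresses by their /24 sub-network."""
    groups = defaultdict(list)
    for ip in resolver_ips:
        groups[slash24(ip)].append(ip)
    return dict(groups)
```

For instance, `group_by_slash24(["111.111.111.1", "111.111.111.200", "22.22.22.7"])` places the first two resolvers in the `111.111.111.0` sub-network and the third in `22.22.22.0`.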
When recursive servers send queries to a TLD server to resolve 2LD domain names, the TLD server may store records of the queries. The TLD server, or a central repository, may aggregate the source IP addresses of the queries into /24 sub-networks, which allows it to monitor the number of queries for each domain. A query record may show the relationship between a domain name and the /24 sub-networks submitting queries for it. Examples of a DNZA entry and a DNS query record are shown in Table 1.
TABLE 1
Data format examples
Type            Example
DNZA entry      add-new example.com NS ns1.example.com
Query record    example.com 111.111.111.0 22.22.22.0 3
The DNZA entry shows that an “add-new” command created the new domain example.com with the name server (NS) record ns1.example.com. The query record shows that there were 3 queries for the domain “example.com” from the /24 subnets “111.111.111.0” and “22.22.22.0.” On an average day, a TLD server may handle queries for 80 million different domains.
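The aggregation that produces such query records can be sketched as follows: raw (domain, source IP) query pairs are collapsed into, for each domain, the set of querying /24 sub-networks and the total query count. This is a minimal sketch, not the TLD operator's actual pipeline; `aggregate_queries` and `format_record` are hypothetical names, and the output line layout (domain, then each /24 prefix separated by spaces, then the count) is an assumption modeled on the Table 1 example.

```python
import ipaddress
from collections import Counter, defaultdict


def aggregate_queries(queries):
    """Aggregate raw (domain, source_ip) pairs into per-domain query records.

    Returns {domain: (list of /24 prefixes in numeric order, total query count)}.
    """
    subnets = defaultdict(set)
    counts = Counter()
    for domain, src_ip in queries:
        domain = domain.lower()
        # Collapse the querying recursive server's address to its /24 prefix.
        net = ipaddress.IPv4Network(f"{src_ip}/24", strict=False)
        subnets[domain].add(net.network_address)
        counts[domain] += 1
    return {d: ([str(a) for a in sorted(subnets[d])], counts[d]) for d in subnets}


def format_record(domain, prefixes, count):
    # Assumed layout: domain, each /24 prefix, then the query count.
    return " ".join([domain, *prefixes, str(count)])
```

Three queries from two /24 sub-networks, e.g. `[("example.com", "111.111.111.5"), ("example.com", "22.22.22.9"), ("example.com", "111.111.111.77")]`, aggregate to one record with both prefixes and a count of 3. Prefixes are sorted numerically here, so `22.22.22.0` precedes `111.111.111.0`; the order within a record carries no meaning.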
Once a domain is registered, several basic entries, called Resource Records, are created to refer to the services for the domain. The major Resource Records may include NS-type records, which point to the authoritative name servers for the zone; MX-type records, which point to the domain's mail servers; and A-type records, which map the domain to the IPv4 address of the host representing it. The host names in NS- and MX-type records can in turn be resolved to IP addresses. The number of IP addresses associated with NS- and MX-type records is typically much smaller than the number of registered domains, because the same server may be reused by many different domains to host DNS and mail infrastructure.
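The many-to-one relationship between domains and server addresses becomes visible when the domain-to-server mapping is inverted. The sketch below does this with hypothetical data: the domains are reserved example names, the addresses come from documentation ranges, and `index_by_server_ip` is an illustrative helper, not part of any described system.

```python
from collections import defaultdict


def index_by_server_ip(domain_servers):
    """Invert {domain: set of NS/MX server IPs} into {server IP: set of domains}."""
    by_ip = defaultdict(set)
    for domain, ips in domain_servers.items():
        for ip in ips:
            by_ip[ip].add(domain)
    return dict(by_ip)


# Hypothetical data: three domains whose NS/MX hosts resolve to two shared IPs.
domain_servers = {
    "example.com": {"192.0.2.53", "198.51.100.53"},
    "example.net": {"192.0.2.53", "198.51.100.53"},
    "example.org": {"192.0.2.53"},
}
by_ip = index_by_server_ip(domain_servers)
print(len(domain_servers), "domains share", len(by_ip), "server IPs")
```

Running it prints `3 domains share 2 server IPs`; at TLD scale the same inversion groups millions of domains under a comparatively small set of shared server addresses.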
Certain existing methods characterize DNS lookup behavior from other vantage points, such as below the local recursive resolver within an organization. These methods recognize that hosts within a single enterprise may exhibit coordinated lookup behavior toward malicious domains, so clustering their activity patterns may yield information about the reputation of individual domains. However, because malicious activity often relies on coordinated activity across multiple servers, the view of DNS lookup behavior below a single recursive resolver misses, among other things, behavior that is characteristic of malicious domains but visible only from a vantage point that observes lookups across many servers.
For example, the Proactive Domain Blacklisting work by M. Felegyhazi, C. Kreibich, and V. Paxson (Third USENIX Workshop on Large-Scale Exploits and Emergent Threats (LEET '10), 2010) focuses on domain registration and DNS zone information to predict the malicious use of domains. This study, however, does not take advantage of the lookup patterns to the domains, such as the addresses of the recursive DNS servers querying them and the number of queries from each server.