It is often desirable to analyze time-series data for anomalies. For example, time-series data may be analyzed to monitor stock exchange data or data recorded in logs reflecting traffic through firewalls or telephone systems. Such analysis may also be used in the detection of “malware.” Malware, short for “malicious software,” is software that is designed for hostile or intrusive purposes. For example, malware may be designed to gather confidential information, deny or disrupt operations, access resources without authorization, or serve other abusive purposes. Types of malware include, for example, computer viruses, worms, Trojan horses, spyware, adware, and botnets. Malware developers typically distribute their software via the Internet, often clandestinely. As Internet use continues to grow around the world, malware developers have more incentives for releasing such software.
Botnets are one example of malware that has become a major security threat in recent years. A botnet is a network of “innocent” host computers that have been infected with malicious software in such a way that a remote attacker is able to control the host computers. The malicious software used to infect the host computers is referred to as a “bot,” which is short for “robot.” Botnets operate under a command and control (C&C) architecture, where a remote attacker is able to control the infected computers, often referred to as “zombie” computers. An attacker may control the infected computers to carry out online anti-social or criminal activities, such as e-mail spam, click fraud, distributed denial-of-service (DDoS) attacks, or identity theft.
FIG. 1 illustrates an exemplary C&C architecture of a botnet 100. The botnet master 101, often referred to as a “botmaster” or “bot herder,” distributes malicious bot software, typically over the Internet 102. This bot software stores an indication of a future time and of domain names to contact at the indicated future time. The bot software infects a number of host computers 103, causing them to become compromised. Users of host computers 103 typically do not know that the bot software is running on their computers. Botnet master 101 also registers temporary domain names to be used as C&C servers 104. Then, at the indicated future time, the bot software instructs host computers 103 to contact C&C servers 104 to get instructions. The instructions are sent over a C&C channel via the Internet 102. The ability to send instructions to host computers 103 provides botnet master 101 with control over a large number of host computers. This enables botnet master 101 to generate huge volumes of network traffic, which can be used for e-mailing spam messages, shutting down or slowing web sites through DDoS attacks, or other purposes.
Botnets exploit the domain name system (DNS) to rally infected host computers. The DNS allows people using the Internet to refer to domain names, rather than Internet Protocol (IP) addresses, when accessing websites and other online services. Domain names, which employ text characters, such as letters, numbers, and hyphens (e.g., “www.example.com”), will often be easier to remember than IP addresses, which are numerical and do not contain letters or hyphens (e.g., “128.1.0.0”). In addition, a domain name may be registered before an IP address has been acquired. The DNS is the Internet's hierarchical lookup service for mapping character-based domain names meaningful to humans into numerical IP addresses.
Botnets exploit the DNS by registering domain names to be temporarily used as C&C servers 104. However, a botnet master will often distribute bot software before registering the domains indicated in the bot software. By the time the bot software instructs host computers 103 to contact C&C servers 104, the botnet master 101 will often have registered only a subset of the domains indicated in the bot software. Thus, when the bot software instructs host computers 103 to contact C&C servers 104, host computers 103 will often attempt to contact a number of unregistered domains.
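The effect described above can be illustrated with a minimal Python sketch. It is not part of the described system: the function name `count_lookup_outcomes`, the domain names, and the registered subset are all hypothetical, and the "DNS" is simulated with an in-memory set rather than real lookups.

```python
from typing import Dict, Iterable, Set


def count_lookup_outcomes(queried: Iterable[str], registered: Set[str]) -> Dict[str, int]:
    """Count lookups that would resolve (YXD) versus fail (NXD).

    `registered` stands in for the set of domains that exist in the DNS;
    a real resolver is deliberately not used here.
    """
    counts = {"YXD": 0, "NXD": 0}
    for name in queried:
        key = "YXD" if name.lower() in registered else "NXD"
        counts[key] += 1
    return counts


# Hypothetical list of ten domains stored in the bot software.
bot_domain_list = [f"c2-{i}.example" for i in range(10)]
# Hypothetical subset the botnet master actually registered.
registered_subset = {"c2-3.example", "c2-7.example"}

outcome = count_lookup_outcomes(bot_domain_list, registered_subset)
print(outcome)
```

Under these assumptions, most of the lookups issued by a single infected host fall into the NXD bucket, which is the behavior the paragraph above describes.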
Legitimate Internet user activity will include a mixture of requests for existent domains (YXDs) and non-existent domains (NXDs). In addition, legitimate Internet user activity will have a periodic nature such that activity is, on average, higher at some predictable times and lower at other predictable times (e.g., an Internet user may be more active during the day than during the night, and may be more active during weekdays than during weekends). Because of the periodic nature of a typical Internet user's activity, an examination of NXD data will often reveal a predictable pattern over one or more periods of time.
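One simple way to make such a periodic pattern concrete is to average historical NXD counts by hour of day. The Python sketch below assumes a daily period and a hypothetical input format of (hour of day, NXD count) pairs; the function name `hourly_profile` and the sample values are illustrative only and are not drawn from measured data.

```python
from collections import defaultdict
from statistics import mean


def hourly_profile(samples):
    """Return a dict mapping hour of day (0-23) to the mean NXD count.

    `samples` is an iterable of (hour_of_day, nxd_count) pairs collected
    over several days; the daily period is an assumption of this sketch.
    """
    buckets = defaultdict(list)
    for hour, count in samples:
        buckets[hour % 24].append(count)
    return {hour: mean(counts) for hour, counts in sorted(buckets.items())}


# Made-up history: low NXD counts at 03:00, higher counts at 14:00.
history = [(3, 4), (3, 6), (14, 40), (14, 44), (3, 5), (14, 42)]
profile = hourly_profile(history)
print(profile)
```

A weekly period could be handled the same way by keying on (day of week, hour) instead of hour alone.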
Illegitimate Internet use, such as by host computers 103 in botnet 100, will also include a mixture of requests for YXDs and NXDs. However, because a botnet master 101 will typically register only a small subset of the domain names that it provides in the bot software, the overall quantity of NXDs will spike after host computers 103 attempt to access the C&C servers 104, deviating from the predictable periodic pattern of legitimate Internet user activity.
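As a rough illustration of how such a deviation might be flagged, the following Python sketch compares an observed NXD count against the history for the same time slot using a standard-score test. This is a generic technique chosen for illustration, not a method stated in this description; the function name `is_nxd_spike`, the threshold of 3.0, and the sample counts are all assumptions.

```python
from statistics import mean, pstdev


def is_nxd_spike(observed, history_for_slot, threshold=3.0):
    """Return True if `observed` exceeds the slot's historical mean by more
    than `threshold` population standard deviations.

    `history_for_slot` holds past NXD counts for the same periodic slot
    (for example, the same hour of day on earlier days).
    """
    baseline = mean(history_for_slot)
    spread = pstdev(history_for_slot)
    if spread == 0:
        # No historical variation: any increase over the baseline is flagged.
        return observed > baseline
    return (observed - baseline) / spread > threshold


# Made-up night-time history for one slot.
night_history = [4, 6, 5, 5, 4, 6]

print(is_nxd_spike(6, night_history))   # within normal variation
print(is_nxd_spike(60, night_history))  # far above the periodic baseline
```

Comparing against the same slot's history, rather than a single global average, keeps ordinary daytime peaks from being mistaken for anomalies.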