Domain generation algorithms (DGAs) are frequently used for malicious purposes. Malware such as Zeus or Conficker uses DGA-generated host names for command and control callbacks in its robot network (botnet). For example, a variant of the latter is able to generate up to 50,000 random domains every day, making it difficult for a traditional blacklist approach to thwart this malicious software.
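As a purely illustrative sketch of why a static blacklist struggles against such malware, the following toy generator derives a fresh list of host names from a shared seed and the current date. It is not the algorithm used by Zeus, Conficker, or any other real malware; the function name `toy_dga`, the seed string, the use of SHA-256, and the reserved `.example` top-level domain are all assumptions made for this example.

```python
import hashlib
from datetime import date


def toy_dga(seed: str, day: date, count: int, tld: str = ".example") -> list[str]:
    """Return `count` pseudo-random host names for the given day.

    Hypothetical illustration only: any party that knows `seed` can
    recompute the same list for a given date, while the list changes
    completely from one day to the next.
    """
    names = []
    for i in range(count):
        # Mix the shared seed, the date, and a counter into one digest.
        material = f"{seed}|{day.isoformat()}|{i}".encode("ascii")
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex digits onto the letters 'a' through 'p'.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
        names.append(label + tld)
    return names


if __name__ == "__main__":
    today = toy_dga("demo-seed", date(2024, 1, 1), 5)
    tomorrow = toy_dga("demo-seed", date(2024, 1, 2), 5)
    print(today)
    print(tomorrow)
```

Because each day's list shares no entries with the previous day's, a blacklist built from names observed on one day offers little protection on the next, which is the difficulty described above.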
In the prior art, several approaches have been developed for the detection of DGA-based robot networks. Although these approaches may differ in the features selected (or derived) to discriminate host names generated by a domain generation algorithm (“DGA-based host names”) from ordinary benign host names, they have a common focus. That focus is to use a large quantity of DNS query data collected from an Internet service provider as the input source, with the main objective of finding the entity behind the robot network, whether previously known or unknown. Such an approach usually involves using a classification scheme to differentiate normal host names from DGA-based host names, and in most cases the differentiation is performed on groups of host names rather than on individual host names. The disadvantage of this approach is that it requires a large amount of DNS query data from a variety of different sources, which may not be available in practice.
Accordingly, new techniques are desired to assist in the identification of host names that may have been generated by a domain generation algorithm.