Malicious botnets are one of the most potent threats to networking systems. To create a botnet, malware often uses a Domain Generation Algorithm (DGA) to generate domain names. While the botnet is being established, the malware uses a DGA-generated domain name to communicate with a Command & Control (C&C) server, which the botnet's originator (or “bot master”) uses to control the botnet entities (bots) remotely. The DGA makes the C&C server difficult to uncover because it can generate many domains, of which only a (frequently changing) subset is registered and used. Once established, a malicious botnet may serve as a platform for malicious activities such as denial-of-service (DoS) attacks, information gathering, distributed computing, cyber fraud, malware distribution, unsolicited marketing, etc.
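To make the rendezvous mechanism concrete, the following is a minimal, toy sketch of a DGA. It assumes a hard-coded seed string and the calendar date as the only inputs, which is one common pattern but not the only one; the function name `toy_dga` and its parameters are hypothetical and do not reproduce any specific malware family.

```python
import hashlib
from datetime import date


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12,
            tld: str = ".com") -> list[str]:
    """Derive `count` pseudo-random domain names from a shared seed and a date.

    Hypothetical illustration: the bot and the bot master both run this function,
    so the bot master only needs to register one of the outputs for the bots to
    reach the C&C server.
    """
    domains = []
    for i in range(count):
        # Hash seed, date and a counter: the result is reproducible on both sides
        # but changes every day and for every index.
        material = f"{seed}|{day.isoformat()}|{i}".encode()
        digest = hashlib.sha256(material).digest()
        # Map each of the first `length` digest bytes to a lowercase letter.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for domain in toy_dga("example-seed", date(2024, 1, 1)):
        print(domain)
```

Because the output is fully determined by the seed and the date, a defender who recovers both can precompute the same candidate list; the random-looking labels it produces are also what makes such domains stand out.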
Because DGA-generated domain names are typically unusual (for example, random-looking strings of characters), they are relatively easy for a network administrator or a sophisticated automated detection system (e.g., one using statistical features) to identify. In response to advances in network intrusion detection systems, malicious actors have started using domain names that appear legitimate. In particular, they still rely on DGAs but maintain a dictionary of words or other linguistic units from natural language (such as syllables and meaningful units including roots, stems, prefixes, and suffixes), generating domains by concatenating such units, sometimes together with acronyms, abbreviations, neologisms, numbers, and other characters.
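The sketch below illustrates the dictionary-based variant alongside one statistical feature of the kind a detector might compute: the Shannon entropy of the label's characters, in bits per character. The word list, the seeding scheme, and the names `wordlist_dga` and `char_entropy` are illustrative assumptions, not taken from any particular DGA or detection system.

```python
import hashlib
import math
import random
from collections import Counter
from datetime import date

# Hypothetical word list; real dictionary-based DGAs ship their own lists.
WORDS = ["blue", "river", "cloud", "stone", "light", "market", "garden", "north"]


def wordlist_dga(seed: str, day: date, words: list[str], count: int = 5,
                 words_per_domain: int = 2, tld: str = ".net") -> list[str]:
    """Generate domains by concatenating dictionary words chosen by a seeded PRNG."""
    # Derive an integer PRNG seed from the shared secret and the date.
    digest = hashlib.sha256(f"{seed}|{day.isoformat()}".encode()).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return ["".join(rng.choice(words) for _ in range(words_per_domain)) + tld
            for _ in range(count)]


def char_entropy(label: str) -> float:
    """Shannon entropy of the label's character distribution, in bits per character."""
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    for d in wordlist_dga("example-seed", date(2024, 1, 1), WORDS):
        label = d.rsplit(".", 1)[0]
        print(d, round(char_entropy(label), 2))
```

Since each label is built from ordinary words, character-level features such as this entropy may not separate these domains from legitimate ones as cleanly as they separate random-looking strings, which is the evasion the paragraph above describes.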