This specification relates to computer system security.
Many kinds of malicious computer software (“malware”) are used to compromise personal and business computers, computer networks, and computer-based devices, e.g., smartphones.
Advanced forms of malware infection are hard to identify with existing state-of-the-art signature-based solutions. Supervised-learning-based techniques face challenges in obtaining comprehensive sets of ground-truth infected samples, e.g., website reputation listings or detected files containing malware, for training predictive models. It is becoming increasingly important to be able to automatically identify zero-day malware attacks, including attacks based on IP fast flux, algorithmically generated domain names, and polymorphic malware.