With the abundance of new or modified malware being introduced daily on the Internet, network defense personnel are faced with the increasingly difficult challenge of identifying and analyzing a continuous stream of collected software samples in order to produce accurate and reliable signatures with which to defend against future attacks.
The following references discuss malware detection services and related background: Bayesian Networks: D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques, Cambridge, Mass.: MIT Press, 2009, p. 45; Probabilistic Graphical Models (PGM): D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques, Cambridge, Mass.: MIT Press, 2009, p. 3; Support Vector Machines (SVM) and Gaussian Mixture Models (GMM): Machine Learning: An Algorithmic Perspective (Chapman & Hall/CRC Machine Learning & Pattern Recognition), 1st ed., Chapman and Hall/CRC, Apr. 1, 2009; TCP/IP Protocol: K. R. Fall and W. R. Stevens, TCP/IP Illustrated, 2nd ed., Pearson Education, Ann Arbor, Mich., 2012; Malware Network Behavior: P. Mateti, “Port Scanning”, http://www.cs.wrightedu/pmateti/InternetSecurity/Lectures/Probing/index.html; Flow Based Malware Detection: M. Skrzewski, “Flow Based Algorithm for Malware Detection”, in Computer Networks, 18th Conference, CN 2011 Proceedings, Springer, Berlin, 2011; General Malware Detection: E. Raftopoulos and X. Dimitropoulos, “Detecting, Validating, and Characterizing Computer Infections in the Wild”, IMC '11, ACM, Berlin, 2011; and Malware Network Behavior: M. Krzywinski, “Port Knocking: Network Authentication Across Closed Ports”, Sys Admin Magazine, 12:12-17 (2003). Each of these references is hereby incorporated herein by reference in its entirety.
However, existing malware detection services suffer from several deficiencies. Existing approaches either rely on signatures or require a priori knowledge of specific malware characteristics or behaviors gleaned from manual identification. Signatures, however, are usually not available when new or modified malware is first introduced, and without them, signature-based identification cannot detect such instances. Likewise, approaches based on a priori knowledge require advance understanding of the malware and/or manual classification, and this knowledge is often unavailable for new or modified malware.
What is needed is a design that uses machine learning techniques to accurately identify never-before-seen malware as malware threats change and evolve.