In recent years, the number of malware attacks on computer networks has increased dramatically, as has the dependence of companies and government agencies on their computer networks. Consequently, considerable effort is devoted to providing computer and data protection in multi-connectivity data networks.
A single piece of malware (malicious software) that intrudes into a computerized system from the computer network can result in the loss, unauthorized utilization, and/or modification of large amounts of data. As a case in point, the total cost of the Code Red worm epidemic, as measured in lost productivity owing to interruptions in computer and network services, is estimated at $2.6 billion (Lemos, R, Counting the Cost of Slammer, CNET news, January 2003).
Many malware protection methods rely on malware signatures generated from malware-containing files. Generally, such signatures are compared to the code of any file traveling on the internet, and if a file is found to contain segments of code similar to a signature, that file is blocked from traveling any further across the internet.
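The comparison described above can be sketched as a simple byte-pattern search. This is a minimal illustration only: the signature name, the byte pattern, and the functions `matching_signatures` and `may_continue` are hypothetical, and exact substring matching is assumed here in place of whatever similarity test a real product applies.

```python
from typing import Dict, List

# Hypothetical signature database: name -> byte pattern (made-up value).
SIGNATURES: Dict[str, bytes] = {"demo-worm": bytes.fromhex("deadbeef")}


def matching_signatures(payload: bytes, signatures: Dict[str, bytes]) -> List[str]:
    """Return the names of all signatures whose pattern occurs in the payload."""
    return [name for name, pattern in signatures.items() if pattern in payload]


def may_continue(payload: bytes) -> bool:
    """Return True if the file may travel on, False if it must be blocked."""
    return not matching_signatures(payload, SIGNATURES)
```

A file containing the bytes `de ad be ef` anywhere in its content would be blocked under this sketch, while a file without them would pass.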
Automated signature generation for new attacks of this type is extremely difficult for several reasons (Yong Tang, Shigang Chen, Defending Against Internet Worms: A Signature-Based Approach, in Proc. of IEEE INFOCOM'05, Miami, Fla., USA, May 2005). First, in order to create an attack signature, the attack traffic must be isolated from the legitimate traffic, which is not always an easy task. Second, the signature must be general enough to capture all attack traffic of a certain type, yet specific enough to avoid overlapping with the contents of normal traffic. At present, these problems are handled in an ad-hoc manner, based on human judgment. As a case in point, current rule-based network Intrusion Detection Systems (IDS) can do little to stop zero-day worms (Ke Wang and Salvatore J. Stolfo, “Anomalous Payload-based Network Intrusion Detection”, in Recent Advances in Intrusion Detection (RAID), September 2004). Such systems depend upon signatures that become known only after the worm has been launched successfully, essentially disclosing its contents and method of infection for later deployment.
Various techniques have been proposed to derive malware signatures automatically, including among others: Vulnerability-based Signatures (David Brumley, et al., “Towards Automatic Generation of Vulnerability-Based Signatures”, in the Proceedings of the 2006 IEEE Symposium on Security and Privacy); Payload-based Signatures (Ke Wang and Salvatore J. Stolfo, “Anomalous Payload-based Network Intrusion Detection”, in Recent Advances in Intrusion Detection (RAID), September 2004; Kim, H.-A. and Karp, B., “Autograph: Toward Automated, Distributed Worm Signature Detection”, in Proceedings of the 13th Usenix Security Symposium (Security 2004), San Diego, Calif., August, 2004); Semantic-Aware Signatures (Vinod Yegneswaran et al., “An architecture for generating semantics-aware signatures”, in 14th USENIX Security Symposium, Baltimore, Md., August 2005); the Amd Algorithm (Mihai Christodorescu, et al., “Semantics-aware malware detection”, in IEEE Symposium on Security and Privacy, Oakland, Calif., May 2005); and Honeypot-based Signatures (Yong Tang, Shigang Chen, Defending Against Internet Worms: A Signature-Based Approach, in Proc. of IEEE INFOCOM'05, Miami, Fla., USA, May 2005).
One of the main drawbacks of the prior art techniques, including those listed above, is that they tend to overemphasize protection against recent worm epidemics and do not provide a general-purpose technique that can handle other types of malware as well. Moreover, signatures generated by the prior art techniques examine suspicious code by modeling malicious patterns in communication packets, payloads, unsuccessful connections, sessions and code structure (templates). Such signatures were tested and reported to be effective for small-sized malware. Nevertheless, they ignore the fact that many types of malware appear as full-fledged executables and therefore contain a significant portion of common code emanating from the development tools and software packages used by the malware authors. Such common code is fixed across malware instances because hackers usually reuse a “toolbox” comprising a set of functions that enables them to generate various malware instances. Since signature-based systems do not examine malware at the executable level and do not account for these common code segments, which can be quite large, the quality of the signature generation mechanisms is dubious. Such quality is measured using the following criteria: sensitivity (a low false-negative rate for malware) and specificity (a low false-positive rate for innocuous traffic).
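The two quality criteria named above follow the standard definitions, which can be written out as a short sketch. The function names and the counts in the usage note are illustrative assumptions, not figures from any of the cited studies.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of malware instances that the signature detects."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of innocuous items that the signature correctly lets through."""
    return true_negatives / (true_negatives + false_positives)
```

For example, a signature that detects 95 of 100 malware samples has a sensitivity of 0.95, and one that wrongly flags 10 of 1000 innocuous files has a specificity of 0.99.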
Several methods have been developed in the art to expedite signature generation for the effective containment of worms. “Autograph” (Kim, H.-A. and Karp, B., “Autograph: Toward Automated, Distributed Worm Signature Detection”, in Proceedings of the 13th Usenix Security Symposium (Security 2004), San Diego, Calif., August, 2004), for example, stores the source and destination addresses of each unsuccessful inbound TCP connection it observes. Once an external host has made unsuccessful connection attempts to more than a predetermined number of internal IP addresses, the flow classifier considers it to be a scanner. All successful connections from an IP address flagged as a scanner are classified as suspicious, and their inbound packets are written to the suspicious flow pool. In the next stage, “Autograph” selects the most frequently occurring byte sequences across the flows in the suspicious flow pool and stores them as signatures. At the beginning of a worm's propagation, the aggregate rate at which all infected hosts scan the IP address space is relatively low. Since “Autograph” relies on “overhearing” unsuccessful scans to identify suspicious source IP addresses, an “Autograph” monitor is slow to accumulate suspicious addresses early in an epidemic, and in turn slow to accumulate worm payloads. To address this problem, “Autograph” uses a tattler that, as its name suggests, shares suspicious source addresses among all monitors, with the goal of accelerating the accumulation of worm payloads.
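The flow classification and signature selection steps described above can be sketched roughly as follows. This is a simplified illustration and not the published “Autograph” implementation: the class and function names, the threshold, and the use of fixed-length byte windows are all assumptions made here for brevity.

```python
from collections import Counter, defaultdict


class FlowClassifier:
    """Flags scanners from failed connections and pools their later payloads."""

    def __init__(self, threshold: int = 2):
        self.threshold = threshold          # assumed "predetermined number"
        self.failed = defaultdict(set)      # source IP -> internal IPs it failed to reach
        self.shared_scanners = set()        # addresses reported by other monitors (tattler)
        self.suspicious_pool = []           # payloads of suspicious flows

    def on_failed_connection(self, src: str, dst: str) -> None:
        self.failed[src].add(dst)

    def learn_from_tattler(self, addresses) -> None:
        """Accept suspicious source addresses shared by other monitors."""
        self.shared_scanners.update(addresses)

    def is_scanner(self, src: str) -> bool:
        local_failures = len(self.failed.get(src, ()))
        return src in self.shared_scanners or local_failures > self.threshold

    def on_successful_connection(self, src: str, payload: bytes) -> None:
        if self.is_scanner(src):
            self.suspicious_pool.append(payload)


def most_frequent_sequences(pool, window: int = 4, top: int = 1):
    """Return the byte windows that occur in the largest number of pooled flows."""
    counts = Counter()
    for payload in pool:
        # Count each distinct window once per flow (fixed-length windows assumed).
        counts.update({payload[i:i + window] for i in range(len(payload) - window + 1)})
    return [seq for seq, _ in counts.most_common(top)]
```

In this sketch a host becomes a scanner after failing to reach more than `threshold` distinct internal addresses, or as soon as another monitor reports it through `learn_from_tattler`, which mirrors how sharing addresses shortens the time needed to start collecting payloads.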
Tang and Chen (Yong Tang, Shigang Chen, Defending Against Internet Worms: A Signature-Based Approach, in Proc. of IEEE INFOCOM'05, Miami, Fla., USA, May 2005) addressed the above problems by designing a double-honeypot system (N. Provos, “A virtual Honeypot Framework,” Center for Information Technology Integration, University of Michigan, Ann Arbor, Mich., USA, Tech. Rep. CITI Technical Report 03-1, October 2003). Their system is deployed in a local network for automatic detection of worm attacks from the Internet. It should be noted that this system is not suitable for a large ISP environment.
Two further algorithms, based on Expectation-Maximization (C. E. Lawrence and A. A. Reilly, “An Expectation Maximization (EM) Algorithm for the Identification and Characterization of Common Sites in Unaligned Biopolymer Sequences,” PROTEINS: Structure, Function and Genetics, vol. 7, pp. 41-51, 1990) and Gibbs sampling (C. E. Lawrence, et al., “Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment,” Science, vol. 262, pp. 208-214, October 1993), were proposed for efficient computation of a Position Aware Distribution Signature (PADS). Experiments based on variants of the MS-Blaster worm were performed. The results showed that the signature-based defense system can accurately separate new variants of the worm from normal background traffic by using the PADS signature derived from past samples.
The main drawback common to all of the automatic signature generation techniques mentioned above is that they are appropriate only for instances of short, stream-based worms such as Nimda, Code Red/Code Red II, MS Blaster (1.8 KB long), Sober, Netsky and B[e]agle. Larger malware files carrying full-fledged applications usually contain many code segments originating from the development platform that was used to generate the malware instance. Those segments are not unique to the malware. As a result, when the file comprising the malware is relatively large, selecting a signature that is both sensitive and specific is a very challenging task for the above-mentioned techniques.
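The difficulty with common code can be illustrated with a small sketch. It is not a description of any cited technique or of the method disclosed later; the function names, the fixed segment size, and the sample byte strings are assumptions chosen only to show why a segment that also occurs in innocuous files makes a poor signature.

```python
def segments(data: bytes, size: int) -> set:
    """Return every contiguous byte segment of the given size."""
    return {data[i:i + size] for i in range(len(data) - size + 1)}


def candidate_signatures(malware: bytes, benign_files, size: int = 4) -> set:
    """Keep only malware segments that do not occur in any benign file."""
    common = set()
    for benign in benign_files:
        common |= segments(benign, size)
    return segments(malware, size) - common
```

With a hypothetical malware file `b"LIBCODEEVIL"` and a benign file `b"xxLIBCODEyy"`, the segments `LIBC`, `IBCO`, `BCOD` and `CODE` are shared and therefore discarded, while `EVIL` remains as a candidate. A signature taken from the shared part would flag the benign file as well, which lowers specificity.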
It would therefore be highly desirable to develop an automatic signature extraction method that would account for the common code segments. Such a method would be able to detect malware appearing in small, as well as large, files. The types of malware which could be detected by such a method include Trojan horses, spyware, adware, viruses, and worms operating in a Windows environment.
It is the object of the present invention to provide a method by which malware signatures are essentially automatically extracted.
It is a further object of the present invention to provide a method which is suitable to extract signatures from large files.
It is yet a further object of the present invention to provide a method with a low false-negative rate for malware, as well as a low false-positive rate for innocuous traffic.
Further purposes and advantages of this invention will become apparent as the description proceeds.