Malicious software, also known as computer contaminants or malware, is software that is intended to do direct or indirect harm in relation to one or more computer systems. Such harm can manifest as the disruption or prevention of the operation of all or part of a computer system, accessing private, sensitive, secure and/or secret data, software and/or resources of computing facilities, or the performance of illicit, illegal or fraudulent acts. Malware includes, inter alia, computer viruses, worms, botnets, trojans, spyware, adware, rootkits, keyloggers, dialers, malicious browser extensions or plugins and rogue security software.
Malware proliferation can occur in a number of ways. Malware can be communicated as part of an email, such as an attachment or embedding. Alternatively, malware can be disguised as, or embedded, appended or otherwise communicated with or within, genuine software. Some malware is able to propagate via storage devices such as removable, mobile or portable storage including memory cards, disk drives, memory sticks and the like, or via shared or network attached storage. Malware can also be communicated over computer network connections such as the internet via websites or other network facilities or resources. Malware can propagate by exploiting vulnerabilities in computer systems such as vulnerabilities in software or hardware components including software applications, browsers, operating systems, device drivers or networking, interface or storage hardware.
A vulnerability is a weakness in a computer system, such as a computer, operating system, network of connected computers or one or more software components such as applications. Such weaknesses can manifest as defects, errors or bugs in software code that present an exploitable security weakness. An example of such a weakness is a buffer-overrun vulnerability, in which, in one form, an interface designed to store data in an area of memory allows a caller to supply more data than will fit in the area of memory. The extra data can overwrite executable code stored in the memory and thus such a weakness can permit the storage of malicious executable code within an executable area of memory. An example of such malicious executable code is known as ‘shellcode’ which can be used to exploit a vulnerability by, for example, the execution, installation and/or reconfiguration of resources in a computer system. Such weaknesses, once exploited, can bootstrap a process of greater exploitation of a target system.
The effects of malware on the operation and/or security of a computer system lead to a need to identify malware in a computer system in order to implement protective and/or remedial measures. Malware propagated by, or communicating over, a network connection, such as the internet, by exploitation of a vulnerability in a target system can be particularly challenging to detect. Many systems monitor files stored or received in a file system with reference to a dictionary of malware “signatures”. A signature can be a pattern of data associated with known malware. Such an approach requires the receipt of known malware and is susceptible to subtle changes in malware which may render the malware undetectable in view of the stored signatures. Other systems monitor behavior of software to identify suspicious behavior in order to detect potential malware. Such systems therefore detect malware infection only after the event and are susceptible to changes in malware and to malware devised specifically to minimize suspicious behavior, such as malware designed to behave like genuine software.
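By way of illustration only, the following minimal sketch shows signature matching of the kind described above and how a one-byte change can defeat it. The signature names, byte patterns and the `scan` function are hypothetical and are not taken from any real product.

```python
from typing import List

# Hypothetical signature dictionary: names and byte patterns are invented
# purely for illustration and do not correspond to real malware.
SIGNATURES = {
    "example-dropper": b"\xde\xad\xbe\xef\x13\x37",
    "example-keylogger": b"LOG_KEYS_V1",
}


def scan(data: bytes) -> List[str]:
    """Return the names of all signatures whose pattern occurs in data."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in data)


# A file containing a known pattern is detected.
original = b"header" + b"LOG_KEYS_V1" + b"payload"
# A variant differing by a single byte no longer matches any stored signature.
variant = b"header" + b"LOG_KEYS_V2" + b"payload"

print(scan(original))  # ['example-keylogger']
print(scan(variant))   # []
```

The variant carries the same hypothetical payload yet passes undetected, which is the susceptibility to subtle changes noted above.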
An alternative approach to the detection of malware is to detect network traffic associated with malware propagated by, or communicating over, a network connection. Such network traffic can be considered malicious network traffic occurring as part of network communications received by, or occurring between, computer systems, such as traffic attributable to malware installed, being installed or being communicated for installation on a computer system. Traditional malicious traffic detection mechanisms depend on techniques including network traffic interception and analysis or network connection summarization, which can determine key characteristics of a network connection such as source and destination addresses, source and destination ports and a protocol (known as a traffic-characterizing 5-tuple). Such facilities are provided by technologies such as NetFlow (Cisco) or Yet Another Flowmeter (YAF). With these approaches, detection of malicious communication depends on an analysis of network traffic (or a summarization of traffic) to identify known characteristics of malicious traffic, such as known server addresses, protocols and/or port combinations. Such approaches are of limited effectiveness since it is not always possible to distinguish malicious traffic from non-malicious traffic without also referring to the contents of packets of network traffic by deep packet inspection (DPI) using tools such as BotHunter. BotHunter uses DPI to search for specific patterns in network traffic to detect executable downloads or signature strings associated with known malware.
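Connection summarization by 5-tuple can be sketched as follows. This is an illustrative assumption only: `FlowKey`, `summarize`, `flag_known_bad` and the `KNOWN_BAD` list are hypothetical names, the addresses are drawn from documentation ranges, and nothing here reflects the record formats or interfaces of NetFlow or YAF.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The 5-tuple characterizing a network connection."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: str


def summarize(packets):
    """Aggregate (FlowKey, payload_length) pairs into per-flow counters."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for key, length in packets:
        flows[key]["packets"] += 1
        flows[key]["bytes"] += length
    return dict(flows)


# Hypothetical list of known malicious (server address, port, protocol) combinations.
KNOWN_BAD = {("203.0.113.7", 6667, "tcp")}


def flag_known_bad(flows):
    """Return the flow keys whose destination matches a known malicious combination."""
    return [k for k in flows if (k.dst_addr, k.dst_port, k.protocol) in KNOWN_BAD]


web = FlowKey("192.0.2.10", "198.51.100.5", 50000, 443, "tcp")
bad = FlowKey("192.0.2.10", "203.0.113.7", 50001, 6667, "tcp")
flows = summarize([(web, 120), (web, 1400), (bad, 64)])
print(flows[web])             # {'packets': 2, 'bytes': 1520}
print(flag_known_bad(flows))  # contains only the flow to 203.0.113.7
```

A summary of this kind never examines payload contents, so malicious traffic sent to an unlisted server or over a common port is not flagged.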
However, DPI is ineffective where malicious network traffic is encrypted. The paper “Detecting Encrypted Botnet Traffic” (Zhang et al., Computer Communications Workshops (INFOCOM WKSHPS), 2013) acknowledges how detection systems such as BotHunter suffer significantly in the presence of encrypted traffic, with detection rates reduced by almost 50%. Zhang et al. describe an approach using BotHunter to detect encrypted malicious traffic. The approach of Zhang et al. operates on the premise that the presence of at least one high entropy flow, along with other features that BotHunter detects, is a reliable detector of encrypted malicious traffic. In information theory, entropy is a measure of a degree of indeterminacy of a random variable (“Entropy”, R. L. Dobrushin and V. V. Prelov, Encyclopedia of Mathematics, Springer, 2002, ISBN 1402006098). The theoretical basis for entropy calculation and the entropy of an information source is defined in detail in “A Mathematical Theory of Communication” (C. E. Shannon, The Bell System Technical Journal, Vol. 27, pp. 379-423, 623-656, July, October, 1948) and derives from a measure of entropy as defined in statistical mechanics. Zhang et al. describe estimating a measure of entropy for packets in a network communication. Estimates of entropy for a communication exceeding a threshold are identified as ‘high entropy’ and an identification of a high entropy flow contributes to a detection of encrypted malicious traffic.
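The notion of a high entropy flow can be illustrated with a simple byte-frequency estimate of Shannon entropy compared against a threshold. This is a sketch under stated assumptions: the frequency-based estimator, the threshold value of 7.0 bits per byte and the function names are chosen for illustration and are not asserted to be the estimator or threshold used by Zhang et al.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Estimate entropy in bits per byte from observed byte frequencies."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Assumed threshold, chosen only for illustration.
HIGH_ENTROPY_THRESHOLD = 7.0


def is_high_entropy(payload: bytes, threshold: float = HIGH_ENTROPY_THRESHOLD) -> bool:
    """Return True when the estimated entropy of the payload exceeds the threshold."""
    return shannon_entropy(payload) > threshold


# Plain-text protocol data uses few distinct byte values and has low entropy.
text_like = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 20
# Every byte value equally often: a stand-in for well-encrypted data.
uniform = bytes(range(256)) * 4

print(is_high_entropy(text_like))  # False
print(is_high_entropy(uniform))    # True
```

Because the estimate depends only on byte frequencies, any encrypted or compressed payload scores highly regardless of whether it is malicious.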
The approach of Zhang et al. is premised on the detection of high entropy flows as an indicator of malicious network traffic. However, encrypted network traffic also arises as part of non-malicious applications, such as traffic from genuine and/or authorized software applications being encrypted for security purposes. It is therefore problematic to detect and respond to high entropy flows, since non-malicious traffic may be falsely identified as potentially malicious.
The paper “Detecting Subverted Cryptographic Protocols by Entropy Checking” (J. Olivain and J. Goubault-Larrecq, 2006) describes an approach to detecting attacks based on computing entropy for a flow. The approach of Olivain et al. is directed to the detection of unscrambled traffic over cryptographic protocols as a way of detecting potentially malicious traffic. In particular, Olivain et al. observe how a measure of entropy for encrypted network traffic will tend towards the entropy of a random source such that, where network traffic consists of characters as bytes from an alphabet of 256 bytes, the entropy of encrypted network traffic tends towards 8 bits per byte. On this basis, Olivain et al. propose an approach to malicious traffic detection based on ranges of acceptable measures of entropy tending towards the entropy of a random source, such that traffic that does not tend consistently in this way is identified as being unscrambled and malicious. Olivain et al. acknowledge a considerable drawback of their technique: it can be countered by malicious traffic that is itself encrypted. This is because encrypted malicious traffic will also exhibit entropy tending towards the entropy of a random source and so becomes indistinguishable from non-malicious encrypted traffic.
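The figure of 8 bits per byte follows directly from the Shannon entropy of a source in which each of the 256 byte values is equally likely. The symbols X and p_i below are introduced here for illustration only:

```latex
H(X) = -\sum_{i=1}^{256} p_i \log_2 p_i,
\qquad
p_i = \frac{1}{256}
\;\Longrightarrow\;
H(X) = -256 \cdot \frac{1}{256} \log_2 \frac{1}{256} = \log_2 256 = 8 \text{ bits per byte}.
```

Any non-uniform distribution over the same alphabet yields a strictly lower value, which is why unscrambled traffic is expected to fall below the range observed for encrypted traffic.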
Bestuzhev highlights how malware can be communicated in encrypted form, causing existing automatic malware detection systems to function incorrectly (Bestuzhev, 2010, www.securelist.com/en/blog/208193235/Steganography_or_encryption_in_bankers, retrieved February 2014). Such encrypted malware would also fail to be detected by the approach of Olivain et al., which relies on the communication of unscrambled (e.g. unencrypted) traffic for detection.
Thus there is a need to address the considerable disadvantages of the known techniques to provide for the detection of encrypted malicious traffic.