1. Field of the Invention
This invention relates to methods, systems and computer program products for detecting security threats in a computer network.
2. Background Art
Below is a list of publications related to the present invention and referenced herein:
    [1] Ad-Aware, http://www.lavasoftusa.com/software/adaware/, 2004.
    [2] D. Barbara, R. Goel, and S. Jajodia. Mining Malicious Data Corruption with Hidden Markov Models. 16th Annual IFIP WG 11.3 Working Conference on Data and Application Security, July 2002.
    [3] P. Barford, A. Bestavros, A. Bradley, and M. Crovella, Changes in Web client access patterns: Characteristics and caching implications, BU Computer Science Technical Report, BUCS-TR-1998-023, 1998.
    [4] J. Berman, Prepared Statement of Jerry Berman, President, the Center For Democracy & Technology On the SPY BLOCK Act, Before the Senate Committee On Commerce, Science, And Transportation Subcommittee on Communication, March 2004.
    [5] BlackICE PC Protection, http://blackice.iss.net/, 2004.
    [6] CERT Vulnerability Note VN-98.07, http://www.cert.org/vulnotes/VN-98.07.backorifice.html, October 1998.
    [7] CERT Advisory CA-2003-22 Multiple Vulnerabilities in Microsoft Internet Explorer, http://www.cert.org/advisories/CA-2003-22.html, August 2003.
    [8] B. Cheswick, An Evening with Berferd in which a cracker is Lured, Endured, and Studied, USENIX proceedings, January 1990.
    [9] D. E. Denning, An Intrusion Detection Model. IEEE Transactions on Software Engineering, 13(2):222-232, February 1987.
    [10] B. Duska, D. Marwood, and M. J. Feeley, The measured access characteristics of World Wide Web client proxy caches, Proc. of USENIX Symposium on Internet Technology and Systems, December 1997.
    [11] A. Dyatlov, Firepass, http://www.gray-world.net/pr_firepass.shtml, 2004.
    [12] A. Dyatlov, S. Castro, Wsh 'Web Shell', http://www.grayworld.net/pr_wsh.shtml, 2004.
    [13] EyeOnSecurity, http://eyeonsecurity.org/advisories/Gator/, 2002.
    [14] R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, L. Masinter, P. Leach and T. Berners-Lee. Hypertext Transfer Protocol HTTP/1.1, RFC 2616, June 1999.
    [15] S. Forrest, A. Hofmeyr, A. Somayaji, and T. A. Longstaff, A Sense of Self for Unix Processes, Proc. of the IEEE Symposium on Security and Privacy, pp. 120-128, May 1996.
    [16] A. K. Ghosh, J. Wanken, and F. Charron. Detecting Anomalous and Unknown Intrusions Against Programs. Proc. of the Annual Computer Security Applications Conference (ACSAC '98), pp. 259-267, December 1998.
    [17] S. Hisao, Tiny HTTP Proxy, http://mail.python.org/pipermail/python-list/2003-June/168957.html, June 2003.
    [18] Hopster, http://www.hopster.com/, 2004.
    [19] H. S. Javitz and A. Valdes. The SRI IDES Statistical Anomaly Detector, Proc. of the IEEE Symposium on Security and Privacy, May 1991.
    [20] T. Kelly, Thin-client Web access patterns: Measurements from a cache-busting proxy, Computer Communications, 25(4):357-366, March 2002.
    [21] C. Kruegel, T. Toth, and E. Kirda. Service-specific Anomaly Detection for Network Intrusion Detection. Symposium on Applied Computing (SAC), ACM Scientific Press, March 2002.
    [22] C. Kruegel and G. Vigna, Anomaly Detection of Web-based Attacks, Proceedings of ACM CCS '03, pp. 251-261, 2003.
    [23] T. Lane and C. E. Brodley, Temporal sequence learning and data reduction for anomaly detection, Proc. of the 5th ACM Conference on Computer and Communications Security, pp. 150-158, 1998.
    [24] J. McHugh, "Covert Channel Analysis", Handbook for the computer Security Certification of Trusted Systems, 1995.
    [25] MIMEsweeper, http://www.mimesweeper.com/products/msw/msw_web/default.aspx, 2004.
    [26] I. S. Moskowitz and M. H. Kang, Covert channels - Here to stay?, Proc. of COMPASS '94, pp. 235-243, 1994.
    [27] V. Paxson. Bro: A System for Detecting Network Intruders in Real-Time. Proc. of the 7th Usenix Security Symposium, January 1998.
    [28] V. Paxson and S. Floyd, "Wide-Area Traffic: The Failure of Poisson Modeling," IEEE/ACM Transactions on Networking, 3(3), pp. 226-244, June 1995.
    [29] F. A. P. Petitcolas, R. J. Anderson, and M. G. Kuhn, Information hiding - A survey, Proceedings of the IEEE, special issue on protection of multimedia content, 87(7):1062-1078, July 1999.
    [30] S. Saroiu, S. D. Gribble, and H. M. Levy, Measurement and Analysis of Spyware in a University Environment, Proc. of the First Symposium on Networked Systems Design and Implementation, pp. 141-153, March 2004.
    [31] M. Roesch. Snort - Lightweight Intrusion Detection for Networks. Proc. of the USENIX LISA '99 Conference, November 1999.
    [32] Spybot - Search and Destroy, http://www.safer-networking.org/, 2004.
    [33] SpywareBlaster, http://www.javacoolsoftware.com/spywareblaster.html/, 2004.
    [34] K. Tan and R. Maxion. Why 6? Defining the Operational Limits of Stide, an Anomaly-Based Intrusion Detector. Proc. of the IEEE Symposium on Security and Privacy, pp. 188-202, May 2002.
    [35] Websense, http://www.websense.com/products/about/howitworks/index.cfm, 2004.
    [36] N. Ye, Y. Zhang, and C. M. Borror. Robustness of Markov chain model for cyber attack detection. IEEE Transactions on Reliability, 52(3), September 2003.
    [37] Y. Zhang, V. Paxson, "Detecting Backdoors", Proc. of the 9th USENIX Security Symposium, August 2000.
Network security has been an increasing concern for network administrators and executives alike. Consequently, firewalls and proxy servers have become prevalent in high-security networks (and even private homes). Many networks require all traffic to the internet to go through an HTTP proxy server or mail server, allowing no direct access to the internal network. This makes the job of a hacker much more difficult than before, when direct access to network machines was available.
When a hacker attacks a network with no direct access to the internet, the first step is getting a user to access a malicious file or website. This can be done effectively by e-mailing a Trojan horse program or a link to a page which exploits the browser [7]. Once the machine is compromised, the next step is to establish a path of communication. Traditionally, this would be done by installing a backdoor program such as BackOrifice [6]. The problem with using such programs on firewalled networks is that they listen for an incoming connection on a specific port. All incoming traffic, however, is blocked. This means that the only way to communicate with a compromised machine is to have it make a callback (outbound connection). Often, the only two ways out of the network are through a mail server or through a proxy server. Since e-mail is often more closely logged and filtered, the hacker may find outbound HTTP transactions to be the best avenue for communication with a compromised workstation.
Spyware is also a huge problem for system administrators and users alike [4]. Besides annoying users by popping up advertisements, spyware can leak information about a user's behavior or even send data on the machine to outside servers. Spyware programs can also degrade system performance and take valuable time and effort to remove. In addition to these lesser threats, security holes have been found in Gator and eZula (two popular spyware programs) that would allow a hacker to execute arbitrary code on a target machine [13, 30].
Signature analysis is a commonly used technique to look for Trojan programs and to do intrusion detection. For example, Snort [31] is configured with over 2500 signature rules to detect scans and attacks. Several commercial programs detect and remove spyware from computers by using the same principle and looking for spyware program signatures [1, 32, 33]. One limitation of signature analysis techniques is that new attacks are developed frequently, and existing signatures may fail to detect them. For that reason, signature analysis techniques should be complemented with anomaly detection techniques.
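The limitation of signature analysis noted above can be illustrated with a minimal sketch. The signature names and byte patterns below are hypothetical placeholders, not rules taken from Snort or from any commercial product, and the substring test stands in for whatever matching a real detector performs.

```python
# Hypothetical signature table: rule name -> byte pattern (illustrative only).
SIGNATURES = {
    "example-trojan-a": b"\xde\xad\xbe\xef",
    "example-spyware-b": b"X-Tracker-Id:",
}

def match_signatures(payload, signatures=SIGNATURES):
    """Return the sorted names of all signatures whose pattern occurs in payload."""
    return sorted(name for name, pattern in signatures.items() if pattern in payload)
```

Traffic from a program that has no entry in the table produces an empty result, which is the gap that anomaly detection is meant to cover.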
Tracking sequences of events using Markov chains or other models has been used for host and network intrusion detection [9, 15, 16, 23, 36]. This approach is very effective for many situations, such as analysis of system call traces [15] to detect tampering of applications on a system. Anomaly detection has also been used to detect network attacks [32, 36] and attacks on web servers [22].
In [22], the focus is on detecting malicious incoming traffic to a server by building a probabilistic profile of web application parameters exported by the web server.
Zhang and Paxson describe a method for detecting backdoors [37]. They look at the timing of packet arrivals and packet sizes in order to characterize an interactive shell. For delay times, they exploited the observation that keystroke inter-arrival periods follow a Pareto distribution with a high variance [27]. For packet sizes, they excluded connections that did not contain a large enough percentage of small requests. By contrast, the interactive shell component of a callback-based backdoor program controlled by a remote hacker will not send requests at the moments the hacker types them; the backdoor server has to wait for a callback from the client before sending any data. Instead of following a Pareto distribution, the delay times will therefore follow a distribution determined by whatever algorithm the backdoor client uses to schedule callback times.
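The timing contrast described above can be illustrated with a small sketch. It assumes a list of request timestamps (in seconds) for one client and one destination, and computes the coefficient of variation of the delays between consecutive requests. A fixed-wait callback timer gives a value near zero, whereas high-variance human activity gives a much larger value. The function names and the 0.1 threshold are hypothetical and are not taken from [37].

```python
from statistics import mean, pstdev

def delay_cv(timestamps):
    """Coefficient of variation (std / mean) of delays between consecutive requests."""
    ordered = sorted(timestamps)
    delays = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(delays) < 2:
        raise ValueError("need at least three timestamps")
    average = mean(delays)
    if average == 0:
        return 0.0  # all requests share one timestamp
    return pstdev(delays) / average

def looks_timer_driven(timestamps, max_cv=0.1):
    """True if the delays are nearly constant (max_cv is an assumed threshold)."""
    return delay_cv(timestamps) <= max_cv
```

A random-wait timer or callbacks scheduled during real browsing activity would not be caught by such a simple test, so this is only one of several possible timing features.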
Significant research exists on human browsing patterns for proxy cache and web server performance optimizations [3, 10, 20].
A substantial body of work exists on covert channel analysis, including detection of covert channels between processes or users on the same machine [24, 26]. A report by McHugh [24] defines a covert channel as “A mechanism that can be used to transfer information from one user of a system to another using means not intended for this purpose by the system developers.” Examples include manipulating CPU usage or disk utilization to communicate information. There is nothing inherently secret about HTTP transactions; they are designed to allow the exchange of information. Backdoors, however, hide data within the noise of legitimate web traffic in order to talk to their owners. These lines of communication are covert even though the channel is not. For this reason the data paths used by backdoors to secretly send information in legitimate web traffic will be referred to as tunnels.
It is also possible to prevent some HTTP tunnel activity by deploying a content filter at the proxy server [25, 35]. Such a filter can be used to prevent people from accessing any website not on an approved list. Besides being a very restrictive policy for many organizations, this will not stop the operation of all backdoors. A well-designed tunnel could still take advantage of web e-mail via an approved site to communicate with its host. A hacker could also compromise a web server on the list of approved sites and use it for communication. If the hacker is able to place a CGI script on one of these servers, the tunnel can communicate with the script to leak information.
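An approved-list filter of the kind described above might be sketched as follows. The host names are hypothetical, and the check is reduced to a host lookup; commercial content filters such as those cited apply considerably more logic.

```python
from urllib.parse import urlsplit

# Hypothetical approved list for illustration.
APPROVED_HOSTS = {"intranet.example.com", "mail.example.com"}

def is_request_allowed(url, approved_hosts=APPROVED_HOSTS):
    """Allow a request only if its host appears on the approved list."""
    host = urlsplit(url).hostname  # lower-cased by urlsplit; None if absent
    return host is not None and host in approved_hosts
```

The sketch also shows the weakness noted in the text: a request to `http://mail.example.com/compose?body=...` passes the filter regardless of what data it carries, so a tunnel that uses an approved web e-mail site is not stopped.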
HTTP Tunnels
In general, if a protocol is available for communication, people have found ways to tunnel other protocols through it, bypassing any firewall restrictions based on protocols or communication ports. HTTP is no exception. Several programs provide HTTP tunneling to allow users within an organization to access non-HTTP services via HTTP proxies. One such program, Wsh [12], communicates over HTTP and provides file transfer capability as well as a remote shell from machines inside a protected network to remote servers. The program can also encrypt data if desired. Another one, Firepass [11], creates a tunnel between a client process and a remote service running on an arbitrary machine and port.
Backdoor Programs
While HTTP tunnel programs can be convenient at times for allowing legitimate users to bypass firewalls and get access to remote services, they can also present a serious security threat. The scenario presented herein is a modest extension of such a program that would allow a remote user to acquire a shell on a machine behind the firewall.
To get a better idea of how a backdoor could work, here is a model of an intrusion using such a program:
    1) The hacker sends a Trojan horse program to the user, or the user views a malicious site which exploits the browser [7]. (Much like how spyware programs can be installed.)
    2) The payload of the hacker's program contains a backdoor that executes on the remote machine.
Once the backdoor program is running on the remote machine, the hacker needs some way of communicating with it. In this model, the network either has firewall rules in place to block all incoming traffic, or uses a proxy server. If the network uses a firewall, then it also blocks all outgoing packets except HTTP (TCP port 80) and DNS (UDP port 53).
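The firewall policy assumed in this model can be written down as a short sketch. Only the two permitted outbound services come from the text; the function name and the string values used for direction and protocol are assumptions made for illustration.

```python
# Outbound services permitted in the model: HTTP (TCP 80) and DNS (UDP 53).
ALLOWED_OUTBOUND = {("tcp", 80), ("udp", 53)}

def firewall_permits(direction, protocol, dst_port):
    """Apply the model's policy: block all inbound, allow only HTTP/DNS outbound."""
    if direction == "inbound":
        return False
    if direction == "outbound":
        return (protocol.lower(), dst_port) in ALLOWED_OUTBOUND
    raise ValueError("direction must be 'inbound' or 'outbound'")
```

Under this policy a backdoor that listens on a port is unreachable, while a callback over TCP port 80 is indistinguishable from web traffic at the packet-filter level.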
After the backdoor has been installed, it calls back to a web server controlled by the hacker (or a server hosting a script written by the hacker) using HTTP requests. Callbacks can be scheduled according to a fixed-wait timer, a random-wait timer, or times of actual browsing activity. Due to the nature of HTTP, all transactions must be initiated by the client computer. The threat model assumes that the hacker may make an effort to disguise messages as legitimate HTTP transactions. The communication protocol for the backdoor can hide outbound information in any header field (including the URL), or in data trailing the header for a POST request. The backdoor then receives commands from the hacker hidden within what appears to be a normal web page. There are many clever ways of hiding data [29], and it could be fruitless to try to detect them all.
Spyware
For spyware, the threat model is exactly the same except the initial mode of compromise is different. Spyware programs often install themselves by piggybacking on legitimate software, exploiting browser vulnerabilities, or tricking a user into downloading them voluntarily [4, 30]. Once they are installed, they can use the same method of communication as the backdoor program described above.
Every internet browser has a unique header signature and utilizes a certain set of header fields.
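One way to make use of this observation is sketched below. It assumes that a profile of header field names has already been recorded for each known browser, keyed by its User-Agent string. The profile shown and the function names are hypothetical.

```python
# Hypothetical profile: User-Agent string -> header field names that browser sends.
KNOWN_PROFILES = {
    "ExampleBrowser/1.0": frozenset(
        {"host", "user-agent", "accept", "accept-language",
         "accept-encoding", "connection"}
    ),
}

def header_signature(headers):
    """Return the set of lower-cased header field names in a request."""
    return frozenset(name.lower() for name, _value in headers)

def matches_claimed_browser(headers, profiles=KNOWN_PROFILES):
    """True if the request's header fields match the profile of its User-Agent."""
    claimed = next(
        (value for name, value in headers if name.lower() == "user-agent"), None
    )
    expected = profiles.get(claimed)
    return expected is not None and header_signature(headers) == expected
```

A request that claims to come from a known browser but carries a different set of header fields would fail this check. Real browsers vary their headers by request type, so an exact-match rule is a simplification.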
Since most HTTP requests are small, normal web browsing activity rarely utilizes much outbound bandwidth. When a hacker is using HTTP requests for covert communication, outbound bandwidth usage is expected to be higher than the norm. The reason for this is that the hacker usually only sends short requests and small tools (executables) inbound to the computer. Outbound bandwidth, however, is needed to download sensitive documents and directory listings. From a secrecy point of view, a system administrator should be more worried about outbound than inbound traffic.
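The bandwidth observation above suggests tallying outbound request bytes per destination. The following sketch assumes a log of (host, request size in bytes) pairs for one client over some measurement period; the 20,000-byte budget is an arbitrary placeholder, not a value given in the text.

```python
from collections import defaultdict

def outbound_bytes_by_host(requests):
    """Sum request sizes (bytes sent by the client) for each destination host."""
    totals = defaultdict(int)
    for host, request_size in requests:
        totals[host] += request_size
    return dict(totals)

def hosts_over_budget(requests, max_bytes=20_000):
    """Return hosts whose total outbound bytes exceed max_bytes (assumed budget)."""
    totals = outbound_bytes_by_host(requests)
    return sorted(host for host, total in totals.items() if total > max_bytes)
```

Legitimate uploads, such as file attachments, would also exceed a fixed budget, so a result from this check is a reason to look further and not proof of a tunnel.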
Since most hackers need a lot of outbound bandwidth, but have little available to them (without being detected), they are forced to spread their requests out over a long period of time. Legitimate web traffic, on the other hand, typically occurs in short bursts.
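The difference between bursty and spread-out traffic can be quantified in a simple way, sketched below. The measurement period is divided into fixed windows and the fraction of windows containing at least one request is reported. The 300-second window and the function name are assumptions made for illustration.

```python
def active_window_fraction(timestamps, period_seconds, window_seconds=300):
    """Fraction of fixed-size windows in the period that contain a request."""
    if period_seconds <= 0 or window_seconds <= 0:
        raise ValueError("period and window must be positive")
    total_windows = -(-period_seconds // window_seconds)  # ceiling division
    active = {
        int(t // window_seconds) for t in timestamps if 0 <= t < period_seconds
    }
    return len(active) / total_windows
```

Six requests within one burst occupy a single window of an hour-long period, while six requests spaced ten minutes apart occupy half of the windows, even though the request counts are identical.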
The following U.S. patent documents are related to the present invention: U.S. Pat. Nos. 6,519,703; 6,671,811; 6,681,331; 6,772,345; 6,708,212; and 6,801,940; and U.S. Patent Application Publication Nos. 2002/0133586; 2002/0035628; 2003/0212903; 2003/0004688; 2003/0051026; 2003/0159070; 2004/0034794; 2003/0236652; 2004/0221191; 2004/0114519; 2004/0250124; 2004/0250134; 2004/0054925; 2005/0033989; 2005/0044406; 2005/0021740; 2005/0108393; and 2005/0076236.