1. Field of the Invention
This invention relates to methods, systems and computer program products for comparing or measuring information content in at least one data stream including one or more data segments.
2. Background Art
The following references are cited herein:
[1] Adobe Systems Incorporated. Adobe Flash Player. http://www.macromedia.com/software/flash/about, 2008.
[2] R. Anderson and F. Petitcolas. On the Limits of Steganography. IEEE Journal of Selected Areas in Communications, 16(4):474-481, 1998.
[3] K. Borders and A. Prakash. Web Tap: Detecting Covert Web Traffic. In Proc. of the 11th ACM Conference on Computer and Communications Security (CCS), 2004.
[4] K. Borders and A. Prakash. Towards Quantification of Network-Based Information Leaks Via HTTP. In Proc. of the 3rd USENIX Workshop on Hot Topics in Security, 2008.
[5] S. Brand. DoD 5200.28-STD Department of Defense Trusted Computer System Evaluation Criteria (Orange Book). National Computer Security Center, 1985.
[6] S. Cabuk, C. Brodley, and C. Shields. IP Covert Timing Channels: Design and Detection. In Proc. of the 11th ACM Conference on Computer and Communications Security (CCS), 2004.
[7] S. Castro. How to Cook a Covert Channel. hakin9, http://www.gray-world.net/projects/cooking_channels/hakin9_cooking_channels_en.pdf, 2006.
[8] J. Gailly and M. Adler. The gzip Home Page. http://www.gzip.org/, 2008.
[9] J. Giles and B. Hajek. An Information-Theoretic and Game-Theoretic Study of Timing Channels. IEEE Transactions on Information Theory, 48:2455-2477, 2003.
[10] M. Handley, V. Paxson, and C. Kreibich. Network Intrusion Detection: Evasion, Traffic Normalization, and End-to-End Protocol Semantics. In Proc. of the 10th USENIX Security Symposium, 2001.
[11] M. Kang, I. Moskowitz, and D. Lee. A Network Version of the Pump. In Proc. of the 1995 IEEE Symposium in Security and Privacy, 1995.
[12] G. Malan, D. Watson, F. Jahanian, and P. Howell. Transport and Application Protocol Scrubbing. In Proc. of the IEEE INFOCOM 2000 Conference, 2000.
[13] S. McCamant and M. Ernst. Quantitative Information Flow as Network Flow Capacity. In Proc. of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2008.
[14] Mozilla. The Firefox Web Browser. http://www.mozilla.com/firefox/, 2008.
[15] Mozilla. SpiderMonkey (JavaScript-C) Engine. http://www.mozilla.org/js/spidermonkey/, 2008.
[16] A. Myers, N. Nystrom, L. Zheng, and S. Zdancewic. Jif: Java information flow. http://www.cs.cornell.edu/jif, 2001.
[17] R. Richardson. CSI Computer Crime and Security Survey. http://i.cmpnet.com/v2.gocsi.com/pdf/CSISurvey2007.pdf, 2007.
[18] RSA Security, Inc. RSA Data Loss Prevention Suite. RSA Solution Brief, http://www.rsa.com/products/EDS/sb/DLPST_SB—1207-lowres.pdf, 2007.
[19] N. Schear, C. Kintana, Q. Zhang, and A. Vahdat. Glavlit: Preventing Exfiltration at Wire Speed. In Proc. of the 5th Workshop on Hot Topics in Networks (HotNets), 2006.
[20] J. Seward. bzip2 and libbzip2, version 1.0.5: A Program and Library for Data Compression. http://www.bzip.org/1.0.5/bzip2-manual-1.0.5.html, 2007.
[21] C. Shannon. Prediction and Entropy of Printed English. Bell System Technical Journal, 30:50-64, 1951.
[22] S. Servetto and M. Vetterli. Communication Using Phantoms: Covert Channels in the Internet. In Proc. of the IEEE International Symposium on Information Theory, 2001.
[23] Sun Microsystems. Java. http://www.java.com, 2008.
[24] VONTU. Data Loss Prevention, Confidential Data Protection: Protect Your Data Anywhere. http://www.vontu.com, 2008.
[25] R. Wagner and M. Fischer. The String-to-String Correction Problem. Journal of the ACM, 21(1):168-173, 1974.
[26] Websense, Inc. Web Security, Internet Filtering, and Internet Security Software. http://www.websense.com/global/en/, 2008.
[27] A. Yumerefendi, B. Mickle, and L. Cox. TightLip: Keeping applications from spilling the beans. In Proc. of the 4th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2007.
As the Internet grows and network bandwidth continues to increase, administrators are faced with the task of keeping confidential information from leaving their networks. Today's network traffic is so voluminous that manual inspection would be unreasonably expensive. In response, researchers have created data loss prevention systems that check outgoing traffic for known confidential information. These systems stop naïve adversaries from leaking data, but are fundamentally unable to identify encrypted or obfuscated information leaks. What remains is a high-capacity pipe for tunneling data to the Internet.
Network-based information leaks pose a serious threat to confidentiality. They are the primary means by which hackers extract data from compromised computers. The network can also serve as an avenue for insider leaks, which, according to a 2007 CSI/FBI survey, are the most prevalent security threat for organizations [17]. Because the volume of legitimate network traffic is so large, it is easy for attackers to blend in with normal activity, making leak prevention difficult. In one experiment, a single computer browsing a social networking site for 30 minutes generated over 1.3 MB of legitimate request data, the equivalent of about 195,000 credit card numbers. Manually analyzing network traffic for leaks would be unreasonably expensive and error-prone. Because normal traffic is so heavy, limiting network traffic by raw byte count would stop only large information leaks; a small leak fits easily within any byte budget generous enough to permit ordinary browsing.
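The credit card equivalence above can be checked with a short calculation. The following is a minimal sketch, not taken from the cited experiment: it assumes 16-digit card numbers, that each decimal digit carries log2(10) bits of information, and that 1.3 MB means 1.3 million bytes. The function name `card_numbers_equivalent` is hypothetical.

```python
import math


def card_numbers_equivalent(num_bytes: float, digits_per_card: int = 16) -> float:
    """Estimate how many card numbers fit in num_bytes of raw capacity.

    Assumption (not from the source): every digit is independent and
    uniformly distributed, so one digit carries log2(10) bits.
    """
    bits_per_card = digits_per_card * math.log2(10)
    return num_bytes * 8 / bits_per_card


# 1.3 MB of request data, taken here as 1.3 million bytes (an assumption).
estimate = card_numbers_equivalent(1.3e6)
print(f"about {estimate:,.0f} card numbers")
```

Under these assumptions the estimate falls between 195,000 and 196,000, which agrees with the figure quoted above.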
In response to the threat of network-based information leaks, researchers have developed data-loss prevention (DLP) systems [18, 24]. DLP systems work by searching through outbound network traffic for known sensitive information, such as credit card and Social Security numbers. Some systems even catalog sensitive documents and look for excerpts in outbound traffic. Although they are effective at stopping accidental and plain-text leaks, DLP systems are fundamentally unable to detect obfuscated information flows. They leave an open channel for leaking data to the Internet.
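The limitation described above can be illustrated with a toy content filter. This is a hedged sketch only: it is not the implementation of any cited DLP product, the pattern `CARD_PATTERN` and the function `contains_card_number` are hypothetical, and the card number used is a well-known test value. It shows a pattern-based check catching a plain-text number and missing the same number after a trivial Base64 encoding.

```python
import base64
import re

# Hypothetical pattern: 16 digits, optionally grouped in fours by "-" or " ".
CARD_PATTERN = re.compile(rb"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")


def contains_card_number(payload: bytes) -> bool:
    """Return True if the outbound payload contains a card-like digit run."""
    return CARD_PATTERN.search(payload) is not None


card = b"4111111111111111"  # common test number, not a real account

# Plain-text leak: the number appears verbatim in the request body.
plain = b"POST /form HTTP/1.1\r\n\r\ncard=" + card

# Obfuscated leak: the same number, Base64-encoded before sending.
obfuscated = b"POST /form HTTP/1.1\r\n\r\ndata=" + base64.b64encode(card)

print(contains_card_number(plain))       # plain-text leak is flagged
print(contains_card_number(obfuscated))  # encoded leak passes unnoticed
```

Any reversible encoding or encryption would defeat the pattern in the same way, which is why matching on known content cannot bound how much information leaves the network.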