Modern society has come to rely heavily on electronic communication and computerized data storage and processing. Because of the volume and sensitivity of the data stored on and communicated via electronic devices, users have sought to protect their communications and sensitive data from others who may wish to compromise this information, either by physically accessing a computer or by intercepting wired or wireless network communications. One well-known method by which users protect their data and communications is encryption. Ideally, encryption should be used only for legitimate purposes, such as protecting sensitive data and private communications. However, countless others employ encryption to conceal nefarious activities, for example to infiltrate network infrastructure, to hide incriminating data, or to conceal communications involving criminal activity.
Because encryption is a well-known method of protecting or obfuscating communications and data, law enforcement agencies and cryptanalysts look for encrypted data (also referred to as ciphertext) as an indicator of potentially useful information, whether for thwarting attacks in progress or for investigating attacks that have already occurred. For these and other reasons, it is useful to have an efficient method to detect encrypted data and distinguish it from other types of data.
One simple approach for distinguishing encrypted files from other file types is to read file headers or, in the case of network traffic, packet headers. Unencrypted files generally have recognizable headers that reveal their structure, whereas encrypted files have no such discernible headers. In digital forensics, however, it is not uncommon for subjects to alter file extensions or even header information in the hope that particular data will be overlooked during a hard disk drive analysis.
Unfortunately, in many cases the rudimentary analysis of merely inspecting file headers does not prove fruitful, because a file's content can be obfuscated by changing its header information and/or its packet signature information. For example, an encrypted file could be manipulated to incorporate plaintext header information indicating file data of a different type. While a naïve analyst might be deceived by such manipulation, a trained analyst would know to delve deeper. Moreover, on a noisy network or with surveillance data, only portions of the data may be captured, so the header information might not be available for inspection.
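A header check of the kind described above, together with its weakness, can be sketched in Python as follows. The signature table and the function name `identify_by_header` are hypothetical illustrations covering only a handful of well-known magic numbers; real forensic tools use far larger tables. The final lines use random bytes as an assumed stand-in for ciphertext and show that prepending a plaintext PNG signature is enough to deceive the check.

```python
import os

# Hypothetical, deliberately small table of well-known file signatures.
MAGIC_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
    b"\x1f\x8b": "gzip",
    b"\xff\xd8\xff": "jpeg",
}


def identify_by_header(data: bytes) -> str:
    """Return a file type if data begins with a known signature, else 'unknown'."""
    for magic, name in MAGIC_SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


# Random bytes stand in for ciphertext (an assumption for illustration).
ciphertext = os.urandom(1024)

# Prepending a plaintext PNG signature makes the header check report "png".
spoofed = b"\x89PNG\r\n\x1a\n" + ciphertext
print(identify_by_header(spoofed))  # png
```

Because the check only looks at the leading bytes, it can be defeated by a forged header and gives no answer at all for a captured fragment that lacks its header.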
Where file headers are unavailable, another known approach can be used specifically to distinguish between encrypted and compressed data. This approach entails running a compression algorithm against the data. Under this approach, encrypted data is expected to compress to some degree, whereas applying an appropriate compression algorithm (typically the one that originally produced the data) to already compressed data will usually cause the data to grow in size. This tendency of compressed data to grow upon recompression can thus be used to distinguish between the two data types.
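The size measurement at the heart of this approach can be sketched with Python's standard `zlib` module, assuming for illustration that zlib is the algorithm that originally compressed the data of interest. The helper names `compression_ratio` and `grows_when_compressed` are hypothetical. The sketch only demonstrates the measurement (output size relative to input size); it does not by itself classify ciphertext.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Ratio of zlib-compressed size to original size; > 1.0 means the data grew."""
    if not data:
        raise ValueError("data must be non-empty")
    return len(zlib.compress(data, level)) / len(data)


def grows_when_compressed(data: bytes) -> bool:
    """Apply the growth test described above using zlib as the assumed compressor."""
    return compression_ratio(data) > 1.0


# Highly repetitive plaintext shrinks substantially.
plaintext = b"attack at dawn; " * 256
print(grows_when_compressed(plaintext))  # False

# Data already compressed with zlib grows when zlib is applied again,
# because it has no remaining redundancy to offset the format overhead.
already_compressed = zlib.compress(os.urandom(4096))
print(grows_when_compressed(already_compressed))  # True
```

For the test to separate the two classes, the ciphertext must retain some compressible structure; output from a strong modern cipher is typically as incompressible as random data.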
While this approach can prove quite useful, its primary limitation is that it relies on knowledge of the compression algorithm that was used to generate the compressed data in the first place. Unless the same compression algorithm is used in testing, the results can be indeterminate. Unfortunately, the underlying compression algorithm is often unknown, which can make the analysis time-consuming and may frustrate investigative efforts. There accordingly remains a general need for a more robust approach to distinguishing between encrypted data and other data types, regardless of whether the data of interest is part of a file or a data stream, and more particularly for an approach capable of distinguishing between encrypted data and compressed data.
The foregoing examples of the related art and their related limitations are intended to be illustrative and not exclusive. Other limitations may become apparent to those skilled in the art upon a reading of the specification and a study of the drawings.