Data transmitted through a computer network, particularly a public network such as the Internet, is inherently vulnerable to pirating by illegitimate users. Misappropriation of data can occur in a variety of ways, such as by physically absconding with data storage media or by surreptitiously gaining unauthorized access to the data over a network interface, i.e., hacking. As society continues to rely ever more heavily on computer systems, the protection of sensitive data will remain an important consideration.
A very common approach for protecting sensitive data is to encrypt it using any of a variety of known data encryption techniques. Oftentimes, particularly if the data is to be transmitted through a computer network, encryption is coupled with data compression to reduce bandwidth requirements during transmission. While the goal of cryptography is to ensure that an unauthorized individual cannot gain access to the underlying data, it can also be beneficial to ascertain whether or not a given file contains encrypted information. This could be useful, for example, to investigators, or to individuals who simply want confirmation that their files have been encrypted as intended. When a file is encrypted, the encryption scheme randomizes the data, converting the plaintext into ciphertext. This makes it exceedingly difficult for someone to decipher the underlying information without knowledge of the encryption algorithm and its key.
To illustrate this randomization property, reference is initially made to FIGS. 1(a) and 1(b). FIG. 1(a) illustrates a histogram 2 of a non-encrypted file, such as a text file, where the x-axis represents the ASCII value corresponding to the various characters present in the file, and the y-axis represents the relative frequency of each of these characters. Since the non-encrypted file is a text file, it is not surprising that the most frequently occurring ASCII value is 32, which corresponds to the space character. In contrast, FIG. 1(b) illustrates a second histogram 12, having the characteristics one would expect after the same text file has been encrypted with an appropriate encryption scheme, such as PGP (Pretty Good Privacy). The histogram of FIG. 1(b) is much flatter because the encryption algorithm has randomized the data, so that the frequency distribution of byte values is far more uniform after encryption. It has been surprisingly found that the degree of uniformity (randomness) exhibited by the frequency distribution of byte values in a file, i.e., its histogram, is a characteristic that can be used to gain useful insight into, and actually determine with varying degrees of reliability, whether a given file has been encrypted. The present invention is particularly directed to making such assessments.
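To make the histogram comparison concrete, the following minimal Python sketch builds the 256-bin byte-value histogram described above and scores its flatness with two standard measures: Shannon entropy (in bits per byte) and Pearson's chi-square statistic against a uniform distribution. It is an illustration only, not the method of the present invention. The choice of these two measures, the function names `byte_histogram`, `shannon_entropy`, and `chi_square_uniformity`, and the use of `os.urandom` output as a stand-in for ciphertext are all assumptions.

```python
import math
import os


def byte_histogram(data: bytes) -> list[int]:
    """Count occurrences of each byte value 0..255 (the file's histogram)."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return counts


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte; 8.0 is reached only by a perfectly flat histogram."""
    n = len(data)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in byte_histogram(data) if c)


def chi_square_uniformity(data: bytes) -> float:
    """Pearson chi-square statistic of the byte histogram versus a uniform distribution.

    Small values indicate a flat histogram (consistent with random-looking data);
    large values indicate a skewed histogram (typical of plaintext).
    """
    n = len(data)
    if n == 0:
        raise ValueError("empty input")
    expected = n / 256
    return sum((c - expected) ** 2 / expected for c in byte_histogram(data))


if __name__ == "__main__":
    # Hypothetical sample inputs: repeated English text vs. OS-supplied random bytes.
    text = b"the quick brown fox jumps over the lazy dog " * 200
    random_like = os.urandom(len(text))  # stand-in for ciphertext (assumption)

    hist = byte_histogram(text)
    print("most frequent byte in text:", hist.index(max(hist)))
    for label, sample in (("text", text), ("random", random_like)):
        print(label,
              "entropy=%.3f" % shannon_entropy(sample),
              "chi2=%.1f" % chi_square_uniformity(sample))
```

For a text sample, the histogram peak typically falls at byte value 32, and entropy stays well below 8 bits per byte. For random-looking data, entropy approaches 8 and the chi-square statistic stays near its expected value of about 255, the number of degrees of freedom. Well-compressed data also produces a flat histogram, so these scores by themselves indicate high randomness rather than proving encryption.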