The term "data" as used herein refers to symbols that can be transmitted or stored. A number of data symbols is referred to as a segment or block. The phrases "application of data to a medium" or "applying data to a medium" refer to the act of putting the data on a communications medium or mediums, or a storage medium or mediums. The application of data to a medium involves the act of generating physical signals (e.g., electrical, electromagnetic, light, or other) which are sent (for a communications medium) or stored (for a storage medium).
Data compression techniques are known in the art for improving the bandwidth of a communications medium and/or the capacity of a data storage medium or system. Such techniques convert data in a first given format into a second format having fewer bits than the original. Lossless data compression techniques include a decompression process that generates an exact replica of the original uncompressed, plaintext or clear data.
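By way of illustration only, the lossless property described above can be sketched in Python using the standard `zlib` module. The sample data and variable names are assumptions chosen for this example and do not come from any particular prior art system.

```python
import zlib

# Hypothetical clear data with obvious repetition.
original = b"the quick brown fox jumps over the lazy dog " * 20

# Compression converts the data into a second format having fewer bits.
compressed = zlib.compress(original)

# Lossless decompression regenerates an exact replica of the original.
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print("exact replica:", restored == original)
```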
Whether stored on a storage medium or transferred over a communications medium between computer systems, data generally contains significant redundancy. Known data compression techniques reduce the redundancy content of the data such that it can be transmitted in less time over communications channels, or such that it will take up less space in a storage medium. Thus, data compression systems are particularly effective if the original data contains substantial redundancy.
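The dependence of compression on redundancy can be sketched as follows. The two inputs and the helper `compression_ratio` are hypothetical, and `zlib` merely stands in for any known compression technique.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)

# Highly redundant data: a short pattern repeated many times (4096 bytes).
redundant = b"ABABABAB" * 512

# Data with essentially no redundancy: bytes from the OS random source.
random_like = os.urandom(4096)

print("redundant  :", round(compression_ratio(redundant), 3))
print("random-like:", round(compression_ratio(random_like), 3))
```

The redundant input shrinks to a small fraction of its size, while the random-like input does not shrink at all and may grow slightly because of container overhead.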
Compression systems generally include a modeling method or model followed by a back end coder. To obtain a higher compression ratio, a multi-step compression system may be used. Multi-step compression systems involve the use of a front end coder, with its associated modeling method, to compress the data prior to implementation of the back end coder, with its associated modeling method.
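The structure of a multi-step compression system can be sketched as below. This is an assumption-laden illustration: a simple run-length coder is used as the front end and `zlib` as the back end. Neither choice is taken from the source, and the combination is not claimed to improve the compression ratio for all inputs.

```python
import zlib

def rle_encode(data: bytes) -> bytes:
    """Front end coder (hypothetical): emit (run length, byte) pairs."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # Extend the run while the next byte matches, capped at 255.
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    """Inverse of rle_encode: expand each (run length, byte) pair."""
    out = bytearray()
    for i in range(0, len(data), 2):
        out += bytes([data[i + 1]]) * data[i]
    return bytes(out)

def two_step_compress(data: bytes) -> bytes:
    # The front end coder runs first; the back end coder then compresses its output.
    return zlib.compress(rle_encode(data))

def two_step_decompress(blob: bytes) -> bytes:
    # Decompression undoes the stages in reverse order.
    return rle_decode(zlib.decompress(blob))

sample = b"\x00" * 1000 + b"header" + b"\xff" * 1000
assert two_step_decompress(two_step_compress(sample)) == sample
```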
In a data compression system, the modeler learns the data as it is processed. The modeler represents the knowledge in its internal variables or states. The coder efficiently represents the knowledge or information generated by the modeler to generate compressed data. The term "coder" as used herein refers to an encoder, a decoder, or both.
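The division of labor between modeler and coder can be sketched as follows. The class `AdaptiveModel` is a hypothetical order-0 frequency model invented for this illustration; rather than implementing a full coder, the sketch computes the ideal code length (the sum of -log2 p over the symbols) that an efficient coder, such as an arithmetic coder, would approach.

```python
import math

class AdaptiveModel:
    """Hypothetical order-0 adaptive modeler over byte symbols."""

    def __init__(self, alphabet_size: int = 256) -> None:
        # Internal state: one count per symbol, all starting at 1.
        self.counts = [1] * alphabet_size
        self.total = alphabet_size

    def probability(self, symbol: int) -> float:
        return self.counts[symbol] / self.total

    def update(self, symbol: int) -> None:
        # The modeler "learns" by updating its state after each symbol.
        self.counts[symbol] += 1
        self.total += 1

def ideal_code_length_bits(data: bytes) -> float:
    """Bits an ideal coder would spend, given the modeler's predictions."""
    model = AdaptiveModel()
    bits = 0.0
    for symbol in data:
        bits += -math.log2(model.probability(symbol))
        model.update(symbol)
    return bits

data = b"a" * 1000
print(round(ideal_code_length_bits(data)), "bits versus", 8 * len(data), "uncoded")
```

A decoder running an identical model and applying the same updates in the same order would hold the same state at every step, which is what allows the compressed data to be decoded.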
Data compression techniques are further described in U.S. patent application Ser. No. 08/609,129.
Whether data is transmitted or stored, it is susceptible to unauthorized observation. Security is becoming particularly difficult as computers are increasingly networked, thus increasing potential access to stored or transmitted confidential data. Known compression algorithms provide a small measure of security, as the compressed data must be deciphered before it can be understood. However, as known compression techniques are based on the reduction of redundancy, data compressed using such techniques is relatively easy to decipher.
To transmit or store data in a secure fashion, the data must first be encrypted. Known encryption techniques usually utilize algorithms that manipulate data as a function of randomly generated bits. Such techniques generally utilize block ciphers, stream ciphers, or other random number generators to introduce randomness into the encryption process.
A stream cipher, for example, outputs a pseudorandom bit stream as a function of a seed such as an encryption key. The stream cipher outputs the same stream of bits if the same key is used. Generally, encryption techniques based on a stream cipher convert plain text to cipher text one bit at a time. The cipher text is obtained from the plain text by performing the mathematical exclusive OR (XOR) operation between the stream cipher bits and the plain text bits. In the decryption stage, the plain text is retrieved from the cipher text by XORing the bits of the cipher text with the same stream cipher bits. The resulting system security is dependent on the design of the stream cipher: the more secure the stream cipher, the more secure the cipher text.
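The XOR relationship between plain text, keystream, and cipher text can be sketched as follows. The keystream generator shown, which hashes the key together with a counter using SHA-256, is an assumption made purely for illustration and is not a vetted stream cipher. The sketch also operates on bytes rather than individual bits for brevity.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Illustrative keystream: SHA-256(key || counter), concatenated."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_with_keystream(data: bytes, key: bytes) -> bytes:
    """Encrypts or decrypts: XOR is its own inverse for a given keystream."""
    stream = keystream(key, len(data))
    return bytes(d ^ s for d, s in zip(data, stream))

key = b"hypothetical key"
plain_text = b"attack at dawn"

cipher_text = xor_with_keystream(plain_text, key)
recovered = xor_with_keystream(cipher_text, key)

# The same key always yields the same keystream, so decryption succeeds.
assert keystream(key, 32) == keystream(key, 32)
assert recovered == plain_text
```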
It is known in the art both to compress and to encrypt clear data. Known methods involving both the compression and encryption of data are "sequential", that is, they involve the two discrete steps of compression and then encryption. Note that it is preferred to perform the compression step before the encryption step for at least two reasons. First, by having the data compressed before the encryption step, the security of the encryptor is improved with respect to statistical cryptanalytic attacks, which are based on redundancy. Second, well-encrypted data cannot be effectively compressed using known compression techniques.
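The ordering argument can be sketched as follows, under stated assumptions: `zlib` stands in for the compression step, and the hypothetical helper `toy_encrypt` (an XOR with a SHA-256 counter keystream) stands in for the encryption step. Neither is taken from the prior art methods discussed.

```python
import hashlib
import zlib

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream (illustrative only).

    Applying the function twice with the same key restores the input.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

clear = b"confidential report: " * 200
key = b"hypothetical key"

# Preferred order: compress first, then encrypt the compressed data.
compress_then_encrypt = toy_encrypt(zlib.compress(clear), key)

# Reverse order: the encrypted data looks random and barely compresses.
encrypt_then_compress = zlib.compress(toy_encrypt(clear, key))

print(len(clear), len(compress_then_encrypt), len(encrypt_then_compress))

# Recovery for the preferred order: decrypt, then decompress.
recovered = zlib.decompress(toy_encrypt(compress_then_encrypt, key))
assert recovered == clear
```

Note that the sketch still performs two discrete passes over the data, one for compression and one for encryption.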
Prior art methods involving the sequential compression and encryption of data have been referred to as "concryption", or as a "single operation" involving at least one "compression step" and an "encryption step". See, for example, U.S. Pat. No. 5,479,512 (Weiss). Such prior art methods involve a compression step followed by an encryption step, or a compression step followed by an encryption step followed by another compression step. Such methods are therefore sequential compression and encryption methods.
Sequential compression and encryption is an effective method for achieving the goals of, firstly, improving the bandwidth of a communications medium or the capacity of a data storage system, and secondly, ensuring the security of the transmitted or stored data. However, sequential data compression and encryption is slow and computationally expensive because it involves two discrete operations: compression and encryption.
It is known in the art that some measure of security can be achieved simply by compressing data using a method involving an adaptive modeler and a coder without an encryption step. Such a technique was proposed by Witten and Cleary (see I. H. Witten and J. G. Cleary, "On the Privacy Offered by Adaptive Text Compression", Computers & Security, 7 (1988), pp. 397-408), who suggested that the adaptive nature of the modeler would act like an encryption key that would be difficult for an intruder to duplicate. Witten and Cleary suggested that for lower security applications, even a fixed or static modeler could be used.
In subsequent publications, however, Bergen and Hogan have shown that the security of both fixed and adaptive modelers can be seriously undermined through a chosen plain text attack. One suggested solution to the low security associated with the compressed data was to XOR the compressed output of the coder with the output of a random number generator. In other words, Bergen and Hogan suggested sequential data compression and encryption as a solution to the inadequate security achieved through the use of a compression step alone. Regarding fixed modelers, see H. A. Bergen and James M. Hogan, "Data security in a fixed-model arithmetic coding compression algorithm", Computers & Security, 11 (1992), pp. 445-461. Regarding adaptive modelers, see H. A. Bergen and James M. Hogan, "A chosen plaintext attack on an adaptive arithmetic coding compression algorithm", Computers & Security, 12 (1993), pp. 157-167.