1. Technical Field
The present invention is directed to an apparatus and method for compressing pseudo-random data.
2. Description of Related Art
Data compression is generally achieved by exploiting the repetition of data. Many data sources, such as music, video, text, and the like, contain enough repetition that data compression has become a widely used practice for reducing storage and bandwidth requirements. However, many other forms of data, such as binary representations of data structures, binary executables, simulation data, and the like, do not contain much repetition. In addition, data files that have already been compressed by conventional techniques that exploit pattern repetition are not amenable to further compression. Because such data is pseudo-random, that is, it exhibits little exploitable pattern repetition, very little compression is obtainable using these techniques.
It would therefore be beneficial to have an apparatus and method for compressing pseudo-random data that is not compressible using conventional compression techniques.
The present invention provides an apparatus and method for compressing pseudo-random data. The apparatus and method of the present invention make use of stochastic distribution models to generate approximations of the input data. A data sequence obtained from the stochastic distribution models is compared to the input data sequence to generate a difference data sequence. The difference data sequence tends to be less "random" than the input data sequence and is thus a candidate for compression using pattern repetition. The difference data sequence is compressed using standard compression techniques and stored as a compressed data file along with information identifying the stochastic distribution model used and any parameters of the stochastic distribution model, including seed value and the like.
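By way of illustration only, the compression flow summarized above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the stochastic distribution model is stood in for by Python's seeded `random.Random` generator producing uniform bytes, the "comparison" is assumed to be byte-wise subtraction modulo 256, the standard compression technique is assumed to be zlib, and the header layout, `MODEL_UNIFORM` identifier, and function names are hypothetical.

```python
import random
import struct
import zlib

MODEL_UNIFORM = 1  # hypothetical identifier for the stand-in model


def model_sequence(seed: int, length: int) -> bytes:
    """Generate the model's approximation: `length` bytes from a seeded generator.

    Assumption: a uniform byte generator stands in for the stochastic
    distribution model; the same seed always yields the same sequence
    within one Python runtime.
    """
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(length))


def compress(data: bytes, seed: int) -> bytes:
    """Compress `data` as (model id, seed, length) plus a compressed difference."""
    approx = model_sequence(seed, len(data))
    # Assumed comparison: byte-wise difference modulo 256.
    diff = bytes((d - a) % 256 for d, a in zip(data, approx))
    # Hypothetical header: 1-byte model id, 8-byte seed, 4-byte original length.
    header = struct.pack(">BQI", MODEL_UNIFORM, seed, len(data))
    return header + zlib.compress(diff, 9)
```

The sketch only yields a size reduction when the model sequence actually approximates the input, so that the difference sequence is dominated by repeated values such as zero. Selecting a model and parameters that fit the input is outside the scope of this sketch.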
When decompressing a data file compressed in the manner described above, the compressed difference data sequence is decompressed, and a data sequence is generated using the identified stochastic distribution model and model parameters. The generated data sequence is then added to the difference data sequence to reconstruct the original input data sequence. Other features and advantages of the present invention will be described in, or will become apparent to those of ordinary skill in the art in view of, the following detailed description of the preferred embodiment.
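By way of illustration only, the decompression flow described above might be sketched as follows. The sketch rests on assumptions that do not appear in this description: a hypothetical 13-byte header holding a model identifier, a seed, and the original length; zlib as the standard compression technique; a seeded `random.Random` byte generator standing in for the stochastic distribution model; and byte-wise addition modulo 256 as the "adding" step.

```python
import random
import struct
import zlib

# Hypothetical header layout: 1-byte model id, 8-byte seed, 4-byte original length.
HEADER = struct.Struct(">BQI")
MODEL_UNIFORM = 1  # hypothetical identifier for the stand-in model


def model_sequence(seed: int, length: int) -> bytes:
    """Regenerate the model sequence from its seed (stand-in uniform byte model)."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(length))


def decompress(blob: bytes) -> bytes:
    """Recover the original data from a header plus a compressed difference."""
    model_id, seed, length = HEADER.unpack_from(blob)
    if model_id != MODEL_UNIFORM:
        raise ValueError(f"unknown model id: {model_id}")
    # Step 1: decompress the stored difference data sequence.
    diff = zlib.decompress(blob[HEADER.size:])
    if len(diff) != length:
        raise ValueError("difference length does not match header")
    # Step 2: regenerate the model sequence from the stored parameters.
    approx = model_sequence(seed, length)
    # Step 3: add the two sequences (modulo 256) to recover the input.
    return bytes((a + d) % 256 for a, d in zip(approx, diff))
```

Reconstruction is exact only if the decompressor regenerates precisely the sequence used during compression, which is why the model identifier and seed are assumed to travel with the compressed file.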