Data compression systems, which encode a digital data stream into compressed digital code signals and which decode the compressed digital code signals back into the original data, are known in the prior art. The methods utilized in data compression systems reduce the amount of storage space required to hold the digital information and/or reduce the amount of time required to transmit a given amount of information. For example, the extensive transactional records maintained by companies such as banks and telephone companies are often stored for archival purposes in massive computer databases. Storage space is conserved, resulting in significant monetary savings, if the data is compressed prior to storage and decompressed from the stored compressed files for later use.
Various methods and systems are known in the art for compressing and subsequently reconstituting data. For example, a compression scheme used pervasively on the Internet today is “gzip,” designed by Jean-Loup Gailly. See “DEFLATE Compressed Data Format Specification version 1.3,” RFC 1951, Network Working Group, May 1996; “GZIP file format specification version 4.3,” RFC 1952, Network Working Group, May 1996. Gzip utilizes a variation of the well-known LZ77 (Lempel-Ziv 1977) compression technique, which replaces a duplicated string of bytes occurring within a window of pre-defined distance with a pointer to the earlier occurrence of that string. Gzip also applies Huffman coding to each block of bytes and stores the Huffman code tree with the compressed data block. Gzip normally achieves a compression ratio of about 2:1 or 3:1, the compression ratio being the ratio of the size of the clear text to the size of the compressed text.
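By way of illustration only, the compression ratio defined above can be measured with a short Python sketch that uses the standard-library gzip module. The helper name compression_ratio and the synthetic comma-separated records are assumptions made for this example; they are not actual call detail records, and the ratio obtained depends entirely on the input data.

```python
import gzip


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Return the clear-text size divided by the gzip-compressed size."""
    compressed = gzip.compress(data, compresslevel=level)
    # Round-trip check: decompression must reproduce the input exactly.
    assert gzip.decompress(compressed) == data
    return len(data) / len(compressed)


# Hypothetical fixed-layout records (synthetic, for illustration only):
# two 10-digit numbers and a 5-digit duration per line.
records = b"".join(
    b"%010d,%010d,%05d\n" % (5550000000 + i % 50, 5551000000 + i % 70, i % 600)
    for i in range(20000)
)

ratio = compression_ratio(records)
print(f"{len(records)} bytes in, ratio {ratio:.1f}:1")
```

Because the synthetic records repeat the same fields in the same layout, the LZ77 stage finds many duplicated strings, so such input may be expected to compress considerably better than ordinary text.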
Gzip is a popular but suboptimal compression scheme. Nevertheless, the inventors, while conducting experiments on compressing massive data sets of telephone call detail records, managed to achieve compression ratios of around 15:1 when using gzip. The substantial reduction in size effected by merely using a conventional compression technique such as gzip suggested to the inventors that further improvements in the compression ratio could be achieved through a careful analysis of the structure of the data itself.