1. Field of the Invention
This invention relates generally to the field of data compression, and specifically to a multi-stage block-wise adaptive statistical data compressor that includes a lexicographical sorting stage.
2. Description of the Related Art
Data compression (or compression) refers to the process of transforming a data file or stream of data characters so that the number of bits needed to represent the transformed data is smaller than the number of bits needed to represent the original data. Data files can be compressed because they contain redundancy: the more redundant a particular file is, the more likely it is to be effectively compressed.
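To make the redundancy point concrete, the following Python snippet uses the standard-library `zlib` module (DEFLATE) purely as a stand-in general-purpose compressor; it is not the compressor described in this document. A repeated pattern shrinks to a small fraction of its original size, while pseudo-random bytes of the same length remain essentially incompressible.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes after DEFLATE compression at the highest level."""
    return len(zlib.compress(data, 9))


n = 10_000
# Highly redundant input: one 4-byte pattern repeated.
redundant = b"abcd" * (n // 4)
# Input with little redundancy: pseudo-random bytes from a fixed seed.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(n))

print(len(redundant), compressed_size(redundant))
print(len(noisy), compressed_size(noisy))
```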
A known block-wise adaptive statistical data compressor adapts its data model on a block-by-block basis. The data model generated by the adaptive statistical data compressor consists of a plurality of super-character codewords that correspond to a plurality of super-character groups, wherein each super-character group contains data regarding the frequency of occurrence of one or more individual characters in an applicable character data set. The use of these super-character codewords and groups to model the data in a particular block minimizes the amount of model data that must be included with the compressed data block to enable decompression.
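The Python sketch below illustrates why modeling with groups rather than individual characters reduces the model data sent with each block. It is not the known compressor's method: the fixed partition of byte values into super-character groups (`SUPER_GROUPS`), the helper names, and the use of Huffman code lengths computed over per-group frequency totals are all hypothetical choices made for illustration. Because the partition is assumed to be fixed and shared by encoder and decoder, the per-block model reduces to one code length per group.

```python
from __future__ import annotations

import heapq
from collections import Counter
from itertools import count

# HYPOTHETICAL fixed partition of byte values into super-character
# groups, assumed known to both encoder and decoder.
SUPER_GROUPS = {
    "lower": range(ord("a"), ord("z") + 1),
    "upper": range(ord("A"), ord("Z") + 1),
    "digit": range(ord("0"), ord("9") + 1),
    "space": (9, 10, 13, 32),
}


def group_of(byte: int) -> str:
    """Return the (hypothetical) super-character group of one byte."""
    for name, members in SUPER_GROUPS.items():
        if byte in members:
            return name
    return "other"


def huffman_lengths(freqs: dict[str, int]) -> dict[str, int]:
    """Huffman code length for each symbol, given its frequency."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    tie = count()  # tie-breaker so symbol lists are never compared
    heap = [(f, next(tie), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1  # each merge adds one bit to its members
        heapq.heappush(heap, (f1 + f2, next(tie), syms1 + syms2))
    return lengths


def block_model(block: bytes) -> dict[str, int]:
    """Per-block model: one code length per super-character group."""
    group_freqs = Counter(group_of(b) for b in block)
    return huffman_lengths(dict(group_freqs))


model = block_model(b"hello world 123")
print(model)  # {'lower': 1, 'space': 2, 'digit': 2}
```

For this 15-character block, eleven distinct characters collapse into three groups, so the model carries three code lengths instead of eleven. A complete coder would also need to identify each character within its group (for example with a fixed-width index), which this sketch omits.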
A multi-stage data compressor that includes the block-wise adaptive statistical data compressor as one stage is also known. It further includes a clustering stage and a reordering stage, which together reformat the data in a data block so that the frequency distribution of characters in the block has an expected skew. This skew is then exploited by selecting super-character groupings that optimize the compression ratio achievable by the block-wise adaptive statistical stage. The clustering stage in such a multi-stage compressor may be implemented using the Burrows-Wheeler Transform (“BWT”). A BWT clustering stage includes a step in which a known generic sorting algorithm, such as radix sort or quicksort, is used to sort the cyclic shifts of the data block. These sorting algorithms are generic in that they can sort any data; they are not specifically suited to sorting cyclic shifts.
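A minimal Python sketch of a BWT clustering step follows, assuming a byte-string block and the common convention of reporting a primary index (the row at which the unrotated block appears among the sorted shifts). Python's built-in `sorted`, a generic comparison sort, stands in here for radix sort or quicksort. Each sort key materializes a full rotation, which illustrates why a generic sort is not specifically suited to cyclic shifts. The inverse is included only to show that the transform is reversible.

```python
from __future__ import annotations


def bwt(block: bytes) -> tuple[bytes, int]:
    """Burrows-Wheeler Transform of one block using a generic sort.

    Returns the last column of the sorted cyclic shifts and the row
    index at which the original (unrotated) block appears.
    """
    n = len(block)
    if n == 0:
        return b"", 0
    # Generic comparison sort over rotation start positions; each key
    # builds the full rotation, costing O(n) memory per key.
    order = sorted(range(n), key=lambda i: block[i:] + block[:i])
    # The last character of the rotation starting at i is block[i - 1].
    last = bytes(block[(i - 1) % n] for i in order)
    return last, order.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Recover the block from the last column and primary index."""
    n = len(last)
    # A stable sort links each row of the (implicit) first column to the
    # same character occurrence in the last column.
    link = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = link[row]
        out.append(last[row])
    return bytes(out)


last, primary = bwt(b"banana")
print(last, primary)  # b'nnbaaa' 3
```

In the output, the repeated characters of "banana" end up adjacent (`nn`, `aaa`), which is the clustering effect that later stages rely on.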