A particular method of data compression, known as block-sorting compression or Burrows-Wheeler compression, operates by sorting all rotations of the elements in a data block, selecting an element from each sorted rotation based on its position within that rotation, and compressing the resulting sequence of selected elements using a compression mechanism. This method is reasonably fast and often produces smaller compressed outputs than other techniques. Typically, the compression ratio (i.e., the ratio of the size of the original data to the size of the compressed output) achieved by block-sorting compression increases with the size of the data block. Block-sorting compression is, therefore, often used to compress large data blocks.
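The block-sorting step described above can be illustrated with a minimal sketch. It assumes the selected element is the last element of each sorted rotation, which is the usual choice for Burrows-Wheeler compression, and it uses a naive sort that is far too slow for the large blocks discussed here. The names `bwt_encode` and `primary_index` are hypothetical and chosen only for illustration; the subsequent compression mechanism is not shown.

```python
from typing import Tuple


def bwt_encode(block: bytes) -> Tuple[bytes, int]:
    """Naive block-sorting transform (illustrative only).

    Returns the last element of every sorted rotation, plus the row
    at which the unrotated block appears in the sorted order.
    """
    n = len(block)
    if n == 0:
        return b"", 0

    # Rotation i of the block is doubled[i:i + n].
    doubled = block + block

    # Sort the rotation start offsets by the contents of each rotation.
    order = sorted(range(n), key=lambda i: doubled[i:i + n])

    # Select the last element of each rotation, in sorted order.
    last_column = bytes(doubled[i + n - 1] for i in order)

    # Row of the original (unrotated) block, needed to reverse the sort.
    primary_index = order.index(0)
    return last_column, primary_index


if __name__ == "__main__":
    print(bwt_encode(b"banana"))
```

The output groups identical elements into runs, which is what allows a subsequent compression mechanism to work well on it.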
In decompressing the compressed data, the sorting needs to be reversed. Unfortunately, the amount of memory required to decode large data blocks is typically larger than the cache memory available in a computer system. For instance, block sizes of 200 KB to 4 MB are common, requiring in-memory data structures of about 1.2 MB to 16 MB for decoding, while cache memory of 512 KB to 1 MB is typical. As a consequence, a particular lookup operation during decoding often results in cache misses. For very large blocks of data, the cache miss rate may substantially exceed 50 percent. Since this lookup operation is executed once or more for each element decoded, the resulting cache misses degrade the overall computational efficiency of the decompression operation.
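The lookup operation at issue can be seen in a minimal sketch of a conventional reverse transform. The sketch assumes the input is the sequence of last elements of the sorted rotations together with the row of the original block; the names `bwt_decode`, `lf`, and `primary_index` are hypothetical. It illustrates the conventional approach only, not any improved technique.

```python
def bwt_decode(last_column: bytes, primary_index: int) -> bytes:
    """Conventional reverse block-sorting transform (illustrative only)."""
    n = len(last_column)
    if n == 0:
        return b""

    # Count occurrences of each byte value.
    counts = [0] * 256
    for value in last_column:
        counts[value] += 1

    # starts[v] = first row of the sorted order that begins with value v.
    starts = [0] * 256
    total = 0
    for v in range(256):
        starts[v] = total
        total += counts[v]

    # lf[i] = row whose rotation starts with the element at last_column[i].
    # This table has one entry per element of the block.
    lf = [0] * n
    seen = [0] * 256
    for i, value in enumerate(last_column):
        lf[i] = starts[value] + seen[value]
        seen[value] += 1

    # Walk the table backwards from the row holding the original block.
    out = bytearray(n)
    row = primary_index
    for k in range(n - 1, -1, -1):
        out[k] = last_column[row]
        row = lf[row]  # data-dependent jump to an effectively random row
    return bytes(out)


if __name__ == "__main__":
    print(bwt_decode(b"nnbaaa", 3))
```

Each iteration of the final loop reads `lf[row]` at a position determined by the previous read, so successive accesses are scattered across the whole table. Once the table is larger than the cache, most of these reads miss.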
There is a need, therefore, for a technique of decompressing block-sorted data with improved computational efficiency.