Data compression schemes for compressing data held in the memory of a computer system are known. These schemes increase the effective capacity of the memory.
Computer systems often employ a hierarchical arrangement of memory levels in which smaller capacity but faster memory is located closer to a processor, whereas larger capacity but slower memory is provided at lower, more distant levels. For example, one such arrangement includes three memory levels in order of decreasing distance from the processor: storage (e.g., a hard disk), main memory (e.g., RAM) and cache memory. Additional cache memory levels can also be included. For example, in a two-level cache arrangement, a so-called L1 cache can be provided in between a processor and an L2 cache. Such an arrangement would include four memory levels in total. Where the processor registers are considered as a level of memory, then there would be five memory levels in this example.
In a hierarchical memory, data compression can be used between two levels of memory, to increase the effective capacity of the memory level which is more distant from the processor. For example, compression of uncompressed data from a first level memory of a computer system can be effected for storage in a second level memory of the computer system. Similarly, decompression of compressed data in a second level memory of a computer system can be effected for storage in a first level memory of the computer system.
Thus, compression can be used between memory levels, for example between a cache and a main memory. When data is written to the memory level which is more distant from the processor (also known as the lower memory level), a data compression scheme can be applied such that the data is stored in the more distant memory level in compressed form. Conversely, when data is read from the lower memory level in compressed form, the data compression scheme can be applied in reverse (hereinafter referred to as the data decompression scheme, although it will be understood that the data decompression scheme is normally just the reverse application of the data compression scheme) to decompress the data for entry into a higher memory level, which is less distant from the processor.
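By way of illustration only, the write and read paths described above can be sketched as follows. The class name `CompressedLowerLevel`, its methods and the use of the general-purpose `zlib` compressor are assumptions made for this sketch; they are not taken from any of the schemes discussed herein.

```python
import zlib


class CompressedLowerLevel:
    """Hypothetical lower memory level that holds blocks in compressed form."""

    def __init__(self) -> None:
        self._blocks = {}  # block address -> compressed bytes

    def write_block(self, addr: int, data: bytes) -> None:
        # Compression is applied when data moves away from the processor.
        self._blocks[addr] = zlib.compress(data)

    def read_block(self, addr: int) -> bytes:
        # Decompression is applied when data moves toward the processor.
        return zlib.decompress(self._blocks[addr])

    def stored_size(self, addr: int) -> int:
        # Number of bytes actually occupied in the lower level.
        return len(self._blocks[addr])


memory = CompressedLowerLevel()
block = bytes(64)  # a 64-byte block of zeros, chosen because it is highly compressible
memory.write_block(0x1000, block)
assert memory.read_block(0x1000) == block
```

In this sketch the higher memory level only ever sees uncompressed blocks, while the lower level occupies fewer bytes whenever the data is compressible.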
Memory in computer systems is normally arranged in a plurality of words. For example, a cache can comprise a plurality of cache lines, or cache blocks. Each cache line, or cache block, can typically store one or more data words. In many memory protocols, data is retrieved and written into a cache memory on a block-by-block basis. Similar considerations apply to main memory and storage. When a data compression scheme is applied, it can be applied on a block-by-block and/or word-by-word basis.
Various compression algorithms have been described in the literature. For example, a compression algorithm known as the LZ77 (also known as LZ1) algorithm is described in an article by Ziv J. and Lempel A. entitled “A Universal Algorithm for Sequential Data Compression”, IEEE Transactions on Information Theory, Vol. 23, No. 3, pages 337-343, 1977.
Storer J. A. and Szymanski T. G. described an algorithm known as the LZSS algorithm in “Data Compression via Textual Substitution”, Journal of the ACM, Vol. 29, pages 928-951, 1982. This is a derivative of the LZ77 algorithm and is known to be robust when the input data is fairly small, and thus should be well suited for compressing individual cache blocks. This was shown to be the case in a paper by Alameldeen A. R. and Wood D. A. entitled “Frequent Pattern Compression: A Significance-Based Compression Scheme for L2 Caches”, Technical Report 1500, Computer Sciences Department, University of Wisconsin-Madison, April 2004. Briefly, this algorithm uses a sliding window and uses bytes as the symbol size. Each symbol in the input is represented either by a literal (the byte itself) or by a reference to previously decompressed symbols together with information about how many symbols match. Thus, if a repeated pattern is found, it can be very efficiently encoded. Two examples of such repetitive patterns in in-memory data are pointers to nearby locations, where the most significant bits of each pointer typically have the same value, and small positive and negative integer values, where the most significant bits are all zeros or all ones. A drawback with LZSS, however, is that it is inherently serial in the sense that a byte cannot be decompressed unless all prior bytes have been decompressed. Thus, the decompression latency is fairly high.
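By way of illustration only, the literal and back-reference representation described above can be sketched as follows. This is a simplified sketch in the style of LZSS and not the published algorithm: tokens are kept as Python values rather than packed into bits, and the function names, the window size and the minimum match length are assumptions made for the example.

```python
import struct
from typing import List, Tuple, Union

# A token is either a literal byte value or an (offset, length) back-reference.
Token = Union[int, Tuple[int, int]]


def lzss_compress(data: bytes, window: int = 32, min_match: int = 3) -> List[Token]:
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Search the sliding window for the longest match starting at position i.
        for start in range(max(0, i - window), i):
            length = 0
            while i + length < len(data) and data[start + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - start
        if best_len >= min_match:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def lzss_decompress(tokens: List[Token]) -> bytes:
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)
        else:
            off, length = tok
            # Each copied byte depends on earlier output, which is why
            # decompression is inherently serial.
            for _ in range(length):
                out.append(out[-off])
    return bytes(out)


# Two nearby pointers share their most significant bytes.
pointers = struct.pack(">II", 0x7FFF1000, 0x7FFF1008)
tokens = lzss_compress(pointers)
assert lzss_decompress(tokens) == pointers
```

In this example the three leading bytes of the second pointer are encoded as a single back-reference to the first pointer, which is the kind of value locality referred to above.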
The so-called Frequent Pattern Compression (FPC) algorithm was proposed by Alameldeen and Wood as a fast algorithm for cache compression in their paper previously referred to, namely Alameldeen A. R. and Wood D. A. “Frequent Pattern Compression: A Significance-Based Compression Scheme for L2 Caches”, Technical Report 1500, Computer Sciences Department, University of Wisconsin-Madison, April 2004. FPC divides a cache block into 32-bit words, which are decompressed in parallel. Each 32-bit word is represented by a three-bit prefix followed by between four and 32 bits; e.g., an integer that can be represented with eight bits is encoded by the prefix followed by the eight bits for the integer. The decoder recognizes the prefix and adds three zeroed bytes in front of the eight bits to recreate the data. The main aim of FPC is to remove consecutive zero bits and consecutive one bits by encoding the so-called frequent patterns. The patterns used are: 4-bit sign-extended (SE), single-byte SE, half-word SE, half word padded with a zero half word, and two half words each consisting of a sign-extended byte. FPC also has a special prefix to be able to encode several consecutive zero words efficiently. One of the prefixes is used to represent repeated bytes within a 32-bit word, e.g. 0xfefefefe. FPC is much faster than LZSS since it does not suffer from the dependencies between bytes. On the other hand, it cannot exploit the value locality in the example of nearby pointers given above.
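By way of illustration only, the prefix-plus-payload encoding described above can be sketched as follows. This is a simplified sketch and not the FPC algorithm as published: only some of the patterns are covered (the two half-word patterns and zero-word runs are omitted), patterns are identified by name rather than by actual three-bit prefix codes, and all function names are assumptions made for the example.

```python
from typing import List, Tuple

PREFIX_BITS = 3  # each word is assumed to carry a three-bit pattern prefix


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide value to 32 bits (result is unsigned)."""
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & 0xFFFFFFFF


def encode_word(word: int) -> Tuple[str, int, int]:
    """Return (pattern name, payload, payload width in bits) for a 32-bit word."""
    if word == 0:
        return ("zero", 0, 0)
    for name, bits in (("se4", 4), ("se8", 8), ("se16", 16)):
        payload = word & ((1 << bits) - 1)
        if sign_extend(payload, bits) == word:
            return (name, payload, bits)
    low_byte = word & 0xFF
    if word == low_byte * 0x01010101:
        return ("repeated_byte", low_byte, 8)
    return ("raw", word, 32)


def decode_word(name: str, payload: int, bits: int) -> int:
    """Recreate the 32-bit word; each word decodes independently of the others."""
    if name == "zero":
        return 0
    if name in ("se4", "se8", "se16"):
        return sign_extend(payload, bits)
    if name == "repeated_byte":
        return payload * 0x01010101
    return payload


def compressed_bits(words: List[int]) -> int:
    """Total encoded size in bits under the simplified scheme of this sketch."""
    return sum(PREFIX_BITS + encode_word(w)[2] for w in words)


words = [0x00000000, 0x00000005, 0xFFFFFFFF, 0x0000007F, 0xFEFEFEFE, 0x12345678]
decoded = [decode_word(*encode_word(w)) for w in words]
assert decoded == words
```

Because `decode_word` needs only the prefix and payload of its own word, the words of a block can be decoded in parallel, in contrast to the serial dependency of LZSS.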
FIG. 1 represents a method of compressing uncompressed data as described in U.S. patent application Ser. No. 11/251,257. In this compression method, which starts at 10, the uncompressed data is assumed to comprise a plurality of data words, the data words comprising a plurality of data groups Gjk, wherein k denotes the kth data group in the jth data word. The method comprises: applying 12 a transform to produce a transformed plurality of data words, the transform being of the form Gjk→Gkj; and applying 14 a data compression scheme to each data word in the plurality of transformed data words, the process ending at 16.
FIG. 2 represents a method of decompressing compressed data as described in U.S. patent application Ser. No. 11/251,257. In this decompression method, which starts at 20, the compressed data is decompressed at 22 to generate decompressed data words comprising a plurality of data groups Gkj, wherein j denotes the jth data group in the kth decompressed data word; and a transform is applied at 24 to produce a transformed plurality of data words, the transform being of the form Gkj→Gjk. The process ends at 26. As set out in U.S. patent application Ser. No. 11/251,257, by applying a simple transform (transposing the matrix of data groups), the efficiency of FPC can be improved in at least some circumstances.
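By way of illustration only, the transform Gjk→Gkj and its inverse can be sketched as follows. The sketch assumes that each data group is one byte of a 32-bit word and that the words are arranged as rows of a matrix; the function names are hypothetical and are not taken from the referenced application.

```python
from typing import List


def to_groups(word: int) -> List[int]:
    """Split a 32-bit word into four byte-sized groups, most significant first."""
    return list(word.to_bytes(4, "big"))


def transpose_groups(words: List[List[int]]) -> List[List[int]]:
    """Apply Gjk -> Gkj: group k of word j becomes group j of word k."""
    return [list(column) for column in zip(*words)]


# Four pointers to nearby locations share their most significant bytes.
pointers = [0x7FFF1000, 0x7FFF1004, 0x7FFF1008, 0x7FFF100C]
original = [to_groups(p) for p in pointers]

transformed = transpose_groups(original)   # applied before compression
restored = transpose_groups(transformed)   # applied after decompression

assert restored == original
```

In this example three of the four transformed words consist of a single repeated byte, a pattern for which FPC has a dedicated prefix, whereas none of the original pointer words matches a frequent pattern in the same way.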
The present invention seeks to improve yet further the efficiency of prior compression schemes.