Data compression is a well-established technique used to reduce the size of data. It is applied to data saved in the memory subsystem of a computer system to increase the effective memory capacity. It is also used when data are transferred between different subsystems within a computer system or, more generally, between two points in a data communication system comprising a communication network.
Data compression requires two fundamental operations: 1) compression (also referred to as encoding), which takes uncompressed data as input and transforms them into compressed data by replacing data values with respective codewords (also referred to in the literature as encodings, codings or codes); and 2) decompression (also referred to as decoding), which takes compressed data as input and transforms them into uncompressed data by replacing the codewords with the respective data values. Data compression can be lossless or lossy: in lossless compression, the data values after decompression are exactly the same as the original values before compression, whereas in lossy compression, the data values after decompression differ from the original ones and the original values cannot be retrieved. Compression and decompression can be implemented in software, in hardware, or in a combination of software and hardware, realizing the respective methods, devices and systems.
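The distinction between lossless and lossy compression can be illustrated with a minimal Python sketch. It uses the standard-library zlib module only as a stand-in lossless compressor; the quantization functions `lossy_encode` and `lossy_decode` are hypothetical toy names introduced here solely to show that information discarded by a lossy scheme cannot be recovered.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and then decompress with zlib (DEFLATE), a lossless scheme."""
    compressed = zlib.compress(data)
    return zlib.decompress(compressed)


def lossy_encode(values, step):
    """Hypothetical toy lossy scheme: keep only value // step."""
    return [v // step for v in values]


def lossy_decode(codes, step):
    """Reconstruct approximations; the discarded remainders are lost."""
    return [c * step for c in codes]


data = b"abcabcabcabc" * 8
assert lossless_roundtrip(data) == data  # exactly the same values come back

values = [3, 17, 42]
print(lossy_decode(lossy_encode(values, 8), 8))  # [0, 16, 40]
```

The lossy result differs from the original values [3, 17, 42], and nothing in the encoded form allows them to be restored.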
An example of a computer system 100 is depicted in FIG. 1. The computer system 100 comprises one or several processing units P1 . . . Pn connected to a memory hierarchy 110 through a communication means, e.g., an interconnection network. Each processing unit comprises a processor (or core) and can be a CPU (Central Processing Unit), a GPU (Graphics Processing Unit) or, in general, a block that performs computation. The memory hierarchy 110 constitutes the storage subsystem of the computer system 100 and comprises a cache memory 120, which can be organized in one or several levels L1-L3, and a memory 130 (a.k.a. primary memory). The memory 130 may also be connected to a secondary storage (e.g., a hard disk drive, a solid state drive, or a flash memory). The memory 130 can itself be organized in several levels, for example, a fast main memory (e.g., DDR) and a flash memory. The cache memory 120 in the current example comprises three levels, where L1 and L2 are private caches, as each of the processing units P1-Pn is connected to a dedicated L1/L2 cache, whereas L3 is shared among all the processing units P1-Pn. Alternative examples can realize different cache hierarchies with more, fewer or even no cache levels, with private or shared caches, with various memory levels, with a different number of processing units and, in general, with different combinations of processing units and memory subsystem, as will be readily appreciated by a skilled person.
Data compression can be applied to a computer system in different ways. FIG. 2 depicts an example 200 of a computer system, such as system 100 of FIG. 1, where data are compressed in the memory, for example in the main memory of such a computer system. This means that data are compressed by a respective compression operation, as mentioned above, before being saved in the memory, and are decompressed when they leave the memory.
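As a behavioral sketch only, the compress-on-write, decompress-on-read arrangement of FIG. 2 can be modeled as follows. The class `CompressedMemory` and its methods are hypothetical, and zlib merely stands in for whatever compressor a real memory controller would use; the model ignores addressing granularity and the management of compressed space.

```python
import zlib


class CompressedMemory:
    """Hypothetical toy model of a memory that stores blocks compressed."""

    def __init__(self):
        self._blocks = {}  # block address -> compressed bytes

    def write(self, addr: int, block: bytes) -> None:
        # Compress before the data are saved in the memory
        self._blocks[addr] = zlib.compress(block)

    def read(self, addr: int) -> bytes:
        # Decompress when the data leave the memory
        return zlib.decompress(self._blocks[addr])

    def stored_size(self, addr: int) -> int:
        # Number of bytes the block actually occupies in the memory
        return len(self._blocks[addr])


mem = CompressedMemory()
block = bytes(64)  # a 64-byte block of zero values
mem.write(0x40, block)
assert mem.read(0x40) == block
print(mem.stored_size(0x40) < len(block))  # True
```

The same wrapper pattern applies to a compressed cache level, as in FIG. 3; only the location of the compressor and decompressor changes.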
In an alternative example 300 of a computer system, shown in FIG. 3, data compression can be applied to the L3 cache of the cache system. Similarly to the previous example, compression is required before data are saved in the cache, and decompression is required before data leave the cache (e.g., to other cache levels (L2) or to the memory 330, where data are uncompressed). In alternative examples, data can be saved compressed in any level of the cache hierarchy.
Data can also be compressed only when they are transferred between different subsystems in the computer system. In the alternative example 400 of a computer system shown in FIG. 4, data are compressed when transferred between the L3 cache and the memory 430 using the respective communication means. Similarly to the previous examples, compression and decompression need to exist at the ends of the communication means, so that data are compressed before being transferred and decompressed when they are received at the other end.
In an alternative example 500 of a computer system, data compression can be applied in a combination of subsystems, as depicted in FIG. 5. In this example, data are compressed when they are saved in the memory 530 and when they are transferred between the memory 530 and the cache hierarchy 520. In this way, when data are moved from the cache hierarchy 520 to the memory 530, they may only need to be compressed before being transferred from the L3 cache. Conversely, the compressed data that leave the memory 530 for the cache hierarchy 520 may only need to be decompressed when they are received at the other end of the communication means that connects the memory 530 to the cache hierarchy 520. Regarding how compression is combined across the different subsystems of a computer system, any combination is possible and can be realized by someone skilled in the art.
Transfer of data can also take place between two arbitrary points within a communication network. FIG. 6 depicts an example of a data communication system 600 comprising a communication network 605 between two points, where data are transmitted by a transmitter 610 and received by a receiver 620. In such an example, these points can be two intermediate nodes in a network, the source and destination nodes of a communication link, or a combination of these cases. Data compression can be applied to such a data communication system, as depicted for an example system 700 in FIG. 7. Compression needs to be applied before data are transmitted by a transmitter 710 onto a communication network 705, while decompression needs to be applied after the data are received by a receiver 720.
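A minimal sketch of the transmitter/receiver arrangement of FIG. 7 is shown below, assuming a Python `queue.Queue` as a stand-in for the communication network 705 and zlib as the compressor. The functions `transmit` and `receive` are hypothetical illustrations, not part of the described systems.

```python
import zlib
from queue import Queue


def transmit(network: Queue, payload: bytes) -> int:
    """Transmitter side: compress, then put the data onto the network.
    Returns the number of bytes actually sent."""
    frame = zlib.compress(payload)
    network.put(frame)
    return len(frame)


def receive(network: Queue) -> bytes:
    """Receiver side: take a frame off the network, then decompress it."""
    frame = network.get()
    return zlib.decompress(frame)


link = Queue()
message = b"sensor=23.5;" * 50
sent = transmit(link, message)
assert receive(link) == message
print(sent, len(message))
```

The same two end points, one compressing and one decompressing, model the on-chip link between the L3 cache and the memory in FIG. 4.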
There is a variety of algorithms (schemes) to realize data compression. One family of data compression algorithms is the statistical compression algorithms, which are data dependent and can offer compression efficiency close to entropy because they assign variable-length (also referred to as variable-width) codes based on the statistical properties of the data values: short codewords encode data values that appear frequently, and longer codewords encode data values that appear less frequently. Huffman encoding is a well-known statistical compression algorithm.
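How a statistical scheme such as Huffman encoding derives variable codeword lengths from value frequencies can be sketched as follows. This is a textbook Huffman construction written for illustration (the function name `huffman_code_lengths` is hypothetical); it returns only the codeword lengths, which is all that a canonical code needs.

```python
import heapq
from collections import Counter


def huffman_code_lengths(values):
    """Return {value: codeword length in bits} for a Huffman code built
    from the frequencies of the values in `values`."""
    freq = Counter(values)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct value still needs a 1-bit codeword
        return {next(iter(freq)): 1}
    # Heap entries: (frequency, unique tie-breaker, {value: depth so far})
    heap = [(f, i, {v: 0}) for i, (v, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Every leaf of both subtrees moves one level deeper
        merged = {v: d + 1 for v, d in left.items()}
        merged.update({v: d + 1 for v, d in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


block = [0] * 8 + [1] * 4 + [2] * 2 + [3] + [4]
print(huffman_code_lengths(block))  # {0: 1, 1: 2, 2: 3, 3: 4, 4: 4}
```

In this block, value 0 appears eight times and receives a 1-bit codeword, while values 3 and 4 appear once each and receive 4-bit codewords.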
A known variation of Huffman encoding that is used to accelerate decompression is canonical Huffman encoding. In canonical Huffman encoding, codewords have the numerical sequence property, meaning that codewords of the same length are consecutive integers.
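Given codeword lengths, canonical codewords can be assigned so that codewords of the same length are consecutive integers. The sketch below follows the common convention of assigning shorter codewords first, starting from 0 (an assumption; other canonical conventions exist), and the function name `canonical_codes` is hypothetical.

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codewords from {value: codeword length}.

    Returns {value: (codeword as an integer, length)}. Shorter codewords are
    assigned first, starting from 0; within a length, values are taken in
    sorted order, so codewords of the same length are consecutive integers.
    """
    if not lengths:
        return {}
    order = sorted(lengths.items(), key=lambda kv: (kv[1], kv[0]))
    codes = {}
    code = 0
    prev_len = order[0][1]
    for value, length in order:
        # Moving to a longer length appends zero bits to the running code
        code <<= length - prev_len
        codes[value] = (code, length)
        code += 1
        prev_len = length
    return codes


codes = canonical_codes({0: 1, 1: 2, 2: 3, 3: 4, 4: 4})
for value, (cw, cl) in codes.items():
    print(value, format(cw, f"0{cl}b"))
```

With these lengths the codewords are 0, 10, 110, 1110 and 1111; the two 4-bit codewords, 1110 and 1111, are the consecutive integers 14 and 15.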
Examples of canonical Huffman-based compression and decompression mechanisms are presented in the prior art. Such compression and decompression mechanisms can be used in the aforementioned examples to realize Huffman-based compression and decompression.
An example of a compressor 900 from the prior art, which implements Huffman encoding, e.g., canonical Huffman encoding, is illustrated in FIG. 9. It takes as input an uncompressed block, which is a stream of data values comprising one or a plurality of data values, generally denoted v1, v2, . . . , vn throughout this disclosure. The unit 910, which can be a storage unit or an extractor of data values from the uncompressed block, supplies the Variable-length Encoding Unit 920 with data values. The Variable-length Encoding Unit 920 comprises the Code Table (CT) 922 and the codeword (CW) selector 928. The CT 922 is a table that can be implemented as a Look Up Table (LUT) or as a computer cache memory (of any arbitrary associativity) and contains one or a plurality of entries; each entry comprises a value 923 that can be compressed using a codeword, a CW 925 and a codeword-length (cL) 927. Because the codewords used by statistical compression algorithms are of variable length, they must be padded with zeros when they are saved in the CT 922, where each entry has a fixed-size width (codeword 925). The codeword-length 927 keeps the actual length of the variable-length encoding (e.g., in bits). The CW selector 928 uses the cL to identify the actual CW and discard the padded zeros. The coded value is then concatenated to the rest of the compressed values that altogether form the compressed block. An exemplary flow chart of a compression method that follows the compression steps described above is depicted in FIG. 11.
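The Code Table lookup and CW selector can be modeled in software as below. This is a sketch under stated assumptions: each CT entry stores the codeword left-aligned in a fixed `width`-bit field padded with zeros, every input value is assumed to have a CT entry, and the compressed block is represented as a string of '0'/'1' characters for readability. The names `build_code_table` and `compress_block` are hypothetical.

```python
def build_code_table(codes, width):
    """Code Table (CT): value -> (codeword zero-padded to `width` bits, cL).

    `codes` maps value -> (codeword as an integer, codeword length cL)."""
    table = {}
    for value, (cw, cl) in codes.items():
        # Left-align the codeword in the fixed-width field; the low
        # (width - cl) bits are the padding zeros
        table[value] = (cw << (width - cl), cl)
    return table


def compress_block(block, table, width):
    """Replace each value by its codeword and concatenate the codewords."""
    pieces = []
    for value in block:
        padded, cl = table[value]  # assumes every value has a CT entry
        # CW selector: use cL to drop the padding and keep the real codeword
        cw = padded >> (width - cl)
        pieces.append(format(cw, f"0{cl}b"))
    return "".join(pieces)


# Example canonical code with a maximum codeword length of 4 bits
codes = {0: (0, 1), 1: (2, 2), 2: (6, 3), 3: (14, 4), 4: (15, 4)}
table = build_code_table(codes, width=4)
print(compress_block([0, 1, 4, 0], table, width=4))  # 01011110
```

Four values that would occupy 32 bits each uncompressed are encoded here in 8 bits in total.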
An example of a decompressor 1000 from the prior art is illustrated in FIG. 10. Canonical Huffman decompression can be divided into two steps: codeword detection and value retrieval. Each of these steps is implemented by a unit: (1) Codeword Detection Unit (CDU) 1020 and (2) Value Retrieve Unit (VRU) 1030. The aim of the CDU 1020 is to find a valid codeword within a compressed sequence (i.e., the sequence of the codewords of the compressed data values). The CDU 1020 comprises a set of comparators 1022 and a priority encoder 1024. Each comparator 1022a,b,c compares each potential bit-sequence to a known codeword, which in this example is the first-assigned (at the time of code generation) canonical Huffman codeword (FCW) for a specific length. In an alternative implementation, the last-assigned canonical Huffman codeword could be used instead, but in that case the exact comparison made would be different. The maximum size of the aforementioned bit-sequence to be compared, which can be saved in a storage unit 1010 (implemented for example as a FIFO or flip flops) and which determines the number of comparators and the width of the widest of them, depends on the maximum length of a valid Huffman codeword (mCL) that is decided at code generation. However, this maximum length can be bounded to a specific value at design, compile, configuration or run time, depending on the chosen implementation of the decompressor (e.g., in software or in hardware). The output of the comparators 1022 is fed into the priority-encoder-like structure 1024, which outputs the length of the matched codeword (referred to as “matched length” in FIG. 10).
Based on this, the detected valid codeword (matched codeword) is extracted from the bit-sequence saved in the storage unit 1010; the bit-sequence is then shifted by as many positions as the “matched length” defines, and the vacated part is loaded with the next bits of the compressed sequence so that the CDU 1020 can determine the next valid codeword.
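The comparator and priority-encoder behavior of the CDU can be sketched as follows. The sketch assumes canonical codewords assigned shorter-first from 0, so that a comparator for length l fires when the l-bit prefix of the bit-sequence is numerically at least the FCW of length l, and the priority encoder selects the longest length whose comparator fires. Hardware evaluates all comparators in parallel; the loop below only models the result. The function names are hypothetical.

```python
def first_codewords(codes):
    """FCW table: the first-assigned (smallest) codeword of each length."""
    fcw = {}
    for cw, cl in codes.values():
        fcw[cl] = min(cw, fcw.get(cl, cw))
    return fcw


def detect_codeword(window, fcw):
    """Return (matched length, codeword value) for the codeword at the start
    of `window`, a '0'/'1' string of up to mCL bits."""
    matched = None
    for length in sorted(fcw):  # one comparator per codeword length
        if length > len(window):
            break
        if int(window[:length], 2) >= fcw[length]:  # comparator fires
            matched = length  # priority encoder: longest firing length wins
    if matched is None:
        raise ValueError("no valid codeword at the start of the window")
    return matched, int(window[:matched], 2)


def split_codewords(stream, fcw):
    """Detect codewords one after another, shifting by the matched length."""
    max_len = max(fcw)  # mCL bounds the window held in the storage unit
    found, pos = [], 0
    while pos < len(stream):
        length, cw = detect_codeword(stream[pos:pos + max_len], fcw)
        found.append((length, cw))
        pos += length  # shift the window by the matched length
    return found


codes = {0: (0, 1), 1: (2, 2), 2: (6, 3), 3: (14, 4), 4: (15, 4)}
print(split_codewords("01011110", first_codewords(codes)))
# [(1, 0), (2, 2), (4, 15), (1, 0)]
```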
The Value Retrieve Unit (VRU) 1030, in turn, comprises the Offset table 1034, a subtractor unit 1036 and the Decompression Look Up Table (DeLUT) 1038. The “matched length” from the previous step is used to look up an offset value (saved in the Offset table 1034) that is subtracted (1036) from the arithmetic value of the matched codeword, also determined in the previous step, to obtain the address in the DeLUT 1038 from which the original data value corresponding to the detected codeword is retrieved; this value is then appended to the rest of the decompressed values kept in the Decompressed block 1040. The operation of the decompressor is repeated until all the values saved compressed in the input compressed sequence (referred to as compressed block in FIG. 10) are retrieved as uncompressed data values v1, v2, . . . , vn.
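The Offset table and DeLUT can be sketched as below. Assumptions: the DeLUT lists the data values in codeword-assignment order, and the offset for each length is chosen so that the first codeword of that length maps to its DeLUT index. Because codewords of the same length are consecutive integers, a single subtraction then addresses every value. The names `build_vru` and `retrieve_value` are hypothetical.

```python
def build_vru(codes):
    """Build the Offset table and DeLUT from {value: (codeword, length)}."""
    # DeLUT: values in codeword-assignment order (by length, then codeword)
    ordered = sorted(codes.items(), key=lambda kv: (kv[1][1], kv[1][0]))
    delut = [value for value, _ in ordered]
    offset = {}
    for index, (_, (cw, cl)) in enumerate(ordered):
        if cl not in offset:
            # The first codeword of this length must map to DeLUT[index]
            offset[cl] = cw - index
    return offset, delut


def retrieve_value(matched_length, codeword, offset, delut):
    """Subtractor plus DeLUT read: address = codeword - offset[length]."""
    return delut[codeword - offset[matched_length]]


codes = {"x": (0, 1), "y": (2, 2), "z": (6, 3), "u": (14, 4), "w": (15, 4)}
offset, delut = build_vru(codes)
print(offset)                                # {1: 0, 2: 1, 3: 4, 4: 11}
print(retrieve_value(4, 15, offset, delut))  # w
```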
An exemplary flow chart of a decompression method that follows the decompression steps as previously described is depicted in FIG. 12.
The aforementioned compressor and decompressor can quickly and effectively compress blocks of data with variable-length Huffman encoding and decompress blocks of data that are compressed with variable-length Huffman encoding. Other compression schemes, comprising compressors and decompressors that implement other compression and decompression algorithms such as delta-based or pattern-based algorithms, can also be used.
Compression schemes like the aforementioned ones inevitably add latency and complexity due to the compression and decompression processes. When compression is applied to the aforementioned cache and/or memory subsystems of an example computer system, compression and decompression are in the critical memory access path. Compression and decompression can also increase the transmission latency when they are applied to the data transferring subsystem in a computer system or in a communication network.
Data are accessed and processed in chunks of particular sizes depending on the data types forming the data values. Data values of certain data types often exhibit certain value locality properties. Prior art compression schemes try to exploit this by making a priori assumptions about which data types are the root cause of value locality, in order to simplify the compression and decompression processes and keep compression and decompression latency low.
Value locality comprises two main notions: a) temporal value locality and b) spatial (or clustered) value locality. Temporal value locality stipulates that the same value appears often. Statistical compression algorithms are examples of compression algorithms that take advantage of this notion of value locality. Spatial value locality, on the other hand, stipulates that values are numerically similar. Delta encoding is an example of a compression algorithm that exploits spatial value locality, as it is meaningful to encode such values by their difference from a base value. Temporal value locality also comprises specific cases: i) zero-value locality: for example, in the cache subsystem and/or memory subsystem and/or data transferring subsystem of a computer system, a data block may contain data values that are all zero (referred to as null block data type), as depicted on the left of FIG. 13 and FIG. 14; ii) narrow-value locality: for example, when the values in a data block are narrow unsigned integers that belong to the range 0-255 but need to be represented with 32 bits, all of them have 0 bits in their 24 most significant bits (referred to as narrow-value data type). Narrow-value locality is exploited by significance-based compression algorithms or pattern-based compression algorithms.
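The value-locality cases above can be checked or exploited with a few small functions. These are illustrative sketches with hypothetical names: a null-block test, a narrow-value test for unsigned values stored in 32-bit containers, and a basic delta encoding that uses the first value of the block as the base.

```python
def is_null_block(block):
    """Null block data type: every data value in the block is zero."""
    return all(v == 0 for v in block)


def is_narrow_block(block, narrow_bits=8):
    """Narrow-value data type for unsigned values held in 32-bit containers:
    each value fits in `narrow_bits`, so its upper 32 - narrow_bits bits are 0."""
    return all(0 <= v < (1 << narrow_bits) for v in block)


def delta_encode(block):
    """Delta encoding: first value as base, every value stored as its
    difference from that base."""
    base = block[0]
    return base, [v - base for v in block]


def delta_decode(base, deltas):
    """Inverse of delta_encode: add the base back to every delta."""
    return [base + d for d in deltas]


print(is_null_block([0] * 16))                # True
print(is_narrow_block([3, 200, 255, 17]))     # True
print(delta_encode([1000, 1003, 998, 1001]))  # (1000, [0, 3, -2, 1])
```

The small deltas produced for numerically similar values are what a delta scheme stores in fewer bits than the original 32-bit values.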
Statistical compression schemes can be considered a reasonable default strategy. However, they do not always result in the highest compressibility. For example, with statistical compression schemes, integers that are moderately common are replaced by longer codewords than the most common ones. If said integers are also spatially close, they can potentially be coded much more densely using delta encoding instead. As data values in a data block are of different data types and/or exhibit different value locality properties, robust compressibility cannot be guaranteed, as no compression scheme always performs better than the others across various data types. The present inventors have realized that there is room for improvements in the technical field of data compression and decompression.
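The trade-off can be made concrete with a worked size comparison. The numbers are hypothetical: the eight spatially close values below are assumed to be only moderately common, so that the statistical code assigns each of them a 10-bit codeword, and the delta format is assumed to consist of one 32-bit base plus fixed-width two's-complement deltas.

```python
def delta_bits(block, base_bits=32):
    """Bits for an assumed delta format: one full-width base value plus
    fixed-width two's-complement deltas relative to the first value."""
    base = block[0]
    deltas = [v - base for v in block]
    width = 1
    # Grow the delta width until every delta fits in two's complement
    while not all(-(1 << (width - 1)) <= d < (1 << (width - 1)) for d in deltas):
        width += 1
    return base_bits + width * len(block)


def statistical_bits(block, code_length):
    """Bits when every value is replaced by its variable-length codeword."""
    return sum(code_length[v] for v in block)


block = [1000, 1003, 998, 1001, 1002, 999, 1004, 1000]
# Hypothetical: each of these moderately common values has a 10-bit codeword
code_length = {v: 10 for v in set(block)}
print(delta_bits(block), statistical_bits(block, code_length))  # 64 80
```

Under these assumptions the deltas range from -2 to 4 and fit in 4 bits, so the delta-encoded block takes 32 + 8 × 4 = 64 bits versus 8 × 10 = 80 bits with the statistical code.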