The preferred embodiments relate to digital data storage and retrieval and, more particularly, to memory compression of such data.
Digital data systems include memory into which data is written and from which data is read. A single system may have access to multiple memories for various reasons, where one common approach is the use of multiple levels of cache memory. For example, a processor may access several cache memories, typically described as levels and labeled with the letter “L” followed by an integer, where the lowest level L1 is typically fastest to access, followed by L2, then possibly L3 and so forth. For these and other memories, the availability of space, access time, and competition for resources involve various efficiency considerations, and one such consideration involves what is referred to as memory compression.
Memory compression often arises where it is desired to sample less than, or otherwise reduce the size of, an entire data quantity that is read from a first memory, where a smaller data counterpart, such as a sampled (e.g., truncated) portion of the original, is then written into a second, destination memory, so that multiple original (larger) data quantities are thereby “compressed” by fitting their smaller counterparts into less memory space in the second memory. Solely by way of a numeric example and for sake of later discussion, assume that data is provided from a first memory (e.g., L1 cache) in 32-bit quantities, but only 18 bits of each quantity are relevant to an analysis; each 18-bit subset, or representation, of a 32-bit quantity represents a compressed data “sample,” and assume further it is desired to compress multiple 18-bit samples into a 128-bit wide destination memory (e.g., L2 or L3 cache). As a result, up to seven whole 18-bit data samples may be combined into a total of 126 bits, and those 126 bits of compressed data are stored into a single 128-bit memory location in the destination memory.
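Solely to illustrate the arithmetic of the numeric example above, the following is a minimal Python sketch. The names `take_sample` and `pack_row` are hypothetical, and the choice to keep the low-order 18 bits of each 32-bit quantity and to place samples at increasing bit positions is an assumption made for illustration; the text does not specify which 18 bits are relevant or how they are ordered within a row.

```python
SOURCE_BITS = 32   # width of each quantity read from the first memory
SAMPLE_BITS = 18   # width of each compressed sample
ROW_BITS = 128     # width of one location in the destination memory

# Whole samples that fit in one destination row, and the bits left over.
samples_per_row = ROW_BITS // SAMPLE_BITS        # 7
used_bits = samples_per_row * SAMPLE_BITS        # 126
spare_bits = ROW_BITS - used_bits                # 2


def take_sample(value):
    """Truncate a 32-bit quantity to an 18-bit sample.

    Assumption: the relevant bits are the low-order 18 bits.
    """
    return value & ((1 << SAMPLE_BITS) - 1)


def pack_row(values):
    """Pack up to seven whole samples into one 128-bit row (as an int).

    Assumption: the first sample occupies the lowest bit positions.
    """
    if len(values) > samples_per_row:
        raise ValueError("at most seven whole samples fit in one row")
    row = 0
    for index, value in enumerate(values):
        row |= take_sample(value) << (index * SAMPLE_BITS)
    return row
```

Under these assumptions, seven 32-bit source quantities (224 bits) occupy one 128-bit destination location, with two bits of that location left unused by whole samples.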
In the prior art, compressing original data into samples is typically achieved by receiving incoming data samples into a temporary buffer that is the size of a memory line (or row) in the destination memory. When the temporary buffer is filled with as much compressed sample data as it can hold, the entire buffer is written into a memory row in the destination memory. In the example above, therefore, a 128-bit temporary buffer is used. As an example of its operation, assume seven 18-bit data samples arrive, so each is stored in the temporary buffer, providing a total of 126 bits. When the eighth 18-bit sample arrives, the remaining portion of the 128-bit temporary buffer is filled, that is, the two least significant bits (LSBs) of the eighth data sample are also input to the temporary buffer, and the 128 bits then stored in the temporary buffer are written to the destination memory row, thereby compressing the seven data samples, along with two bits from the eighth data sample, into one memory row (or “word”) of the destination memory. Note also in this example that, for the eighth data sample, 2 of its bits were written, while 16 of its bits remain unwritten. These remaining 16 bits, therefore, are next stored in the 128-bit temporary buffer, which then awaits receipt of the next 112 bits (i.e., from six of the next 18-bit data samples, along with 4 bits from a seventh such sample), and when it is filled again, another write into the next sequential address of the destination memory is performed, and the process repeats as needed or desired for additional compressed data samples. Given the preceding, note that the temporary buffer requires a “history” of data, that is, a delay while data from samples are received into the buffer. Once this history is achieved, the write to memory occurs. Moreover, the above process repeats for each set of data samples, and the results are always written into sequential memory addresses.
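The buffered operation just described may be sketched as follows. This is an illustrative Python model under stated assumptions, not an implementation taken from any particular prior art system: the class name `RowPacker` and its fields are hypothetical, the destination memory is modeled as a Python list whose index stands in for the sequential address, and samples are assumed to enter the buffer LSB-first at increasing bit positions.

```python
ROW_BITS = 128     # width of the temporary buffer and of one destination row
SAMPLE_BITS = 18   # width of each incoming sample


class RowPacker:
    """Model of a row-sized temporary buffer (hypothetical names)."""

    def __init__(self):
        self.buffer = 0   # bits accumulated so far, held as an integer
        self.filled = 0   # number of valid bits currently in the buffer
        self.rows = []    # stand-in for destination memory, sequential addresses

    def push(self, sample):
        # Place the new sample just above the bits already buffered.
        sample &= (1 << SAMPLE_BITS) - 1
        self.buffer |= sample << self.filled
        self.filled += SAMPLE_BITS
        # Once a full row has accumulated, write it to the next address and
        # keep the unwritten bits of the straddling sample for the next row.
        if self.filled >= ROW_BITS:
            self.rows.append(self.buffer & ((1 << ROW_BITS) - 1))
            self.buffer >>= ROW_BITS
            self.filled -= ROW_BITS


packer = RowPacker()
for value in range(1, 8):
    packer.push(value)        # seven samples: 126 bits buffered, nothing written
packer.push(0x3FFFF)          # eighth sample: 2 bits complete the row, 16 carry over
```

In this model no write occurs until the eighth sample arrives, which corresponds to the “history” delay noted above, and each completed row can only be appended at the next sequential address.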
Lastly, note that the prior art also uses a temporary buffer in the reverse, or decompression, data path. In a comparable manner, therefore, compressed data values are written and decompressed into the buffer, after which the decompressed values are written to sequential addresses in another memory (e.g., the source memory from which data was sampled for earlier compression).
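The reverse path may be sketched in a comparable manner. The function name `unpack_rows` is hypothetical, and the sketch assumes the same LSB-first bit ordering used for illustration in this section, with compressed rows represented as Python integers and the recovered samples returned as a list whose index stands in for the sequential destination address. How a recovered 18-bit sample is widened back to a 32-bit quantity is not stated in the text and is left out here.

```python
def unpack_rows(rows, row_bits=128, sample_bits=18):
    """Recover fixed-width samples from a sequence of packed rows.

    Assumption: samples were packed LSB-first and may straddle row boundaries.
    """
    sample_mask = (1 << sample_bits) - 1
    buffer = 0    # bits read from memory but not yet emitted as samples
    filled = 0    # number of valid bits currently in the buffer
    samples = []  # stand-in for sequential addresses in the other memory
    for row in rows:
        # Append the newly read row above any leftover bits.
        buffer |= row << filled
        filled += row_bits
        # Emit every complete sample now available.
        while filled >= sample_bits:
            samples.append(buffer & sample_mask)
            buffer >>= sample_bits
            filled -= sample_bits
    return samples
```

As with compression, a sample that straddles two rows cannot be emitted until the second row has been read, so this path likewise depends on buffered history and on reading rows in sequential order.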
While the above prior art approach has proven workable in various systems and architectures, the present inventors have observed various drawbacks. As one example, the prior art approach is necessarily constrained to writing to successive addresses in the destination memory. Such a constraint may be limiting in applications where non-sequential compression is desired, that is, writing into memory locations that are not contiguous. For example, in some applications, data from one source is required to be transposed into its destination, such as reading in row order and storing in column order; the prior art, therefore, cannot accommodate memory compression in such an application. As another example, the prior art necessarily incurs a delay as the buffer accumulates the “history” of multiple data samples.
Given the preceding, the present inventors seek to improve upon the prior art, as further detailed below.