A computer system comprises one or a plurality of processors, a computer memory system, and an I/O system. Any of the plurality of processors can execute instructions of which some instructions can do arithmetic/logic operations, some can do branches and yet other instructions can access a computer memory system. Instructions that access a computer memory can load data from said computer memory at a particular location—load instructions—and store data in computer memory at a particular location—store instructions. To load data from a particular location, a load instruction comprises a location identifier (sometimes called memory address) that designates the location in the computer memory from which the data value is loaded. Analogously, to store data in a particular location in computer memory, a store instruction comprises a location identifier that designates in which location in the computer memory the data value accompanying the store instruction is stored.
A computer memory comprises a linear array of memory locations that each comprises a memory word that can be 32 bits wide although other widths are possible. In a computer system employing a single level of computer memory, the plurality of processors connected to that single level of memory can all access and modify the value in any memory location by issuing a location identifier and can perform load and store instructions as explained above. Since the number of locations that is needed by computer tasks can be large, say several billions of memory locations, using a single level of memory may result in a slow access to each memory location. For that reason computer systems may use multiple levels of memory such that the number of memory locations that can be hosted in a level closer to one or a plurality of processors is typically fewer and can be accessed faster compared to a level further away from one or a plurality of processors.
Concretely, and by way of example, in a two-level memory system all memory locations that a computer program may need to access can be stored at the level furthest away from the processor (level 2), and the level closest to one or a plurality of processors (level 1) can contain at any time a subset of the locations at level 2. Typically, when a processor issues a load or a store instruction the level 1 memory is accessed first. Only when a copy of the accessed memory location is not available at that level is the next level (level 2) accessed, which in this example can deliver the data value. It is well known to someone skilled in the art that such a two-level memory system can be generalized to any number of levels. There are many other possibilities in prior art to manage a two-level, or in general an n-level, memory system. For example, a level may comprise a cache connected to each of one or a plurality of processors, whereas a next level comprises a cache shared by a plurality of processors.
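By way of illustration only, the two-level access sequence described above can be sketched as follows. The class name TwoLevelMemory, the first-in-first-out replacement of level 1 entries, and the write-through handling of stores are assumptions made for this sketch and are not prescribed by the description.

```python
class TwoLevelMemory:
    """Toy model: level 2 holds every location, level 1 holds a small subset."""

    def __init__(self, level2_contents, level1_capacity):
        self.level2 = dict(level2_contents)  # location identifier -> value
        self.level1 = {}                     # subset of level 2 (insertion ordered)
        self.capacity = level1_capacity
        self.level2_loads = 0                # number of loads served by level 2

    def _fill(self, address, value):
        # Assumed policy: when level 1 is full, evict the oldest inserted entry.
        if address not in self.level1 and len(self.level1) >= self.capacity:
            oldest = next(iter(self.level1))
            del self.level1[oldest]
        self.level1[address] = value

    def load(self, address):
        # Level 1 is accessed first; level 2 only when no copy is available.
        if address in self.level1:
            return self.level1[address]
        self.level2_loads += 1
        value = self.level2[address]
        self._fill(address, value)
        return value

    def store(self, address, value):
        # Assumed policy: write-through, so level 2 always holds the latest value.
        self.level2[address] = value
        self._fill(address, value)


mem = TwoLevelMemory({0: 10, 1: 20, 2: 30}, level1_capacity=2)
first = mem.load(0)    # miss in level 1, served by level 2
second = mem.load(0)   # hit in level 1
loads_after_two = mem.level2_loads
```

In this sketch the second load of the same location is served by level 1 without accessing level 2.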
In general, an arbitrary level of computer memory comprises a number of memory locations that can be accessed by the plurality of processors that it serves. A certain memory location can be accessed in that level of memory by having a processor issue a location identifier (or memory address) to said level. That level of computer memory can use a hash function to access one of the locations in the linear array of memory locations. We refer to this conventional way of organizing a computer memory location-wise as a location-centric computer memory.
For the sake of discussion, let us assume that N distinct memory locations accessed by a processor contain the same value. Then in a location-centric computer memory the same value may occupy N locations and the redundancy in data values is N. If one could store a distinct value in a single location regardless of how many memory locations contain that same value, one could make use of memory resources more efficiently.
In the field of lossless data compression, techniques exist that can store redundant values in computer memory more efficiently than in a conventional location-centric memory. For example, in dictionary-based compression techniques, all values stored in the locations in a computer memory are encoded in a dictionary and the encoding for the value stored in a particular location is stored in that location instead of the real value. Assuming that a computer memory stores N-bit words, it can encode as many as 2^N distinct values. If only 2^M distinct values are stored in the computer memory, where M<N, each encoding of these 2^M values would occupy only M bits instead of N bits. In the value-centric cache design (Zhang, 2000), a select set of distinct values is predetermined in an off-line profiling pass to encode frequently used redundant values densely. Since the predetermined set is limited, the compression achieved is also limited, as values that are redundant but not members of the frequently used value set will use N bits rather than M.
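By way of illustration only, the fixed-width dictionary encoding described above can be sketched as follows. The function names, the sample memory contents and the 32-bit word width are assumptions made for this sketch; it is not the design of (Zhang, 2000).

```python
def build_dictionary(words):
    """Assign each distinct value an index that fits in M bits."""
    distinct = sorted(set(words))
    code_of = {value: index for index, value in enumerate(distinct)}
    # Smallest M such that 2**M >= number of distinct values (at least 1 bit).
    m_bits = max(1, (len(distinct) - 1).bit_length())
    return distinct, code_of, m_bits


def encode(words, code_of):
    """Replace every value by its dictionary index."""
    return [code_of[word] for word in words]


def decode(codes, distinct):
    """Translate dictionary indices back to the original values."""
    return [distinct[code] for code in codes]


N_BITS = 32  # assumed word width
memory = [0, 0xFFFFFFFF, 0, 7, 0, 7, 0x12345678, 0]

distinct, code_of, m_bits = build_dictionary(memory)
encoded = encode(memory, code_of)

uncompressed_bits = len(memory) * N_BITS  # 8 words of 32 bits
compressed_bits = len(memory) * m_bits    # 8 codes of M bits
dictionary_bits = len(distinct) * N_BITS  # the dictionary itself must be stored
```

With four distinct values among eight words, M is 2, so the encoded locations occupy 16 bits instead of 256. The dictionary adds a further 128 bits in this small example, an overhead that is amortized only when the same values recur across many locations.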
In the well-known Huffman compression algorithm, substantially denser encodings can be found by taking advantage of the fact that some values are more common than others. The basic idea of Huffman coding is that, given a set of symbols (an alphabet), symbols are assigned variable-length codes according to their frequency of occurrence. A symbol can act as a reference to a value. Therefore, instead of representing all values with codes of the same width, narrower codes can be assigned to more frequent values and wider codes to less frequent ones, thus substantially decreasing the total size of a specific sequence of values that normally forms a cache line, a memory line or even a memory page. Huffman coding can assign codes to the values according to a specific tree, which is constructed bottom-up and left-to-right according to the frequency of occurrence of the symbols or their probabilities. The tree can be binary, meaning two child nodes per parent node, quaternary, or in general N-ary depending on how many child nodes each parent node has. The structure of the tree, however, determines the depth, and is an important consideration for the processing.
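By way of illustration only, the bottom-up construction of a binary Huffman tree and the resulting variable-length codes can be sketched as follows. The function name huffman_codes, the tie-breaking order and the sample values are assumptions made for this sketch.

```python
import heapq
from collections import Counter


def huffman_codes(values):
    """Return a dict mapping each distinct value to a prefix-free bit string."""
    freq = Counter(values)
    # Heap entries are (frequency, tie_breaker, tree). The unique tie_breaker
    # keeps comparisons from ever reaching the tree itself.
    heap = [(count, i, ("leaf", value))
            for i, (value, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    # Bottom-up: repeatedly merge the two least frequent subtrees.
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, ("node", t1, t2)))
        next_id += 1

    codes = {}

    def walk(tree, prefix):
        if tree[0] == "leaf":
            codes[tree[1]] = prefix or "0"  # a lone symbol still needs one bit
        else:
            walk(tree[1], prefix + "0")
            walk(tree[2], prefix + "1")

    walk(heap[0][2], "")
    return codes


# Assumed sample: 16 values with a skewed frequency distribution.
values = [0] * 8 + [1] * 4 + [255] * 2 + [42] + [99]
codes = huffman_codes(values)
huffman_bits = sum(len(codes[v]) for v in values)
fixed_bits = len(values) * 3  # 5 distinct values need 3 bits each if fixed-width
```

In this sample the most frequent value receives a 1-bit code and the two rarest values receive 4-bit codes, so the 16 values occupy 30 bits rather than the 48 bits needed with fixed 3-bit codes.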
In the following, by way of example and without loss of generality, we consider Huffman coding as an exemplary approach in the field of data compression using statistical-based compression in which the frequency of values is first established after which the coding used for compression is determined. There are three Huffman coding schemes known from prior art. First, in the static coding scheme the coding is created once at the beginning based on preprocessing of the frequencies of the values. Second, the semi-adaptive coding scheme does the coding in two passes. In the first pass, it calculates the probabilities, while in the second pass it constructs the coding and then compresses the object. Third, in the fully adaptive coding scheme the Huffman tree and therefore its coding is modified dynamically during compression. Using static Huffman coding, the compressibility is expected to be low unless the same values are used with the same frequency distribution during the whole execution of a task. The semi-adaptive Huffman coding scheme is simpler than the fully adaptive one but new values cannot be coded and therefore cannot be immediately compressed, thus requiring the Huffman tree and therefore the coding to be re-built. Rebuilding the coding can possibly impact the compressibility during the slack between the two tree constructions. On the other hand, fully adaptive Huffman coding is typically modified continuously, thus changing the codes of the values. However, it requires the to-be-compressed data to be accessed sequentially to be able to construct a de-compressor that is a mirror of the compressor. Using the fully adaptive scheme to compress data in storage/memory hierarchies can be less attractive due to the processing overhead in changing the codes continuously.
Let us now consider the specific application of statistical-based compression techniques to the field of computer memory systems. A way to apply statistical-based compression techniques to store redundant values denser in a location-centric computer memory is to create a dictionary of the encodings of the values in the computer memory in a first step. Then, in a second step, encode all values in the locations of the computer memory using the dictionary entries in a similar way as in other dictionary-based compression techniques.
Huffman-based compression of memory content has been used to compress computer instructions stored in memory (Larin, 2000) using the aforementioned static coding. The static approach yields a limited compressibility for data that tend to change during execution and there are many problems in applying compression techniques in general and statistical-based compression techniques in particular to store redundant data values in computer memory densely.
A first family of problems is the potential overhead encountered in accessing the computer memory. Assuming first that all encodings are of a fixed size, say M bits, as in (Zhang, 2000) and (Alameldeen, 2004; U.S. Pat. No. 7,412,564), a dictionary must be queried to translate a compressed word to an uncompressed value. This can make the access slower than in a location-centric memory. In case encodings are allowed to have different sizes, such as in Huffman coding, locations in computer memory may also have different sizes, which may complicate the mapping of location identifiers to “encoded locations” and can make the access slower still. U.S. Pat. Nos. 7,642,935 and 6,657,569 disclose apparatuses that can decode Huffman codes. However, the decoding operation may impose delays and overhead concerning power and real-estate area, which may make such apparatuses unsuitable for computer memory systems.
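By way of illustration only, the mapping problem for variable-size encodings can be sketched as follows. With fixed M-bit encodings the bit offset of a location follows directly from its index, whereas with a prefix-free variable-length code every preceding encoding must be scanned. The code table, the function names and the sample line are assumptions made for this sketch and do not describe the cited apparatuses.

```python
def fixed_offset(index, m_bits):
    """Bit offset of a location when every encoding is M bits wide."""
    return index * m_bits


def variable_offset(index, line, codes):
    """Bit offset of a location when encodings differ in width."""
    # The widths of all preceding encodings must be added up.
    return sum(len(codes[value]) for value in line[:index])


def decode_nth(bits, codes, index):
    """Sequentially decode a prefix-free bit string up to the index-th value.

    Returns (value, number_of_bits_scanned).
    """
    inverse = {code: value for value, code in codes.items()}
    current = ""
    seen = 0
    for scanned, bit in enumerate(bits, start=1):
        current += bit
        if current in inverse:
            if seen == index:
                return inverse[current], scanned
            seen += 1
            current = ""
    raise IndexError("index beyond the encoded line")


# Hypothetical prefix-free code table and a five-word line.
codes = {0: "0", 1: "10", 255: "110", 42: "111"}
line = [0, 255, 1, 0, 42]
bits = "".join(codes[value] for value in line)  # "0110100111"

last_value, bits_scanned = decode_nth(bits, codes, 4)
```

In this sketch, reaching the fifth word requires scanning all 10 bits of the encoded line, while a fixed 2-bit encoding would place that word at bit offset 8 without inspecting any other word.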
A second family of problems pertains to the use of statistical-based compression techniques and in particular the overhead involved in using semi-adaptive schemes for computer memory data. How to collect statistics on the frequency of occurrence of data values accessed in computer memory on-line, as programs are being executed, how to change the encodings during execution, and how to keep these operations off the critical access path are problems that prior art has not addressed satisfactorily.
In summary, statistical-based compression techniques known from prior art can suffer from significant overheads in the processes of collecting statistics and of accessing or modifying values in the field of computer memories. While they can store redundant values densely, they can cause access overheads that make them inapplicable as a means for more effective use of computer memory resources.