The inherently high parallelism of modern processors, such as GPUs (graphics processing units), has led to a significant increase in power dissipation, thereby necessitating expensive cooling solutions. In addition, general-purpose processing on such specialized architectures poses new problems, yet it also opens avenues for power optimizations at the architectural level. Data compression is a promising technique for decreasing on-chip and off-chip bandwidth usage and reducing power dissipation. A significant portion of system power is used to drive data on cache and memory busses, which transfer cache line data between adjacent levels of the memory hierarchy. Each of these transactions requires multiple cycles to complete, and each cycle consumes power and takes time. If the amount of data driven onto these busses can be reduced, this can translate into a proportionate power savings: compressed data is smaller, so the transaction is shorter, and therefore less power is needed to transmit the data.
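As a rough illustration of the cycle-count argument above, the following sketch computes how many bus cycles a transfer needs. It assumes a hypothetical 16-byte-wide bus and a 64-byte cache line; the helper name `transfer_cycles` and the sizes are illustrative assumptions, not values taken from any particular design.

```python
import math


def transfer_cycles(payload_bytes: int, bus_width_bytes: int) -> int:
    """Return the number of bus cycles (beats) needed to move payload_bytes
    over a bus that carries bus_width_bytes per cycle (hypothetical model)."""
    if bus_width_bytes <= 0:
        raise ValueError("bus width must be positive")
    return math.ceil(payload_bytes / bus_width_bytes)


# Hypothetical sizes: a 64-byte cache line on an assumed 16-byte-wide bus.
LINE_BYTES = 64
BUS_WIDTH_BYTES = 16

uncompressed = transfer_cycles(LINE_BYTES, BUS_WIDTH_BYTES)  # 4 cycles
compressed = transfer_cycles(40, BUS_WIDTH_BYTES)            # 40-byte compressed line: 3 cycles
print(uncompressed, compressed)
```

Because transfers occur in whole cycles, the savings in this simple model are quantized: a line compressed from 64 to 40 bytes still needs 3 cycles, so compression saves a cycle only when the compressed size drops below a multiple of the bus width.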
Memory busses can benefit significantly from data compression because they are off-chip busses, which consume more power per transfer cycle than on-chip busses. Because memory transfers have relatively high latency, the latency of compressing and decompressing data in hardware is more easily hidden. Conventional data compression is generally designed to be implemented in software and to compress long streams of data, because it relies on matching patterns against previously seen or known patterns. Of significant interest, however, are compression techniques that can compress relatively small units of data, such as a 64-byte cache line, and that can be implemented in hardware with low latency and low power overhead. For example, a number of studies have examined the compressibility of cache and memory data for the purpose of increasing memory utilization. Such compression techniques could also be used, instead of or in addition to increasing memory utilization, to reduce the size of bus transactions specifically to reduce power.
Many applications use floating point numbers, which are typically not easily compressed because common bit patterns are distributed essentially at random across values. Lossy compression techniques have been used to reduce the size of datasets in such floating point applications. One well-known mechanism is to round the least significant bits (LSBs) of the mantissa to zeros. Such lossy compression is usually acceptable because the LSBs of floating point numbers are not particularly useful in most applications, which seldom need such high precision. However, existing compression algorithms are not designed to match the rounded LSBs across different floating point numbers. As a result, the compression algorithm must be modified to be aware of these rounded bits or of the unusual bit alignments they create. These modifications, however, can reduce overall compressibility, because the compression algorithm must then recognize more patterns.
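The mantissa-rounding mechanism described above can be sketched for IEEE 754 single-precision values as follows. This is a minimal illustration, assuming float32 layout (23 mantissa bits) and implementing the "round to zeros" step as plain truncation, i.e. clearing the chosen LSBs; the function names `float_to_bits`, `bits_to_float`, and `truncate_mantissa` are hypothetical helpers, not part of any described embodiment.

```python
import struct


def float_to_bits(x: float) -> int:
    """Reinterpret x, packed as IEEE 754 float32, as a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def truncate_mantissa(x: float, drop_bits: int) -> float:
    """Clear the drop_bits least significant mantissa bits of x as a float32.

    float32 has 23 explicit mantissa bits, so drop_bits must lie in [0, 23].
    Clearing bits truncates the magnitude toward zero (an assumed stand-in
    for the rounding step; round-to-nearest would need carry handling).
    """
    if not 0 <= drop_bits <= 23:
        raise ValueError("drop_bits must be between 0 and 23")
    mask = ~((1 << drop_bits) - 1) & 0xFFFFFFFF  # ones everywhere except the LSBs
    return bits_to_float(float_to_bits(x) & mask)


# Usage: after dropping 16 mantissa bits, every value ends in the same
# two zero bytes, a repeated pattern a compressor could exploit.
values = [3.14159, 2.71828, 1.41421]
for v in values:
    t = truncate_mantissa(v, 16)
    low_bytes = struct.pack("<f", t)[:2].hex()  # little-endian: LSBs come first
    print(v, t, low_bytes)  # low_bytes is "0000" for every value
```

For normal (non-subnormal) values, clearing the lowest `d` mantissa bits changes the magnitude by less than `|x| * 2**(d - 23)`, which is the precision the application gives up in exchange for the shared zero patterns.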
What is desired, therefore, is a mechanism that is adaptive to rounded bits in a compression process and that exposes more matching patterns in order to improve the compressibility of floating point numbers.
What is further desired is a mechanism for improving the compression ratio of floating point numbers in order to reduce the power consumed in transferring cache lines and in cache compression hardware.
The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches. For example, embodiments may be directed to applications related to bus compression; however, the described bit mapping methods can be used for any lossy compression method in hardware design. Examples include bus, cache, and memory compression, and such a method can be used to save energy or to reduce the size of cache lines or memory blocks.