1. Field of the Invention
This invention is related to the field of computer systems and, more particularly, to memory systems in computer systems.
2. Description of the Related Art
Traditionally, the cost, performance, and power consumption characteristics of computer systems have been driven by the processors included in the system. More recently, however, trends in processor design have begun to shift attention to the main memory system. For example, chip multithreading (CMT) is becoming a more popular processor design. A CMT processor includes hardware on a single chip to concurrently process multiple independent software threads. The processor hardware may be shared among the threads, improving efficiency in the use of the processor hardware. However, having multiple threads in execution increases the demands on the memory system. While threads may be related to the same overall application and may share memory, in general each thread may have its own memory locations that it accesses for instruction fetches and data. Thus, a system built from CMT processor chips may require more memory than a system with the same number of non-multithreaded processor chips in order to provide reasonable performance. As another example, chip multiprocessing (CMP), in which multiple independent processors are included on the same processor chip, is also becoming popular. Again, memory requirements may increase on a per-processor-chip basis.
Generally speaking, a larger main memory system translates to increased costs. First, the number of memory chips is increased, which clearly drives the cost up. Second, to provide an appropriate level of performance with a larger memory size (in terms of latency and bandwidth, for example), expensive implementation techniques may be required. For example, multiple memory controllers coupled to sections of the overall main memory may be required, additional banks of memory coupled to a given memory controller may be required, etc. Third, providing low latency, high bandwidth access to a large main memory may result in increased power consumption of the memory system (as compared to traditionally-sized memory systems). The increased power consumption may increase cooling requirements and power supply requirements for the computer system, which may also increase cost.
In addition, while new multi-core CMT and CMP processor chips may promise high compute bandwidth, the need for bandwidth into and out of the chip to scale linearly with the number of cores may severely constrain future systems. Because on-chip cache capacity will likely not be able to scale with the number of cores, off-chip bandwidth may ultimately limit CMT/CMP chips. As a result, techniques that can reduce bandwidth needs across the chip boundary become important.
One approach to dealing with the bandwidth bottleneck caused by cache misses is to increase the amount of on-chip cache. However, increasing the size of on-chip caches reduces the area that can be devoted to processing cores.
Another approach involves partitioning a cache block frame into sub-blocks and associating a valid bit with each of the sub-blocks. On a cache miss, only the missing sub-block is loaded into the cache. Other approaches have sought to keep track of valid bits at the word level to reduce coherence transactions caused by false sharing. However, such approaches do not predict future accesses and do not avoid future misses.
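The sub-block approach described above can be illustrated with a minimal sketch. The class name, the direct-mapped organization, and the frame, block, and sub-block sizes below are all hypothetical choices made for illustration and are not taken from any particular prior design. The sketch assumes one valid bit per sub-block and fetches only the missing sub-block on a miss.

```python
class SubBlockedCache:
    """Hypothetical direct-mapped cache with one valid bit per sub-block."""

    def __init__(self, num_frames=4, block_size=64, sub_block_size=16):
        # All sizes are illustrative assumptions, in bytes.
        self.num_frames = num_frames
        self.block_size = block_size
        self.sub_block_size = sub_block_size
        self.subs_per_block = block_size // sub_block_size
        self.tags = [None] * num_frames
        self.valid = [[False] * self.subs_per_block for _ in range(num_frames)]
        self.bytes_fetched = 0  # traffic brought in from memory

    def access(self, addr):
        """Return True on a hit, False on a miss (loading one sub-block)."""
        block_addr = addr // self.block_size
        frame = block_addr % self.num_frames
        tag = block_addr // self.num_frames
        sub = (addr % self.block_size) // self.sub_block_size

        if self.tags[frame] != tag:
            # The frame holds a different block (or nothing): reallocate it
            # and clear every sub-block valid bit.
            self.tags[frame] = tag
            self.valid[frame] = [False] * self.subs_per_block

        if self.valid[frame][sub]:
            return True

        # Miss: load only the missing sub-block, not the whole block.
        self.valid[frame][sub] = True
        self.bytes_fetched += self.sub_block_size
        return False


cache = SubBlockedCache()
results = [cache.access(a) for a in (0, 4, 16, 0, 256)]
print(results)              # [False, True, False, True, False]
print(cache.bytes_fetched)  # 48
```

In this trace, three sub-block misses bring in 48 bytes, whereas a cache with the same assumed geometry that always loads the full 64-byte block would bring in 128 bytes for the two block-level misses. The access to address 16 still misses even though its block is already resident, which illustrates the limitation noted above: the scheme reduces traffic per miss but does not anticipate which sub-blocks will be needed next.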
Another approach is to compress off-chip traffic in order to minimize the bandwidth required to perform writeback operations. While compression of the on-chip caches may also improve performance, additional latencies may be introduced by the decompression overhead that is required.
What is desired are methods and mechanisms that can effectively reduce off-chip bandwidth requirements and reduce cache misses.