1. Field of the Invention
The present invention relates to data processing systems and the efficient use of cache space, and more particularly to systems where each processor has at least one level of cache memory.
2. Description of the Related Art
Cache hierarchies are used in data processing systems to reduce memory access latency and the bandwidth demand placed on memory. Caches are effective because of the temporal and spatial locality that exists in memory reference streams. Caches exploit temporal locality by keeping a local copy of recently accessed data. Caches exploit spatial locality by fetching and storing more data than is required to service a single cache access. The unit of storage in the cache is called a block or line. The terms line and block are used interchangeably hereinafter. Typically, a block can be 128 or 256 bytes. Logically, any memory access is either serviced by the first level cache or initiates a transfer of at least one block from the next level in the memory hierarchy to the first level cache.
A block can be divided into one or more equal-sized partitions called sub-blocks. The size of each sub-block is at least large enough to satisfy a single cache access. For example, a block of 256 bytes can be divided into 8 sub-blocks of 32 bytes each. When there is low spatial locality, a block includes sub-blocks that are stored in the cache but are never used. These unused sub-blocks consume cache space without contributing to cache hits. This reduces cache efficiency and degrades system performance.
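The 256-byte block and 32-byte sub-block example above can be illustrated with a short sketch. It is an illustration only, not part of any disclosed design: the function names (`split_address`, `sub_block_usage`, `utilization`) and the address trace are hypothetical, and the sketch simply records which sub-blocks of each block a stream of byte addresses touches.

```python
BLOCK_SIZE = 256        # bytes per block, from the example in the text
SUB_BLOCK_SIZE = 32     # bytes per sub-block, from the example in the text
SUB_BLOCKS_PER_BLOCK = BLOCK_SIZE // SUB_BLOCK_SIZE  # 8

def split_address(addr):
    """Return (block number, sub-block index) for a byte address."""
    block = addr // BLOCK_SIZE
    sub_block = (addr % BLOCK_SIZE) // SUB_BLOCK_SIZE
    return block, sub_block

def sub_block_usage(addresses):
    """Map each touched block to a bitmask of its referenced sub-blocks."""
    used = {}
    for addr in addresses:
        block, sub = split_address(addr)
        used[block] = used.get(block, 0) | (1 << sub)
    return used

def utilization(mask):
    """Fraction of a block's sub-blocks that were referenced."""
    return bin(mask).count("1") / SUB_BLOCKS_PER_BLOCK

# Hypothetical trace with low spatial locality: only the first two
# sub-blocks of each block are ever referenced.
trace = [0, 8, 40, 256, 300, 310]
usage = sub_block_usage(trace)
for block, mask in sorted(usage.items()):
    print(f"block {block}: mask {mask:08b}, utilization {utilization(mask):.2f}")
```

In this trace each block has two of its eight sub-blocks referenced, so three quarters of the space each block occupies holds data that never contributes to a hit.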
Significant work has been done, both in the industry and in academia, with a focus on using cache space efficiently. Some of the relevant work is described below.
U.S. Pat. No. 6,735,673 to Kever discloses an apparatus and methods for cache line compression. When blocks of data are stored in compressed form, more blocks can fit into the cache. This increases the probability of a cache hit. Cache compression is dependent only on the values stored in the block and not on the spatial locality of the block. This implies that a compressed block still contains data for unused sub-blocks.
U.S. Pat. No. 6,516,388 to McCormick et al. describes a method and apparatus for reducing cache pollution. Prefetched blocks do not update the least recently used (LRU) stack on installation. This reduces pollution caused by unused prefetched blocks.
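The effect of not updating the LRU stack on a prefetch installation can be sketched for a single cache set. This is one possible reading of the idea, offered only as an illustration and not as the mechanism claimed in the cited patent; the class `LRUSet` and its methods are hypothetical names. A demand access promotes the block to the most recently used position, while a prefetched block is installed at the least recently used position and is promoted only if it is later referenced.

```python
class LRUSet:
    """One cache set with an LRU stack (index 0 = MRU, last = LRU)."""

    def __init__(self, ways):
        self.ways = ways
        self.stack = []

    def _evict_if_full(self):
        # Remove the least recently used block when the set is full.
        if len(self.stack) >= self.ways:
            self.stack.pop()

    def access(self, tag):
        """Demand access: return True on a hit; the block becomes MRU."""
        if tag in self.stack:
            self.stack.remove(tag)
            self.stack.insert(0, tag)
            return True
        self._evict_if_full()
        self.stack.insert(0, tag)
        return False

    def install_prefetch(self, tag):
        """Install a prefetched block without promoting it in the stack."""
        if tag in self.stack:
            return
        self._evict_if_full()
        self.stack.append(tag)  # stays at the LRU position until referenced

# Hypothetical two-way set: P is prefetched but never referenced.
s = LRUSet(ways=2)
s.access("A")
s.access("B")
s.install_prefetch("P")   # replaces A, the LRU block
s.access("C")             # replaces P rather than the useful block B
print(s.stack)
```

Because the unused prefetched block remains the replacement candidate, the next miss evicts it instead of a block that has demonstrated reuse.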
U.S. Pat. No. 5,577,227 to Finell et al. discloses a method to decrease the number of stall cycles resulting from a cache miss in a multilevel cache system. On a cache miss, the requested block is fetched. If there are any invalid blocks in the cache as a result of the consistency protocol, additional blocks are prefetched along with the requested block.
U.S. Pat. No. 4,774,654 to Pomerene et al., U.S. Pat. No. 6,535,961 to Kumar et al., and U.S. Pat. No. 6,557,080 to Burger et al. disclose methods to prefetch sub-blocks from low speed memory to high speed memory depending upon the state of sub-block reference bits or the outcome of a spatial footprint predictor.
U.S. Pat. No. 5,539,894 to Webber describes a mechanism for optimizing the tag storage identifier used in a computer system. Upon initial power-on of the computer system, the amount of system memory is determined and a minimum number of sub-blocks for the cache memory is selected such that when maximum system memory is installed, fewer sub-blocks are selected for the cache memory.
Splitting a cache based on reference locality is described in an article by Gonzalez et al. titled “A data cache with multiple caching strategies tuned to different types of locality,” published in the International Conference on Supercomputing, 1995. Their design consists of a spatial cache, a temporal cache, and a predictor history table. Depending on the contents of the predictor history table, the fetched block is placed in either the spatial cache or the temporal cache. This scheme is targeted toward numerical codes that have very predictable spatial locality characteristics.
Johnson et al. in “Spatial Locality Detection and Optimization”, published in IEEE-Micro 1997, describe a method to predict the spatial locality of the incoming block. Their design uses a table called the Memory Address Table (MAT) to track spatial locality. The MAT controls the number of blocks fetched in case of a cache miss. This scheme requires that the cache be organized using a very small block size and that a separate spatial locality prediction structure be accessed before the incoming block is fetched from memory.
Choosing the line size is one of the fundamental decisions a designer makes in the design of a cache. Large cache lines can prefetch nearby data and avoid misses that a smaller line would incur. However, not all of the data in a large line is referenced by the processor, so cache pollution results. Workload analysis has shown that many database applications use less than 50% of a 256-byte line when it is brought into the cache. Additionally, a large line requires more bus cycles to transfer into the cache than a smaller line. This can result in bus queueing during periods of high miss rates.
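The transfer cost of a large line can be made concrete with a rough calculation. The sketch below assumes a 16-byte-wide bus that moves one bus width per cycle; that width is an assumption chosen for illustration and is not stated in the text, and the function names are hypothetical. The wasted-cycle figure is only an estimate that treats the referenced bytes as packed into whole bus transfers.

```python
BUS_WIDTH_BYTES = 16  # assumption for illustration; not given in the text

def bus_beats(line_bytes, bus_bytes=BUS_WIDTH_BYTES):
    """Bus transfers needed to move line_bytes, rounded up."""
    return -(-line_bytes // bus_bytes)  # ceiling division

def wasted_beats(line_bytes, bytes_used, bus_bytes=BUS_WIDTH_BYTES):
    """Estimated transfers that carry only unreferenced data."""
    return bus_beats(line_bytes, bus_bytes) - bus_beats(bytes_used, bus_bytes)

for line_bytes in (64, 128, 256):
    print(f"{line_bytes}-byte line: {bus_beats(line_bytes)} bus transfers")

# A 256-byte line of which only half (128 bytes) is ever referenced.
print("wasted transfers:", wasted_beats(256, 128))
```

Under these assumptions a 256-byte line occupies the bus four times as long as a 64-byte line, and when only half of the line is referenced roughly half of those transfers move data that is never used.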
Thus, it is desirable to design a cache that can have the advantages of both a large line (to prefetch nearby data and avoid misses) and, when appropriate, a small line that can be transferred quickly and that avoids cache pollution by keeping only the referenced information in the cache. By allowing the cache to include a high percentage of useful information (less pollution), misses are avoided and performance is increased.