In general, main memory access is relatively slow compared to central processing unit (CPU) execution times. Therefore, most CPU architectures include one or more caches. A cache is a small, high-speed memory that holds copies of a subset of the contents of main memory. Because most programs reference only a small subset of main memory at any given time, a relatively small, high-speed cache can service many of the memory references.
For example, instruction caches improve efficiency because software programs often spend much of their time executing a small section of code in a loop. Holding those instructions in a high-speed, local instruction cache allows them to be fetched much faster. Data caches likewise improve efficiency because data access tends to follow the principle of locality of reference. Requiring every access to go to the slower main memory would be costly. The situation can be even worse in a multiprocessor environment, where several CPUs may contend for a common bus.
In some configurations, a data cache system comprises both a data store and a tag array. The data store holds data copied from main memory. Each tag array location holds a tag, or physical page address, for the block of consecutive data held in the corresponding location of the data store.
During a memory access, a virtual page address from the CPU core is translated by a page translator into a physical page address. The remainder of the address, or a portion thereof, is used to index into the tag array. The tag retrieved from the indexed tag array location is compared with the translated physical page address. A match indicates that the referenced data is in the data store; a mismatch indicates that the data must be retrieved from main memory. Page translation occurs in parallel with the tag array lookup, minimizing delay.
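The lookup described above can be sketched as a small Python model. This is a minimal sketch under assumed parameters (4 KiB pages, 64-byte blocks, 64 tag array entries), with a dictionary standing in for the page translator; the names `split_virtual` and `DirectMappedTagArray` are hypothetical. The index bits are assumed to lie entirely within the untranslated page offset, which is one common way to let the tag array read proceed in parallel with page translation.

```python
# Hypothetical parameters: 4 KiB pages, 64-byte blocks, 64 tag array entries.
PAGE_BITS = 12
BLOCK_BITS = 6
INDEX_BITS = 6
assert BLOCK_BITS + INDEX_BITS <= PAGE_BITS  # index comes from the untranslated page offset


def split_virtual(vaddr):
    """Return (virtual page number, tag array index) for a virtual address."""
    vpage = vaddr >> PAGE_BITS
    index = (vaddr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
    return vpage, index


class DirectMappedTagArray:
    def __init__(self):
        # Each entry holds a physical page address (the tag), or None if empty.
        self.tags = [None] * (1 << INDEX_BITS)

    def lookup(self, vaddr, page_table):
        vpage, index = split_virtual(vaddr)
        # Neither step depends on the other's result, so hardware can overlap them.
        ppage = page_table[vpage]      # page translation (dict stands in for the translator)
        stored_tag = self.tags[index]  # tag array read
        return stored_tag == ppage     # True: data is in the data store; False: go to main memory

    def fill(self, vaddr, page_table):
        """Record the physical page address after the block is loaded from main memory."""
        vpage, index = split_virtual(vaddr)
        self.tags[index] = page_table[vpage]


page_table = {0x5: 0x1A3, 0x7: 0x2B0}
cache = DirectMappedTagArray()
print(cache.lookup(0x5040, page_table))  # False: tag array is empty
cache.fill(0x5040, page_table)
print(cache.lookup(0x5040, page_table))  # True: hit
print(cache.lookup(0x7040, page_table))  # False: same index, different physical page
```

Choosing index bits below the page boundary is what makes the parallel lookup possible in this model: the index is known before translation finishes, and only the final comparison waits for the physical page address.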
A need also exists in multiprocessor systems to test the contents of the data cache system from outside the CPU. Several processors may reference the same physical address in memory. Besides looking up its own local cache, each CPU must check the caches of the other CPUs in the system. Failure to do so would result in incoherence between the individual caches, as each CPU reads and writes its own local copy of the same data from main memory.
To prevent this incoherence, a CPU sends "probes" to the other CPUs during a memory reference. Each data cache system receiving a probe uses the physical address provided by the probe to look into its own tag array. If the data resides in its data store, the data cache system responds to the probing CPU accordingly, allowing ownership arbitration to take place.
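The probe handling described above can be sketched as follows. This is a minimal model under assumed parameters (4 KiB pages, 64-byte blocks, 64 entries); `ProbedCache`, `ProbeResponse`, and `send_probes` are hypothetical names, and ownership arbitration itself is not modeled. Because the tags are physical page addresses, the probed cache can answer directly from the probe's physical address without any page translation.

```python
from dataclasses import dataclass

# Hypothetical parameters: 4 KiB pages, 64-byte blocks, 64 tag array entries.
PAGE_BITS = 12
BLOCK_BITS = 6
INDEX_BITS = 6


@dataclass
class ProbeResponse:
    cpu_id: int
    hit: bool  # True if the probed cache holds the referenced block


class ProbedCache:
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.tags = [None] * (1 << INDEX_BITS)  # physical page address per entry

    @staticmethod
    def _index(paddr):
        return (paddr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)

    def fill(self, paddr):
        self.tags[self._index(paddr)] = paddr >> PAGE_BITS

    def handle_probe(self, paddr):
        # A probe carries a physical address, so no page translation is needed.
        hit = self.tags[self._index(paddr)] == (paddr >> PAGE_BITS)
        return ProbeResponse(self.cpu_id, hit)


def send_probes(requester_id, paddr, caches):
    """Probe every cache except the requester's own; arbitration is not modeled."""
    return [c.handle_probe(paddr) for c in caches if c.cpu_id != requester_id]


caches = [ProbedCache(i) for i in range(3)]
caches[2].fill(0x1A3040)
responses = send_probes(0, 0x1A3040, caches)
print([r.cpu_id for r in responses if r.hit])  # [2]
```

In this sketch the requesting CPU learns which other caches hold a copy of the block; a real system would use those responses to decide ownership before the reference completes.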
A problem with caches is their susceptibility to reference patterns in which memory references collide so that the cache is not fully utilized, for example, when two referenced memory addresses have different page addresses but the same index value. Because of the common index, each memory reference loads different data into the same cache location, evicting the data just loaded by the other reference and negating any benefit of the cache. Unfortunately, these reference patterns, also known as "power-of-two stride" patterns, are fairly common in many important software applications.
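The collision problem can be reproduced with a small simulation. This sketch assumes a hypothetical direct-mapped cache of 64 entries with 64-byte blocks (4 KiB total) and a hypothetical helper `direct_mapped_misses`; it counts misses for a sequence of byte addresses. Two addresses 8 KiB apart, a power of two that is a multiple of the cache size, share an index and evict each other on every reference.

```python
# Hypothetical direct-mapped cache: 64 entries of 64-byte blocks (4 KiB total).
BLOCK_BITS = 6
INDEX_BITS = 6


def direct_mapped_misses(addresses):
    """Count misses for a sequence of byte addresses in a direct-mapped cache."""
    tags = [None] * (1 << INDEX_BITS)
    misses = 0
    for addr in addresses:
        index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (BLOCK_BITS + INDEX_BITS)
        if tags[index] != tag:
            misses += 1
            tags[index] = tag  # evicts whatever block occupied this entry
    return misses


stride = 1 << 13  # 8 KiB, a multiple of the 4 KiB cache size
print(direct_mapped_misses([0, stride] * 100))  # 200: every reference misses
print(direct_mapped_misses([0, 0x40] * 100))    # 2: different indexes, only cold misses
```

The two patterns touch the same amount of data; only the spacing of the addresses differs, which is why a power-of-two stride defeats the cache while a small stride does not.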
Set-associative caches partially solve this problem by providing more than one storage location for each index value, although they incur the additional cost of multiple-port lookups into the cache tag and data arrays, plus additional hardware to decide in which of the locations to store a tag. For example, in a 2-way set-associative cache, there are two possible locations for each index value into which data can be loaded. Thus a second colliding address need not overwrite the previously loaded data. This does not, however, fully resolve the problem if the power-of-two stride pattern comprises three or more colliding addresses.
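The limits of set associativity can be shown by extending the miss-counting idea to several ways per set. This sketch assumes 64 sets of 64-byte blocks and least-recently-used (LRU) replacement; the replacement policy is an assumption chosen for illustration, and `set_assoc_misses` is a hypothetical name. With one way it behaves as a direct-mapped cache.

```python
# Hypothetical set-associative cache: 64 sets of 64-byte blocks, LRU replacement.
BLOCK_BITS = 6
INDEX_BITS = 6


def set_assoc_misses(addresses, ways):
    """Count misses in a `ways`-way set-associative cache with LRU replacement."""
    sets = [[] for _ in range(1 << INDEX_BITS)]  # each list holds tags, least recent first
    misses = 0
    for addr in addresses:
        index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (BLOCK_BITS + INDEX_BITS)
        lru = sets[index]
        if tag in lru:
            lru.remove(tag)  # hit: re-appended below as most recently used
        else:
            misses += 1
            if len(lru) == ways:
                lru.pop(0)   # set is full: evict the least recently used tag
        lru.append(tag)
    return misses


stride = 1 << 13  # every address below maps to set 0
two = [0, stride] * 100
three = [0, stride, 2 * stride] * 100
print(set_assoc_misses(two, ways=1))    # 200: direct-mapped cache thrashes
print(set_assoc_misses(two, ways=2))    # 2: both blocks fit in the set
print(set_assoc_misses(three, ways=2))  # 300: three colliding addresses still thrash
```

Under LRU, cycling through one more colliding address than there are ways evicts each block just before it is needed again, so the 2-way cache does no better than the direct-mapped one on the three-address pattern.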
Another method for dealing with the power-of-two stride problem hashes addresses into different locations so that collisions generated by reference patterns with strides of 2^m (or close to 2^m), for some integer m, are minimized. For example, U.S. Pat. No. 5,509,135 (Steely), "Multi-Index Multi-Way Set-Associative Cache", uses a different hashing function for each of the ways within a set. In another implementation, targeted at direct-mapped caches, U.S. Pat. No. 5,530,958 (Agarwal), "Cache Memory System and Method With Multiple Hashing Functions and Hash Control Storage", a first hashing function is applied to create a cache index. If this results in a cache miss, a second hashing function is applied, producing a different index, and so on.
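As an illustration of index hashing in general, and not of the specific hashing functions used in the Steely or Agarwal patents, the sketch below XORs the low-order tag bits into the conventional index of a direct-mapped cache. The cache dimensions (64 entries of 64-byte blocks) and the names `conventional_index`, `xor_index`, and `misses` are assumptions. Because the full tag is still stored, a block's address can be recovered from its tag and its hashed index, so hits are still detected correctly.

```python
# Hypothetical direct-mapped cache: 64 entries of 64-byte blocks.
BLOCK_BITS = 6
INDEX_BITS = 6
INDEX_MASK = (1 << INDEX_BITS) - 1


def conventional_index(addr):
    return (addr >> BLOCK_BITS) & INDEX_MASK


def xor_index(addr):
    # Fold the low tag bits into the index so that addresses differing only in
    # higher-order (power-of-two) bits land in different cache locations.
    return ((addr >> BLOCK_BITS) ^ (addr >> (BLOCK_BITS + INDEX_BITS))) & INDEX_MASK


def misses(addresses, index_fn):
    """Count misses in a direct-mapped cache using the given index function."""
    tags = [None] * (1 << INDEX_BITS)
    count = 0
    for addr in addresses:
        index = index_fn(addr)
        tag = addr >> (BLOCK_BITS + INDEX_BITS)  # full tag still identifies the block
        if tags[index] != tag:
            count += 1
            tags[index] = tag
    return count


pattern = [0, 1 << 13, 1 << 14] * 100  # three addresses with a power-of-two stride
print(misses(pattern, conventional_index))  # 300: all three share index 0
print(misses(pattern, xor_index))           # 3: hashed to indexes 0, 2 and 4
```

XOR folding redistributes conflicts rather than eliminating them: other address combinations still collide under the hashed index, which is one motivation for schemes that apply several hashing functions.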