Prior art computer systems have assorted problems with data movement into and out of cache. When these problems occur, system performance is drastically curtailed.
One problem is cache bank conflicts, which relate to the number of stores to cache that can be executed in a clock cycle. Many prior art computer systems allow a maximum of one load and one store to be executed per clock cycle. To perform a store, data is moved from a processor register to a store queue and then to the data cache. If the store queue fills up, the processor stalls and system performance is reduced.
Some prior art systems also have an implementation constraint whereby a store immediately following a load to the same cache bank incurs additional delay as compared to any other combination of loads and stores to the cache banks. Prior art commonly interleaves loads and stores, so the store queue cannot empty to cache as efficiently as desired, causing performance degradation.
If alternating loads and stores are used where the loads and stores are on the same cache line, a cache miss will occur on every load and store, and thus overlapping cannot be used. A cache miss occurs when the data to be loaded is not in the cache, but is in the main memory.
Another problem is cache collision. A cache collision occurs when two memory addresses map to the same cache address. Typically, a cache has a much smaller address space than the full memory, such that there is a many-to-one mapping, i.e., many real memory addresses map to the same cache address. For example, if there are 1 million entries in the cache and 100 million entries in the main memory, then 100 main memory entries map to every cache entry. In a typical mapping for a direct mapped cache, memory addresses 1, 1 million+1, 2 million+1, etc. all map to cache address 1, and memory addresses 2, 1 million+2, 2 million+2, etc. all map to cache address 2. Therefore, when a data copy is performed where the source and destination addresses are exactly a multiple of the cache size apart, cache collisions will occur.
Cache thrashing is where repeated cache collisions occur. There are two important special cases where cache thrashing occurs. The first is where two Unix processes, which share a parent process, are attempting to communicate by means of Unix supported shared memory. The processes allocate shared memory for similar purposes at similar times. Thus, their data buffers for copying often have the same logical addresses, and map to the same cache lines. The second case occurs when two large arrays of data that are used in a computation are declared together, and the user has declared their sizes to be a multiple of the cache size. Then similar indexes in the arrays will map to the same cache lines. In other words, if one array is declared to have a size of exactly one million, and that array is going to be copied to another array, also of size one million, then corresponding elements of the two arrays will systematically map to the same cache addresses.
The effect of a cache collision is the loss of data in the cache, because the data is overwritten before the system has finished using it. Moreover, a typical protocol will involve a mix of loads and stores for the two arrays.
Each access for either array will wipe out several pieces of information for the other array. Moreover, the parallel operations mean that an exact address match does not have to occur before a cache collision occurs; the addresses can be close to each other and a cache collision will still occur. Thus, when data is to be copied and the source and destination addresses are nearly the same, but not exactly the same, performance is still reduced by cache collisions. Prior solutions do not make special provisions for handling nearly aligned source and destination addresses.
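The direct mapped example above can be illustrated with a minimal sketch. It assumes the cache address is simply the memory address modulo the number of cache entries, which reproduces the mapping given in the example (1, 1 million+1, 2 million+1 all map to cache address 1). The function names and the `window` parameter are hypothetical and are introduced only for illustration; `window` stands in for however many neighboring entries the overlapped loads and stores happen to touch, which the text does not quantify.

```python
CACHE_ENTRIES = 1_000_000      # cache size used in the example above
MEMORY_ENTRIES = 100_000_000   # main memory size used in the example above

# Number of main memory entries that share each cache entry.
ALIASES_PER_ENTRY = MEMORY_ENTRIES // CACHE_ENTRIES


def cache_index(address: int, cache_entries: int = CACHE_ENTRIES) -> int:
    """Assumed direct mapping: memory address modulo the cache size."""
    return address % cache_entries


def collides(src: int, dst: int, cache_entries: int = CACHE_ENTRIES) -> bool:
    """True when two distinct memory addresses map to the same cache address."""
    return src != dst and cache_index(src, cache_entries) == cache_index(dst, cache_entries)


def nearly_collides(src: int, dst: int, window: int,
                    cache_entries: int = CACHE_ENTRIES) -> bool:
    """True when the two cache addresses lie within `window` entries of each
    other, measured with wraparound. `window` is a hypothetical parameter."""
    delta = (src - dst) % cache_entries
    return min(delta, cache_entries - delta) <= window


if __name__ == "__main__":
    # Source and destination exactly a multiple of the cache size apart.
    print(collides(5, 3_000_005))
    # Nearly aligned source and destination, a few entries apart in the cache.
    print(nearly_collides(10, 2_000_013, window=4))
```

Under these assumptions, a copy between two arrays whose starting addresses differ by a multiple of `CACHE_ENTRIES` collides on every element, and a copy whose starting addresses are offset by only a few entries still falls inside the `window` on every element.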
Thus, there is a need in the art for a mechanism that keeps the store queue nearly empty, while minimizing both cache bank conflicts and cache collisions.