There are numerous applications, such as image processing, Fast Fourier Transform (FFT), Partial Differential Equation (PDE) solvers, Lattice Boltzmann Method (LBM) fluid simulations, Reverse Time Migration (RTM) seismic imaging, and Quantum Chromodynamics (QCD), whose performance is determined by fast access to multi-dimensional arrays. Accessing multi-dimensional arrays often requires long-stride memory accesses, and these accesses can cause cache memory conflicts that degrade the performance of a cache memory. For example, incorrect hardware prefetches of cache memory lines bring unnecessary data into the cache and waste memory bandwidth.
FIG. 1 illustrates a prior art main memory 110 that stores a two-dimensional array, i.e., a six-by-eight array. Reading the elements of the first column of the two-dimensional array, i.e., elements 111, 121, 131, 141, 151, 161, 171 and 181, requires long-stride memory accesses. The element 192 of the data cache memory 190 can cache any element of the first column of the two-dimensional array, so these elements compete for the same location in the data cache memory 190. The cache memory conflict becomes more severe when multiple threads share the same cache memory, because the threads compete with each other for use of the cache memory.
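To make the conflict concrete, the following is a minimal sketch, assuming a direct-mapped data cache (each address maps to exactly one set, mirroring the single element 192 above) and a row-major array. The line size, set count, element size, array dimensions and the helper names `cache_set` and `column_sets` are hypothetical values chosen for illustration; the array is larger than the one in FIG. 1 so that the row pitch is a multiple of the cache size. Walking one column advances the address by one full row per element, and when that stride is a multiple of the cache size, every element of the column maps to the same set.

```python
# Hypothetical cache and array parameters, chosen only for illustration.
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_SETS = 512   # sets in an assumed direct-mapped 32 KiB data cache
ELEM_SIZE = 8    # bytes per array element, e.g. a double (assumed)


def cache_set(address: int) -> int:
    """Return the set a byte address maps to in a direct-mapped cache."""
    return (address // LINE_SIZE) % NUM_SETS


def column_sets(base: int, rows: int, cols: int, col: int) -> list[int]:
    """Cache sets touched when reading one column of a row-major rows x cols array.

    Consecutive elements of a column are one full row apart, so the
    access stride is cols * ELEM_SIZE bytes (a long-stride access).
    """
    stride = cols * ELEM_SIZE
    return [cache_set(base + r * stride + col * ELEM_SIZE) for r in range(rows)]


if __name__ == "__main__":
    # Row pitch of 4096 doubles = 32 KiB, a multiple of the cache size:
    # all eight column elements map to set 0 and evict one another.
    print(column_sets(base=0, rows=8, cols=4096, col=0))
    # Row pitch of 4104 doubles = 513 cache lines: the same column
    # is spread over eight different sets.
    print(column_sets(base=0, rows=8, cols=4104, col=0))
```

The first call prints `[0, 0, 0, 0, 0, 0, 0, 0]` and the second prints `[0, 1, 2, 3, 4, 5, 6, 7]`, showing that whether the column elements collide depends on the access stride rather than on the amount of data read. Real caches are usually set-associative, which delays but does not remove this effect once more column elements map to a set than it has ways.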