In data processing applications in such areas as image processing, three-dimensional modeling, fluid dynamics, video compression/decompression and arithmetic cellular automata, it is commonplace to work with large amounts of data organized into three-dimensional (3D) arrays. Data organized into such a data structure is then accessible using a triplet of indices that specify a single cell of that array. In working with such data, it is commonplace to perform 3D stencil calculations in which data from each cell and one or more neighboring cells in three dimensions are employed as inputs to a per-cell stencil calculation that is convolved about the cells of the 3D array.
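By way of illustration only, a per-cell 3D stencil calculation of the kind described above can be sketched as follows. The function name `stencil_7pt`, the choice of a seven-point stencil (each cell plus its six face-adjacent neighbors) and the use of a simple average are assumptions made for this example and are not drawn from any particular application.

```python
def stencil_7pt(grid):
    """Apply an illustrative 7-point averaging stencil to a 3D array.

    `grid` is a nested list indexed as grid[z][y][x] (plane, column, row
    position). Boundary cells are copied unchanged; this is an assumed
    boundary treatment chosen only to keep the sketch short.
    """
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    # Start from a copy so that every output cell is computed from the
    # original input values rather than from partially updated ones.
    out = [[[grid[z][y][x] for x in range(nx)]
            for y in range(ny)]
           for z in range(nz)]
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                total = (grid[z][y][x]
                         + grid[z][y][x - 1] + grid[z][y][x + 1]   # same row
                         + grid[z][y - 1][x] + grid[z][y + 1][x]   # same plane
                         + grid[z - 1][y][x] + grid[z + 1][y][x])  # adjacent planes
                out[z][y][x] = total / 7.0
    return out
```

Each interior cell is addressed by a triplet of indices, and its calculation reads data from two adjacent planes in addition to its own.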
Inefficiencies in accessing the data of the neighboring cells can arise from the manner in which the data of a 3D array is typically stored. The data of the cells of a 3D array is commonly stored in a row-column-plane manner, in which data of cells that are adjacent to each other in a row is stored in contiguous storage locations such that it is addressable at adjacent addresses, but data of cells that are adjacent to each other in a column or across planes is not stored in contiguous storage locations. Where the amount of data in a 3D array is such that the 3D array cannot be stored entirely within a single page of storage locations, the data of an adjacent cell of an adjacent plane may be stored in a storage location within a different page. As those familiar with virtual addressing will readily recognize, transitioning from accessing data stored in a storage location of one page to accessing data stored in a storage location of another page can incur a considerable time delay compared to accessing data stored at another storage location in the same page.
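The effect of a row-column-plane layout on page locality can be sketched with simple address arithmetic. In the following example, the 4096-byte page size, the 4-byte cell size, and the helper names `byte_offset`, `page_of` and `neighbor_pages` are all assumptions chosen for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def byte_offset(x, y, z, nx, ny, elem_size=4):
    """Byte offset of cell (x, y, z) in a row-column-plane layout.

    Cells along x (within a row) are contiguous; stepping in y skips one
    whole row, and stepping in z skips one whole plane of ny rows.
    """
    return ((z * ny + y) * nx + x) * elem_size


def page_of(offset):
    """Index of the page containing the given byte offset."""
    return offset // PAGE_SIZE


def neighbor_pages(x, y, z, nx, ny, elem_size=4):
    """Sorted distinct pages touched by a cell and its six face neighbors."""
    cells = [(x, y, z),
             (x - 1, y, z), (x + 1, y, z),
             (x, y - 1, z), (x, y + 1, z),
             (x, y, z - 1), (x, y, z + 1)]
    return sorted({page_of(byte_offset(cx, cy, cz, nx, ny, elem_size))
                   for cx, cy, cz in cells})
```

Under these assumptions, a 3D array with 256 cells per row and 256 rows per plane occupies 64 pages per plane, so `neighbor_pages(10, 10, 10, 256, 256)` reports three distinct pages: one for the cell and its in-plane neighbors, and one for each of the two adjacent planes.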
In virtual addressing, address translations between virtual and physical addresses must be retrieved from a page table as part of retrieving data and/or executable instructions. If the page from which the retrieval is to occur is stored in a relatively fast storage device and its address translation is cached in a translation look-aside buffer (TLB), then delays incurred in retrieving data and/or executable instructions from a different page can be greatly reduced. However, where a page has not been accessed recently enough for its associated address translation to remain in a TLB (e.g., its address translation has been evicted from the TLB due to the limited number of storage locations of the TLB) and/or the page has been moved to a slower storage device, then delays incurred in retrieving data and/or executable instructions from that page can be considerable. Further, even where pages from which data and/or executable instructions are to be retrieved remain stored on a relatively fast storage device, delays to retrieve address translations from a page table may be incurred repeatedly where a routine repeatedly accesses many different pages such that address translations for each of those pages are repeatedly evicted from the TLB.
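The repeated eviction described above can be illustrated with a highly simplified model of a TLB. The following sketch assumes a fully associative TLB with least-recently-used replacement, which is an assumption made for this example only; actual TLBs vary in organization and replacement policy. The function name `count_tlb_misses` is likewise hypothetical.

```python
from collections import OrderedDict


def count_tlb_misses(page_accesses, tlb_entries):
    """Count translation misses for a sequence of page accesses.

    Models an assumed fully associative TLB of `tlb_entries` entries
    with least-recently-used replacement. Each miss stands for one
    retrieval of an address translation from the page table.
    """
    tlb = OrderedDict()  # keys are page indices, oldest entry first
    misses = 0
    for page in page_accesses:
        if page in tlb:
            tlb.move_to_end(page)        # hit: mark as most recently used
        else:
            misses += 1                  # miss: translation must be fetched
            if len(tlb) >= tlb_entries:
                tlb.popitem(last=False)  # evict the least recently used entry
            tlb[page] = True
    return misses
```

In this model, a routine that cycles three times through five pages incurs a miss on all fifteen accesses when the TLB holds four entries, but only five misses when it holds eight entries, since each translation then remains cached after its first use.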