1. Field of the Invention
The present invention relates in general to unstructured grid cell ordering, and, in particular, to a system and method for determining a cache optimized ordering of cells in an unstructured graph.
2. Related Art
Certain classes of problems, particularly simulations in physics and engineering, operate on large numeric data sets requiring the decomposition of a physical space or the division of a model into a grid or mesh (hereinafter “grid”) of individual quadrilateral elements or cells (hereinafter “cells”) expressed in a multi-dimensional problem space. For example, calculating the stress on a piece of paper requires the modeling of the paper surface as a grid of individual cells with each cell storing a load factor and other data. The stress calculated at any given point in the grid depends on the load factors stored in the cells adjacent to the cell being measured.
Physically, the contents of the cells are stored in serial order in either structured memory arrays with cells organized into repeating patterns occurring at regular intervals or unstructured memory arrays with irregularly-spaced cells occurring aperiodically. Nearest neighbor and related types of operations performed on a data set represented by an unstructured memory array generally perform poorly when executed in a default ordering.
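By way of illustration only, a nearest neighbor operation over an unstructured memory array might be sketched as follows. The function name `neighbor_average`, the adjacency-list representation, and the use of a simple average are assumptions made for the example and are not taken from any cited reference. The sketch shows that each cell reads values through an index list, so the memory access pattern follows the stored ordering of the cells.

```python
def neighbor_average(load, neighbors):
    """For each cell, average the load factors of its adjacent cells.

    load      -- list of load factors, one per cell, in stored order
    neighbors -- neighbors[i] lists the stored indices of cells adjacent to cell i
    (Hypothetical stand-in for a nearest neighbor operation.)
    """
    result = []
    for adjacent in neighbors:
        if adjacent:
            # Indirect reads: load[n] may be far from the current cell in memory.
            result.append(sum(load[n] for n in adjacent) / len(adjacent))
        else:
            # A cell with no neighbors contributes nothing in this sketch.
            result.append(0.0)
    return result
```

The operation itself does not depend on the ordering of the cells; only the sequence of memory addresses touched does.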
Efficiently solving problems that require access to physically contiguous data elements depends strongly on processor and memory access speeds. Increases in processor speed have greatly outpaced increases in memory access speed. This disparity between processor and memory access speeds has created a performance bottleneck, which has progressively worsened as the disparity has widened. Recently, computer systems have trended towards a heavier reliance on hierarchically-structured memory architectures, such as caches, to alleviate the memory access bottleneck.
Fundamentally, caches transiently stage data and instructions in a limited-sized but high-speed memory array. Each cache stores a subset of the data currently loaded into the main memory. Due to the limited cache capacity, individual cache words must be replaced when necessary to accommodate other subsets of the main memory that require access.
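The effect of limited cache capacity can be illustrated with a minimal simulation. The sketch below assumes a single fully associative cache with a least-recently-used replacement policy; the function name `count_misses` and the parameters `line_size` and `num_lines` are hypothetical, and actual caches differ in organization and replacement policy.

```python
from collections import OrderedDict

def count_misses(accesses, line_size=4, num_lines=2):
    """Count cache misses for a sequence of array indices.

    Assumes a fully associative cache holding num_lines lines of
    line_size array elements each, with least-recently-used replacement.
    """
    cache = OrderedDict()  # keys are line numbers, oldest use first
    misses = 0
    for index in accesses:
        line = index // line_size
        if line in cache:
            cache.move_to_end(line)        # mark line as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses
```

Under these assumptions, an access sequence that revisits more distinct lines than the cache can hold misses on every access, whereas a sequential sweep misses only once per line.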
Data elements stored in unstructured data sets generally lack a regular and periodic pattern of organization. Typically, individual cells are assigned to elements of a memory array in order of traversal through the unstructured data set. The ordering of memory array elements storing cells from an unstructured data set can be critical to ensuring optimal cache performance, and an arbitrary ordering of cells can destroy any spatial or temporal locality. For example, spatially local, non-physically contiguous cells could be assigned to memory array elements that, when loaded into one or more caches, span multiple, disparate cache lines and result in poor cache performance. To alleviate the problem of arbitrary, non-physically contiguous ordering, the memory array elements can be reordered to improve data locality prior to execution of problem-solving code.
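A reordering of memory array elements prior to execution can be sketched as the application of a permutation to both the cell data and the neighbor lists. The names `apply_ordering` and `index_span`, and the convention that `order[i]` gives the old index of the cell placed at new position `i`, are assumptions made for this illustration; how a good ordering is obtained is a separate question not addressed by the sketch.

```python
def apply_ordering(values, neighbors, order):
    """Reorder cell data so that new position i holds old cell order[i].

    Returns the reordered values and the neighbor lists rewritten
    in terms of the new indices.
    """
    # Inverse permutation: new_index[old] is the new position of an old cell.
    new_index = [0] * len(order)
    for new, old in enumerate(order):
        new_index[old] = new
    new_values = [values[old] for old in order]
    new_neighbors = [sorted(new_index[n] for n in neighbors[old]) for old in order]
    return new_values, new_neighbors

def index_span(neighbors):
    """Largest index distance between any cell and one of its neighbors.

    A crude, hypothetical proxy for locality: smaller spans mean that
    adjacent cells sit closer together in the memory array.
    """
    return max((abs(i - n) for i, ns in enumerate(neighbors) for n in ns), default=0)
```

Because only the storage order and the index lists change, code operating on the cells through the neighbor lists need not be modified.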
In the prior art, the two main classes of reordering programs use either a breadth-first search or a space-filling curve approach, such as described in A. George et al., “Computer Solution of Large Sparse Positive Definite Systems,” Comp. Math., Prentice-Hall (1981), the disclosure of which is incorporated by reference. However, the breadth-first search approach does not require that cells within an iteration of a front or level set be contiguous and, consequently, near-diagonal lines are not formed in an adjacency matrix of the cells. Likewise, the space-filling curve approach does not try to establish a regular access pattern between physically adjacent cells, which appear on cache lines.
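The two classes of reordering can be illustrated generically as follows. The sketch shows a plain breadth-first traversal and a Morton (Z-order) curve as one example of a space-filling curve; neither is asserted to be the specific procedure of the cited reference. The function names, the adjacency-list input, and the assumption that each cell has non-negative integer coordinates are hypothetical.

```python
from collections import deque

def bfs_order(neighbors, start=0):
    """Return cell indices in breadth-first visiting order.

    Restarts from the lowest unvisited index so that disconnected
    components are also covered.
    """
    if not neighbors:
        return []
    seen = [False] * len(neighbors)
    order = []
    for root in [start] + list(range(len(neighbors))):
        if seen[root]:
            continue
        seen[root] = True
        queue = deque([root])
        while queue:
            cell = queue.popleft()
            order.append(cell)
            for n in neighbors[cell]:
                if not seen[n]:
                    seen[n] = True
                    queue.append(n)
    return order

def morton_key(x, y, bits=16):
    """Interleave the low bits of x and y to form a Z-order curve key."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key

def morton_order(coords):
    """Return cell indices sorted along the Z-order curve."""
    return sorted(range(len(coords)), key=lambda i: morton_key(*coords[i]))
```

Either function returns a list of old cell indices in their new storage order, which can then be used to permute the memory array.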
Yet a further approach to providing improved cache data locality through memory array element reordering is described in I. Al-Furaih and S. Ranka, “Memory Hierarchy Management for Iterative Graph Structures,” Proceedings of the 1998 International Parallel Processing Symposium/Symposium on Parallel and Distributed Processing, (1998), pages 298-302, the disclosure of which is incorporated by reference. Algorithms for obtaining a mapping table are surveyed and two methods for reordering two iterative graph structures are provided. In the first method, independent reordering, each of the two graphs is reordered independently of the other based on the interaction between nodes. In the second method, coupled reordering, each graph is reordered using self-interactions and interactions with other subgraphs. Although empirically increasing overall performance, the approach fails to provide an efficient reordering of nearest neighboring cells from unstructured data sets for optimization of hierarchical cache structures.
Yet a further prior art approach to providing improved data cache locality is described in W. Z. Hu et al., “Improving Fine-Grained Irregular Shared-Memory Benchmarks by Data Reordering,” Proceedings, Supercomputing 2000, Nov. 4-10, 2000, IEEE Computer Society, ISBN 0-7803-9802-5, 2000, the disclosure of which is incorporated by reference. Two data reordering methods are provided, each consisting of two phases. During the first phase, a sorting key is constructed for each object and the keys are sorted to generate a rank. During the second phase, the objects are reordered according to the generated rank. Although empirically increasing overall performance, the approach fails to provide an efficient reordering of nearest neighboring cells from unstructured data sets for optimization of hierarchical cache structures.
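The two-phase structure described above can be sketched generically as follows. The function names `rank_by_key` and `reorder_by_rank` and the caller-supplied `key_fn` are hypothetical; the particular sorting keys used by the cited reference are not reproduced here.

```python
def rank_by_key(objects, key_fn):
    """Phase one: build a sorting key per object and sort the keys into a rank.

    rank[i] is the position that object i takes in the reordered array.
    """
    keys = [key_fn(obj) for obj in objects]
    order = sorted(range(len(objects)), key=keys.__getitem__)
    rank = [0] * len(objects)
    for position, index in enumerate(order):
        rank[index] = position
    return rank

def reorder_by_rank(objects, rank):
    """Phase two: place each object at the position given by its rank."""
    reordered = [None] * len(objects)
    for index, obj in enumerate(objects):
        reordered[rank[index]] = obj
    return reordered
```

For example, a key function such as `lambda p: (p[1], p[0])` applied to two-dimensional points yields a row-major ordering; any other key that places related objects near one another could be substituted.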
Therefore, there is a need for an approach to providing effective reordering of memory array elements storing data values from data object cells in an unstructured graph, such as a numeric data set, to improve hierarchical cache performance and to provide increased spatial and temporal data locality.
There is a further need for an approach to providing an efficient organization of data cells from a structured or unstructured grid stored in a memory array to efficiently solve nearest neighbor type problems without requiring modification of the underlying operations.