1. Field of the Invention
The present invention generally relates to N-body computations and more specifically to N-body computations using parallel computation systems.
2. Description of the Related Art
N-body systems are commonly used to simulate and model behaviors of interacting objects in complex systems. High-level behaviors for an N-body system may be simulated via N-body computations, which reproduce behaviors of individual bodies within the system. For example, attractive and repulsive forces, atomic masses, and atomic distances can be used to provide components of a model for simulating behaviors of a molecule comprising an N-body system of atoms.
Certain N-body computations proceed as a sequence of time steps, where state information for each time step is computed for an N-body system. The state information may include three-dimensional location and force information for each body within the N-body system. During each time step, a set of interactions is computed between each body within the system and other bodies within the system. The interactions are conventionally represented as an interaction matrix that includes a cell for each possible interaction. One type of interaction between bodies within the N-body system is force, and a force matrix may be used to represent individual forces between each body in the N-body system. For example, an N-by-N force matrix F(i,j) may be used to represent inter-atomic forces between N atoms, and each cell within the force matrix F(i,j) represents an individual inter-atomic force between atom i and atom j.
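By way of illustration only, the N-by-N force matrix F(i,j) described above may be sketched as follows. The inverse-square force law, the constant `g`, the function names `force_matrix` and `net_forces`, and the convention that F(i,j) is the force exerted on body i by body j are all assumptions made for this example; the description above does not prescribe a particular force law or sign convention.

```python
import math

def force_matrix(positions, masses, g=1.0):
    """Build an N-by-N matrix where F[i][j] is the 3D force on body i due to body j.

    Assumes a simple inverse-square attraction (hypothetical choice for this sketch).
    """
    n = len(positions)
    F = [[(0.0, 0.0, 0.0) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a body exerts no force on itself
            # Displacement vector from body i to body j.
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            r = math.sqrt(r2)
            # Magnitude g*m_i*m_j/r^2, directed along d/r.
            scale = g * masses[i] * masses[j] / (r2 * r)
            F[i][j] = tuple(scale * c for c in d)
    return F

def net_forces(F):
    """Sum each row of the force matrix to get the net force on each body."""
    n = len(F)
    return [tuple(sum(F[i][j][k] for j in range(n)) for k in range(3))
            for i in range(n)]
```

Each time step of a simulation would rebuild such a matrix from the current positions and then reduce each row to a net force, which is why the cost grows with the square of the number of bodies.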
Because many useful N-body systems include a large number of bodies and extremely small simulation time steps, N-body simulation systems can be computationally very intensive and therefore potentially good candidates for execution on parallel computation platforms. One approach to partitioning an N-body simulation for execution on a parallel computation platform, such as a graphics processing unit, involves dividing up the force matrix F into groups, such that each group is a sub-matrix of the force matrix. In each group, a number of rows for the sub-matrix may be determined by a characteristic number of threads configurable to execute together as one computational entity. A number of columns for the sub-matrix may be determined by an amount of data that can be read and stored efficiently in memory associated with the computational entity, such as a local register file. For each body within the group, one or more threads associated with the computational entity compute individual forces from the given body to each other body within the N-body system.
Parallel computation platforms conventionally embody certain limitations with respect to threads accessing one or more tiers of memory. For example, two different threads may be able to simultaneously access two different blocks within a memory subsystem if the memory blocks are not aligned, but the threads may experience lower performance when accessing two aligned memory blocks because the aligned accesses result in access conflicts. When parallel threads generate access conflict conditions, each access commonly needs to be executed sequentially rather than in parallel with other access requests, leading to reduced efficiency and lower overall performance.
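The partitioning of the force matrix into sub-matrices may be sketched, purely as an illustration, as follows. The names `tiles` and `net_forces_tiled`, the tile dimensions, the scalar (one-dimensional) force, and the mapping of one hypothetical thread to each row are assumptions for this example; on an actual parallel platform the row count would follow the thread-group size and the column count would follow the available local storage.

```python
def tiles(n, rows_per_tile, cols_per_tile):
    """Yield (row_range, col_range) for each sub-matrix of an n-by-n matrix."""
    for r0 in range(0, n, rows_per_tile):
        for c0 in range(0, n, cols_per_tile):
            yield (range(r0, min(r0 + rows_per_tile, n)),
                   range(c0, min(c0 + cols_per_tile, n)))

def net_forces_tiled(n, pair_force, rows_per_tile=4, cols_per_tile=2):
    """Accumulate the net scalar force on each body, one sub-matrix at a time.

    pair_force(i, j) is a caller-supplied (hypothetical) function returning
    the force on body i due to body j.
    """
    total = [0.0] * n
    for rows, cols in tiles(n, rows_per_tile, cols_per_tile):
        # In a parallel setting, each row i of the tile would be handled by
        # one thread of the group; here the rows are simply visited in order.
        for i in rows:
            for j in cols:
                if i != j:
                    total[i] += pair_force(i, j)
    return total
```

Because every cell of the matrix falls in exactly one tile, the tiled accumulation yields the same totals as a straightforward double loop over all pairs.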
One problem in existing N-body simulation methodologies with respect to parallel computation platforms is that common access patterns to data within the force matrix result in access conflicts, leading to lower overall efficiency. In other words, existing methodologies do not fully utilize processing throughput from the parallel computation platform.
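The cost of such access conflicts may be approximated with a simple model, offered here only as an illustrative sketch. The model assumes that memory is divided into a fixed number of banks, that an address maps to a bank by its remainder modulo the bank count, and that each bank serves one request per pass; the bank count of 32 and the function name `serialized_passes` are hypothetical, and the model ignores hardware features such as broadcasting of identical addresses.

```python
from collections import Counter

def serialized_passes(addresses, num_banks=32):
    """Estimate how many sequential passes a group of parallel accesses needs.

    Each address maps to bank (address % num_banks). Requests that land in
    the same bank are assumed to be served one after another, so the busiest
    bank determines the number of passes.
    """
    per_bank = Counter(a % num_banks for a in addresses)
    return max(per_bank.values(), default=0)

# 32 hypothetical threads reading consecutive addresses: every bank is hit once.
conflict_free = serialized_passes(range(32))

# The same threads reading with a stride equal to the bank count:
# every request lands in bank 0 and must be served sequentially.
fully_serialized = serialized_passes([t * 32 for t in range(32)])
```

Under these assumptions the strided pattern takes 32 passes where the consecutive pattern takes one, which illustrates how an unfavorable access pattern can leave most of the available throughput unused.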
Accordingly, what is needed in the art is a more efficient N-body simulation methodology for parallel computation platforms.