In order to make computer-generated graphics look more realistic, advanced particle simulation may be used to model the behavior of particles, for example using Smoothed Particle Hydrodynamics (SPH) to model the flow of liquids. In order to solve an SPH simulation, and many other simulations, it is necessary to iterate over all particles in the simulation and, for each particle, to find all other particles within a specified distance (the support radius, h) of that particle. This process is known as a ‘nearest neighbor search’. Solution of SPH then involves iterating over all the nearest neighbors of each particle and summing the density contributions and interaction forces between them as an approximation to a set of integrals.
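The nearest neighbor search and the subsequent summation described above can be illustrated with a minimal Python sketch. It is not an optimized or authoritative implementation: the function names (`brute_force_neighbors`, `poly6`, `densities`) are hypothetical, the poly6 smoothing kernel is an assumed choice (the text above does not name a kernel), and the inclusion of each particle's own contribution in its density is likewise an assumption.

```python
import math


def brute_force_neighbors(positions, h):
    """For each particle, list the indices of all other particles within distance h."""
    h2 = h * h
    neighbors = []
    for i, p in enumerate(positions):
        found = []
        for j, q in enumerate(positions):
            if i == j:
                continue  # a particle is not its own neighbor
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            if d2 <= h2:  # compare squared distances to avoid a square root
                found.append(j)
        neighbors.append(found)
    return neighbors


def poly6(r2, h):
    """Poly6 smoothing kernel in 3D (assumed for illustration); takes the squared distance."""
    if r2 > h * h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h ** 9) * (h * h - r2) ** 3


def densities(positions, neighbors, mass, h):
    """Approximate the density integral at each particle by a sum over its neighbors."""
    result = []
    for i, p in enumerate(positions):
        rho = mass * poly6(0.0, h)  # self contribution (an assumption of this sketch)
        for j in neighbors[i]:
            r2 = sum((a - b) ** 2 for a, b in zip(p, positions[j]))
            rho += mass * poly6(r2, h)
        result.append(rho)
    return result
```

Because every particle is tested against every other particle, the cost of this search grows with the square of the particle count, which is what motivates the faster approaches discussed below.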
The simulation may involve tens of thousands of particles (e.g. 20,000 particles) and therefore the nearest neighbor search is computationally intensive. Where the graphics are being generated for use in a film or television program, these calculations do not need to be carried out in real time or close to real time. However, for computer games it is necessary to perform these calculations in a short time interval to provide substantially real time updates (typically 30 simulations a second or faster).
In order to simplify the nearest neighbor search, the search may be performed in two passes. First, a broad phase allocates particles 101 to bins 102, as shown in a simple 2D example in FIG. 1. If the length of each side of a bin is h (the support radius), then when performing a nearest neighbor search, only those particles in the same bin and neighboring bins (e.g. 9 bins in total for a 2D example, 27 bins for a 3D example) need to be considered. As two particles in neighboring bins may be further apart than the support radius (e.g. particles 103 and 104), a second phase, known as the narrow phase, calculates the distance between the particle being considered and each particle in the same bin and the neighboring bins, and discards those particles whose separation from it is greater than the support radius.
The hardware on which such calculations must be performed (i.e. PCs and games consoles) often comprises both a central processing unit (CPU) and a graphics processing unit (GPU). Whilst the CPU is designed to have general processing capability, the GPU has a highly parallel structure and is designed specifically to perform graphics operations, including rendering polygons and texture mapping. Recent GPUs include programmable stages known as shaders: a vertex shader and a pixel shader. The vertex shader is used to modify the vertices of a polygon, e.g. moving a vertex to change the shape of an object. The pixel (or fragment) shader is used to change the appearance of a pixel (i.e. its color) based on parameters such as lighting, shading etc. By performing the graphics operations in dedicated hardware (i.e. the GPU) rather than in the CPU, the operations can be performed much more quickly. However, as the GPU is not designed for general use, it is not flexible like the CPU and has a number of limitations, including that it has little or no ability to perform scattered write operations (i.e. writing of data to random or scattered memory locations). A further severe limitation of the GPU is that data written by the GPU is not generally immediately available to be read back, due to a separation between input and output structures.
In order to speed up the nearest neighbor search, techniques have been proposed to enable some of the operations to be performed on the GPU to leverage its parallel processing capability. In an example, the CPU may be used to sort the data whilst the GPU uses the results to perform the simulation itself. However, the passing of data between the CPU and the GPU is prone to latency issues: the GPU may be left idle whilst the CPU completes the sort, and bottlenecks may arise as data is transferred between the CPU and the GPU.