Artificial intelligence (AI) is generally considered to be one of the key components of a computer game. Sometimes when we play a game, we may wish that the computer opponents were written better. At those times, we feel that the game is unbalanced. Perhaps the computer player has been given a different set of rules, or uses the same rules but has more resources (health, weapons, etc.). The complexity of the underlying AI systems, along with the game design, shapes the feeling we have when playing any game. As CPU and GPU speed and power continue to grow, along with memory capacity and bandwidth, game developers are constantly improving the graphics of their games. In the last five years, the production quality of games has been increasing (along with the corresponding budgets). Recent games woo players with incredible breakthroughs in real-time 3D graphics, in the complexity of their worlds and characters, and in various post-processing effects. And while there have been tremendous improvements in parallelizing rendering through the evolution of consumer GPU pipelines, artificial intelligence computations are trailing behind. To date, there have been rather few attempts at parallelizing AI computations.
Typically, in a game, AI controls the behavior of non-player characters (NPCs), whether they are friendly to the player or act as game opponents. These may be actual characters, or they can simply be tanks and armies (such as in a real-time strategy game), or monsters in a first-person shooter. The common feeling is that the better the AI is, the better the game. A more sophisticated AI system allows for more interesting and fun gameplay. Artificial intelligence is used for various parts of the game. Typical computations include path finding, obstacle avoidance, and decision making. These calculations are needed regardless of the genre of interactive entertainment, be it a real-time strategy game, an MMORPG, or a first-person shooter. It may soon happen that dynamic character-centric entertainment in the form of interactive movies will evolve, where the viewer will have control over the outcome.
In many scenarios, the AI computations include dynamic path finding. This involves auto-simulating characters' behavior, and/or running a terrain analysis to identify good paths or update valid paths as a result of gameplay. These computations can consume a large share of the CPU time budget, even in multi-core scenarios. As a result, many game developers are looking for ways to minimize the CPU cost of path finding. Because path finding, and AI in general, are such compute-intensive, expensive calculations, we often see boring, zombie-like NPC interaction. Furthermore, when gameplay and physics are simulated on the CPU, and the characters are rendered on the GPU, there is an additional PCI-E data transfer overhead for character positions and state. It would be desirable to utilize the GPU for running in-game AI code, both to speed up path finding and to introduce a number of other interesting effects. Characters can start living on their own, resulting in so-called “emergent behaviors” such as lane formation, queuing, and reactions to other characters. And this means that gameplay will be a lot more fun.
Many applications require that an array of unsorted point data be sorted into spatial bins prior to being processed. For example, particle system simulations using the discrete element method (DEM) [Bell et al. 2005, Particle-based simulation of granular materials, In SCA '05: Proceedings of the 2005 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, ACM, New York, N.Y., USA, 77-86; Harada 2007, Real-time rigid body simulation on gpus, In GPU Gems 3, H. Nguyen, Ed. Addison-Wesley, Upper Saddle River, N.J., USA, ch. 29 (hereinafter “Harada 2007”)] require a nearest-neighbor search to apply particle-to-particle repulsive forces. It is important to use a spatial data structure to accelerate nearest-neighbor searches, as a brute-force search on n elements requires an expensive O(n) search per element, or O(n^2) work over all elements. By partitioning the particles into spatial bins, the search for each particle can be limited to the particles in nearby bins, which dramatically reduces its computational cost.
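The difference between the brute-force search and the binned search can be illustrated with a minimal CPU-side sketch in Python. It is not taken from any of the cited works: the function names (build_bins, neighbors_binned, neighbors_brute), the 2D uniform grid, and the requirement that the cell size be at least the search radius are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist, floor


def build_bins(points, cell_size):
    """Group point indices by the (hypothetical) uniform grid cell they fall in."""
    bins = defaultdict(list)
    for i, (x, y) in enumerate(points):
        bins[(floor(x / cell_size), floor(y / cell_size))].append(i)
    return bins


def neighbors_binned(points, bins, i, radius, cell_size):
    """Neighbors of point i within radius, checking only the 3x3 block of cells.

    Assumes cell_size >= radius, so every neighbor lies in an adjacent cell.
    """
    x, y = points[i]
    cx, cy = floor(x / cell_size), floor(y / cell_size)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in bins.get((cx + dx, cy + dy), ()):
                if j != i and dist(points[i], points[j]) <= radius:
                    found.append(j)
    return sorted(found)


def neighbors_brute(points, i, radius):
    """Reference O(n) search per point: test every other point."""
    return [j for j in range(len(points))
            if j != i and dist(points[i], points[j]) <= radius]
```

With bins built once per simulation step, each query touches only the points in nine cells rather than the whole array, which is where the cost reduction comes from when the points are reasonably spread out.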
In a GPU-based simulation, constructing these data structures on the GPU is necessary to maintain high performance. If these data structures are to be built by the CPU, particle positions must be transferred out of graphics memory into system memory, and the resulting data structure must be transferred in the opposite direction. In addition to consuming precious bus bandwidth, these kinds of hybrid GPU/CPU approaches require synchronization between GPU and CPU, which reduces utilization by introducing stalls.
Various previous approaches to spatial sorting on the GPU exist. Purcell et al. [2003, Photon mapping on programmable graphics hardware, In Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, Eurographics Association, 41-50] present two methods for sorting point data into grid cells on the GPU as part of their GPU-based photon mapping technique. Their first method sorts points by grid cell ID using a bitonic merge sort. This results in a sorted array in which points in the same grid cell are listed consecutively. A binary search step then constructs a lookup table that contains array offsets for quickly finding each grid cell's data in the sorted array. As an optimization to the bitonic merge sort, the authors describe a second method, which they call stencil routing, for storing points in grid cells.
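The first method described above (sort by grid cell ID, then locate each cell's range by binary search) can be sketched on the CPU as follows. This is an illustrative Python sketch, not the GPU implementation: Python's built-in sort stands in for the bitonic merge sort, and the names cell_id and build_sorted_grid, the 2D row-major cell numbering, and the assumption that all points lie inside the grid are hypothetical.

```python
from bisect import bisect_left


def cell_id(point, cell_size, grid_w):
    """Row-major ID of the grid cell containing a 2D point (assumed layout)."""
    x, y = point
    return int(y // cell_size) * grid_w + int(x // cell_size)


def build_sorted_grid(points, cell_size, grid_w, grid_h):
    """Return (order, offsets) for points assumed to lie inside the grid.

    order   -- point indices sorted by cell ID (built-in sort stands in
               for the bitonic merge sort used on the GPU)
    offsets -- lookup table; the points of cell c are
               order[offsets[c]:offsets[c + 1]]
    """
    ids = [cell_id(p, cell_size, grid_w) for p in points]
    order = sorted(range(len(points)), key=lambda i: ids[i])
    sorted_ids = [ids[i] for i in order]
    num_cells = grid_w * grid_h
    # One binary search per cell finds where that cell's run begins.
    offsets = [bisect_left(sorted_ids, c) for c in range(num_cells + 1)]
    return order, offsets
```

An empty cell simply yields two equal consecutive offsets, so no per-cell capacity has to be chosen in advance.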
Stencil routing is a multi-pass algorithm that scatters point data into grid cells using the vertex shader. When a point lands in a grid cell, the stencil value associated with the cell is incremented to prevent additional points from being written to the cell. This ensures that, if multiple points map to the same grid cell, they will not overwrite each other. A depth test prevents the same point from being stored in a cell multiple times. For the depth test to function correctly, stencil routing requires that its input data be in sorted order. Stencil routing must iterate over the entire data set once for each storage location within a cell (loop count is equal to maximum cell capacity).
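The multi-pass behavior described above can be mimicked on the CPU with a loose emulation in Python. It only mirrors the description in this section and does not use any graphics API: the per-cell `written` flags stand in for the stencil buffer, the per-cell `last_stored` index stands in for the depth buffer, and the function name stencil_route and its parameters are hypothetical.

```python
def stencil_route(cell_ids, num_cells, capacity):
    """Loose CPU emulation of multi-pass stencil routing.

    cell_ids -- cell ID of each point, with points given in sorted order
    Returns slots[c][k]: index of the point stored in slot k of cell c,
    or None if that slot stays empty.
    """
    slots = [[None] * capacity for _ in range(num_cells)]
    last_stored = [-1] * num_cells          # stand-in for the depth buffer

    # One full pass over the data per storage location in a cell.
    for k in range(capacity):
        written = [False] * num_cells       # stand-in for the stencil buffer
        for i, c in enumerate(cell_ids):
            if written[c]:                  # "stencil test": cell already
                continue                    # received a point in this pass
            if i <= last_stored[c]:         # "depth test": point was stored
                continue                    # in an earlier pass
            slots[c][k] = i
            last_stored[c] = i
            written[c] = True
    return slots
```

The sketch makes two costs visible: the outer loop runs once per slot, so the whole data set is traversed as many times as the maximum cell capacity, and any point beyond that capacity in a crowded cell is silently dropped.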
Amada et al. [2004] implement a GPU particle system that constructs a nearest-neighbor map on the CPU. The authors identify the neighbor map generation and its transfer to the GPU as the main bottleneck of their system. To overcome this bottleneck, stencil routing has been used to implement spatial data structures on the GPU, particularly for particle systems and particle-based rigid body simulations [Harada 2007; Harada et al. 2007, Smoothed particle hydrodynamics on gpus. 63-70 (hereinafter “Harada et al. 2007b”)]. However, among other issues, stencil routing requires the input data to be in sorted order.
Subsequent work [Harada et al. 2007, Sliced data structure for particle-based simulations on gpus. In GRAPHITE '07: Proceedings of the 5th International Conference on Computer Graphics and Interactive Techniques in Australia and Southeast Asia, ACM, New York, N.Y., USA, 55-62 (hereinafter “Harada et al. 2007a”)] describes a sliced spatial data structure for point data on the GPU. This method employs a pre-pass over the point data to construct mapping functions that attempt to minimize wasted memory associated with unused cells in a uniform grid. A final stencil-routing step scatters the particles into cells within the grid.
None of the above approaches is efficient, for various reasons that are discussed further herein. Therefore, what is needed are methods and apparatuses for efficiently sorting point data into spatial bins using graphics hardware.