1. Field of the Invention
The present invention generally relates to parallel processing and more specifically to an architecture for executing irregularly parallel applications.
2. Description of the Related Art
There is an ever-increasing class of applications enabled by the significant increase in computing density provided by modern accelerator architectures. Such architectures include, for example, graphics processing units (GPUs), physics accelerators, attached co-processors, and other similar chips.
These applications span many domains, including graphics and physics for gaming and interactive simulation, medical imaging, data analysis (e.g., for oil and gas exploration), scientific computing, three-dimensional (3D) modeling for computer-aided design (CAD), signal processing, image and video compression, analysis, indexing, digital content creation, financial analytics, and the like. The rapid increase in compute density of accelerator chips, the rate of maturity of the programming models and environments, and the general realization of opportunity have caused this space of applications to experience rapid growth.
Candidate applications that benefit from these accelerator architectures are often visually oriented and interactive. The prototypical application of this class is the raster-based rendering performed by modern GPUs.
The technological value of an accelerator architecture is typically based on the amount of data that the architecture is capable of processing per unit of time. That value increases when the amount of work (e.g., pixels, polygons, objects, frames, etc.) done per unit of time increases with each successive generation of the architecture. This property of accelerator architectures may be referred to as “data scale.” For example, in the field of raster graphics, each generation of a GPU provides more processing power (more pixels per second), which enables game developers to deliver games with higher definition graphics, more complex geometry, and more stunning visual effects. These applications executed by the GPU may be executed in a highly parallel manner, and the rate of execution is increased by architectural approaches that provide high performance through parallelism. When designing a chip for such application domains, performance is the paramount design parameter. The performance increase at each subsequent generation is, however, constrained by other factors including power budgets, schedule, robustness, cost, die area, layout issues, and the like.
Another class of applications that benefits from parallelism is physics processing. In contrast to data parallel applications like raster graphics rendering, physics processing is parallel in an “irregular” manner. More specifically, when physics processing kernels, such as collision detection and constraint discovery and solving, are decoupled into threads, the threads require some degree of inter-thread communication. This inter-thread communication dictates how efficiently physics operations are performed on any given accelerator architecture. An architecture that provides inadequate support for inter-thread communication and synchronization may perform poorly when executing certain types of physics processing workloads.
Applications that exhibit irregular parallelism may be referred to as “irregularly parallel applications.” Such applications exhibit properties more suitable for thread-parallel machines, such as multi-core CPUs, than for data parallel machines, such as GPUs. However, the scale of compute power required to execute irregularly parallel applications is similar to that required to execute data parallel applications.
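By way of illustration only, the contrast between a data parallel workload and an irregularly parallel workload may be sketched as follows. The sketch is not drawn from any particular accelerator or physics engine; the function names shade_pixels and count_contacts, the one-dimensional body positions, and the use of a single shared lock are all assumptions chosen for brevity.

```python
import threading


def shade_pixels(pixels, gain):
    """Data parallel: each output depends only on its own input element."""
    return [min(255, int(p * gain)) for p in pixels]


def count_contacts(positions, radius, num_threads=4):
    """Irregularly parallel: threads discover overlapping pairs of bodies and
    must update shared per-body state, which requires synchronization.

    Bodies are hypothetical spheres on a line, given by their centers.
    """
    n = len(positions)
    contacts = [0] * n          # shared state written by every thread
    lock = threading.Lock()     # assumed synchronization primitive

    def worker(tid):
        # Each thread takes a strided subset of the bodies as the first
        # member of a pair, so the work per thread is data dependent.
        for i in range(tid, n, num_threads):
            for j in range(i + 1, n):
                if abs(positions[i] - positions[j]) < 2 * radius:
                    # A contact touches two bodies that may "belong" to
                    # different threads, so the update is guarded.
                    with lock:
                        contacts[i] += 1
                        contacts[j] += 1

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                # all contacts must be known before solving
    return contacts
```

In the first function no element depends on any other, so the work may be divided arbitrarily. In the second, the amount of work and the memory locations written depend on the input data, and the guarded update and the final join stand in for the inter-thread communication and synchronization discussed above.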
Accordingly, there remains a need in the art for a processing system capable of efficiently supporting irregularly parallel applications.