1. Field of the Invention
The present invention generally relates to parallel-processing techniques in computer systems. More specifically, the present invention relates to techniques that facilitate parallel execution of generic reduction operations.
2. Related Art
The latest generation of multiprocessor systems can dramatically increase computational performance by enabling a large number of processors to be dedicated to a single computational task. Exploiting such parallelization, however, requires the ability to divide a task into sub-operations which can be executed in parallel. Loop constructs in programming languages often serve as fertile ground for parallel execution. A common parallel operation is a “reduction,” in which an arithmetic operation such as addition or multiplication is repeated across a set of data elements to reduce the set to a desired result.
A reduction operation is typically associative, which allows it to be divided into a group of sub-operations, each of which can be performed in parallel. The results of the sub-operations are then merged to form partial results, which are in turn merged to form a final result. For the most part, these merge operations can also execute in parallel.
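The divide-and-merge structure described above can be sketched as follows. This is a minimal illustration only, not the method of the present invention: the names `chunked` and `parallel_reduce` and the `n_workers` parameter are hypothetical, and a thread pool is assumed purely to show the structure (in CPython, threads do not necessarily speed up arithmetic).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def chunked(data, n_chunks):
    """Split data into at most n_chunks contiguous, non-empty slices."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_reduce(op, data, n_workers=4):
    """Reduce a non-empty sequence with an associative binary operation.

    Hypothetical helper: op is assumed to be associative; it does not
    need to be commutative because element order is preserved.
    """
    if not data:
        raise ValueError("parallel_reduce requires a non-empty sequence")
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Sub-operations: each worker reduces one contiguous chunk.
        partials = list(pool.map(lambda chunk: reduce(op, chunk),
                                 chunked(data, n_workers)))
        # Merge phase: combine adjacent partial results pairwise
        # until a single final result remains.
        while len(partials) > 1:
            pairs = [partials[i:i + 2] for i in range(0, len(partials), 2)]
            partials = list(pool.map(lambda pair: reduce(op, pair), pairs))
    return partials[0]


total = parallel_reduce(operator.add, list(range(1, 101)))
product = parallel_reduce(operator.mul, [1, 2, 3, 4, 5])
```

Because `pool.map` returns results in input order, adjacent partial results are always merged left to right, so the sketch relies only on associativity of the operation.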
Current programming languages provide no mechanism for expressing “generic” parallel reduction operations. Many parallel programming languages, such as languages extended to support parallelism via the OpenMP parallelization specification, support only simple reduction operations for a limited set of operators such as max/min, addition, multiplication, and bitwise AND/OR. Many other reduction operations are possible, but they receive no parallel processing support because existing parallel programming languages lack mechanisms for describing them.
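As an illustration of reduction operations that fall outside the operator list named above, the following sketch reduces with string concatenation and with 2x2 matrix multiplication. Both are associative but not commutative. The helper names `tree_reduce`, `matmul2`, and `IDENTITY2` are hypothetical and chosen only for this example; the recursion runs sequentially here and merely marks where independent work could be executed in parallel.

```python
def tree_reduce(op, identity, items):
    """Reduce items with an associative op by recursive halving.

    Hypothetical helper: identity is returned for an empty sequence.
    """
    if len(items) == 0:
        return identity
    if len(items) == 1:
        return items[0]
    mid = len(items) // 2
    # The two halves are independent of each other, so a parallel
    # runtime could evaluate them concurrently before merging.
    left = tree_reduce(op, identity, items[:mid])
    right = tree_reduce(op, identity, items[mid:])
    return op(left, right)


def matmul2(a, b):
    """Multiply two 2x2 matrices given as nested tuples of rows."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0],
         a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0],
         a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


IDENTITY2 = ((1, 0), (0, 1))

# String concatenation as a reduction: order must be preserved.
joined = tree_reduce(lambda x, y: x + y, "", ["par", "al", "lel"])

# Matrix product as a reduction: multiply ten copies of one matrix.
FIB = ((1, 1), (1, 0))
fib_power = tree_reduce(matmul2, IDENTITY2, [FIB] * 10)
```

In both cases the reduction is described by nothing more than a binary operation and its identity value, which is the kind of description a generic reduction mechanism would need to accept.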
Hence, what is needed is a method and an apparatus that facilitate parallel execution of generic reduction operations in a parallel programming language.