Field of the Invention
Embodiments of the present invention relate generally to parallel processing and more specifically to systems and methods for voting among parallel threads.
Description of the Related Art
Typical parallel processing subsystems include at least one parallel processing unit (PPU) that may be configured to provide a level of computational throughput that is impractical to achieve with a single processing unit. The PPU may be configured to incorporate a plurality of processing cores, each capable of executing a parallel program on a plurality of processing engines. Each processing engine may be configured to execute an instance of the parallel program. Each executing instance of the parallel program, called a parallel program thread, or simply “thread,” usually computes a portion of the overall results generated by the parallel program.
Parallel program threads that are participating in a parallel computation are often required to communicate with other participating parallel program threads. A particularly useful form of communication is a vote, which is commonly performed on vote data provided by each participating parallel program thread. In one example of communication among parallel program threads, each participating parallel program thread may provide a “yes” or “no” (“1” or “0”) vote regarding a specific piece of state information associated with the parallel program thread. The votes are tallied, and the result is used to guide subsequent computations within the parallel processing subsystem. One common vote operation, referred to as a “vote any,” computes whether at least one of the participating threads voted “yes” in the vote. The result of the vote any operation corresponds to a Boolean “OR” of all participating votes.
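By way of illustration only, the “vote any” operation described above may be sketched as a Boolean OR reduction over the votes of the participating threads. The function name `vote_any` and the representation of votes as a sequence of 0/1 integers are assumptions made for this sketch and do not appear in any particular parallel processing subsystem.

```python
from typing import Sequence


def vote_any(votes: Sequence[int]) -> int:
    """Return 1 if at least one participating thread voted "yes" (1), else 0.

    Hypothetical helper for illustration: each element of `votes` stands for
    the single-bit vote contributed by one participating thread.
    """
    result = 0
    for vote in votes:
        # Boolean OR of the running result with this thread's vote.
        result |= 1 if vote else 0
    return result
```

For example, `vote_any([0, 0, 1, 0])` evaluates to 1 because one thread voted “yes,” while `vote_any([0, 0, 0])` evaluates to 0.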
A practical application of a “vote any” operation may be found in a parallel search application, where each parallel program thread reports whether a specified search condition has been met. If the search condition (e.g., a pattern match) is met by at least one thread, then the threads may be directed to process the finding. Otherwise, the threads may continue to search for a match.
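The parallel search scenario above may be sketched, under simplifying assumptions, using host-side worker threads in place of PPU threads. The names `search_chunk` and `parallel_search_any`, the division of the input into string chunks, and the use of a thread pool are all hypothetical choices made for this sketch. A pattern that straddles a chunk boundary is not detected by this simplified version.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def search_chunk(chunk: str, pattern: str) -> int:
    """One thread's vote: 1 if the pattern occurs in its chunk, else 0."""
    return 1 if pattern in chunk else 0


def parallel_search_any(chunks: Sequence[str], pattern: str) -> Tuple[int, List[int]]:
    """Search every chunk concurrently and return (vote-any result, per-thread votes)."""
    workers = max(1, len(chunks))  # assumption: one worker thread per chunk
    with ThreadPoolExecutor(max_workers=workers) as pool:
        votes = list(pool.map(lambda c: search_chunk(c, pattern), chunks))
    # "Vote any" is the Boolean OR of all participating votes.
    return int(any(votes)), votes


if __name__ == "__main__":
    found, votes = parallel_search_any(["abcd", "efgh", "ijkl"], "fg")
    if found:
        print("match reported; per-thread votes:", votes)
    else:
        print("no match; threads would continue searching")
```

In this sketch the combined result is returned to the caller, which then decides whether the threads process the finding or continue to search.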
Prior art voting operations in parallel processing subsystems typically include counting or combining a number, N, of votes using N serial tallying steps. Each tallying step in turn requires a set of serial operations, including a synchronization step, a communication step, and a combination step. The serial nature of a conventional voting operation tends to degrade system performance because each vote operation typically involves multiple tallying steps, where each tallying step requires multiple system cycles. The synchronization, communication, and combination steps are typically very time-consuming in parallel systems because they require coordination and data exchange among several parallel threads and parallel processors.
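The serial tallying pattern described above may be modeled, purely as an illustrative sketch, with a shared tally guarded by a lock. Here the lock stands in for the synchronization step, the read of the shared tally for the communication step, and the OR for the combination step. The function name `serial_tally` and the step counter are assumptions introduced for this sketch and are not taken from any specific prior art system.

```python
import threading
from typing import Sequence, Tuple


def serial_tally(votes: Sequence[int]) -> Tuple[int, int]:
    """Combine N votes one at a time; return (vote-any result, tallying steps)."""
    tally = 0
    steps = 0
    lock = threading.Lock()

    def cast(vote: int) -> None:
        nonlocal tally, steps
        with lock:                                   # synchronization step
            current = tally                          # communication step
            tally = current | (1 if vote else 0)     # combination step
            steps += 1                               # one serialized tallying step

    threads = [threading.Thread(target=cast, args=(v,)) for v in votes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tally, steps
```

Because only one thread can hold the lock at a time, N votes always cost N serialized tallying steps in this model, regardless of how many threads run concurrently.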
Certain prior art single-instruction multiple-data (SIMD) systems include combining mechanisms that may improve performance of voting operations. For example, a global OR combination operation may be performed over multiple data channels from multiple processing engines within a SIMD processor. However, the availability of the result of the global OR combination operation is limited to one related SIMD instruction controller, which uses the result to control subsequent operations of the multiple processing engines. The results are usually not available to the processing engines executing instructions initiated by the SIMD instruction controller, because no data path is conventionally available for conveying global combination results to the related processing engines. Because of this limitation, the processing engines within a conventional SIMD system are unable to individually incorporate the results of a vote, thereby limiting the overall generality of a SIMD type of processing regime. Furthermore, in many multi-threaded parallel processing subsystems that may host multiple independently executing SIMD programs, cascading effects resulting from inefficient vote operations may further reduce overall performance.
As the foregoing illustrates, what is needed in the art is a technique for efficiently performing general voting operations within multi-threaded parallel processing systems.