1. Field of the Invention
The invention relates generally to compiler systems and, more specifically, to convergence analysis in multithreaded programs.
2. Description of the Related Art
Certain computer systems include a parallel processing subsystem that may be configured to concurrently execute multiple program threads that are instantiated from a common program. Such systems are referred to in the art as having single-instruction, multiple-thread (SIMT) parallelism. An application program written for execution in an SIMT model may include sequential C language programming statements and calls to a specialized application programming interface (API) used for configuring and managing parallel execution of program threads. A function within an SIMT application that is destined for concurrent execution on a parallel processing subsystem is referred to as a “thread program” or “kernel.” An instance of a thread program is referred to as a thread, and a set of concurrently executing threads may be organized as a thread group. Each thread may follow a different execution path based on certain identifying index variables or computational results.
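By way of illustration only, the relationship between a thread program, its threads, and a thread group may be sketched as follows. The sketch is a sequential Python emulation rather than code for an actual parallel processing subsystem, and the names `kernel`, `launch`, and `group_size` are hypothetical and do not correspond to any particular API.

```python
def kernel(thread_idx, data, out):
    """Hypothetical thread program: each thread processes one element."""
    if thread_idx % 2 == 0:
        # Path taken by threads with an even index.
        out[thread_idx] = data[thread_idx] * 2
        return "even"
    # Path taken by threads with an odd index.
    out[thread_idx] = data[thread_idx] + 1
    return "odd"


def launch(kernel_fn, group_size, data):
    """Emulate a thread group by running one kernel instance per index.

    Real SIMT hardware would run these instances concurrently; this
    loop runs them one after another purely for illustration.
    """
    out = [None] * group_size
    paths = [kernel_fn(t, data, out) for t in range(group_size)]
    return out, paths


data = [10, 11, 12, 13]
out, paths = launch(kernel, 4, data)
print(out)    # [20, 12, 24, 14]
print(paths)  # ['even', 'odd', 'even', 'odd']
```

Each call to `kernel` stands in for one thread, and the index passed to it plays the role of the identifying index variable that selects the execution path.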
During execution, one set of threads may execute one branch of a conditional statement, while another set of threads may execute a different branch of the same conditional statement. In such a scenario, the two sets of threads follow divergent paths that need to converge at some later point during execution. A synchronization barrier may be used as an explicit convergence point and may identify a certain portion of a thread program as convergent. Other techniques are known in the art for detecting convergence based on certain ad hoc rules, but a general technique for identifying all convergent basic blocks is not presently known in the art. Each basic block includes one entry point and one exit point in execution flow, and a given basic block may be represented as a corresponding node in a control flow graph (CFG).
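By way of illustration only, divergence at a conditional statement and a later merge may be pictured on a small CFG. The sketch below is a Python emulation under stated assumptions: the block names, the `trace` helper, and the use of the thread index as the branch condition are all hypothetical. It merely records which basic blocks each emulated thread visits in one run and is not a technique for identifying convergent basic blocks.

```python
# Hypothetical CFG for: entry; if (cond) then-block else else-block; merge.
# Each key is a basic block (a node); each value lists its successors.
cfg = {
    "entry": ["then", "else"],
    "then": ["merge"],
    "else": ["merge"],
    "merge": [],
}


def trace(thread_idx):
    """Return the sequence of basic blocks visited by one emulated thread."""
    block = "entry"
    path = [block]
    while cfg[block]:
        succs = cfg[block]
        # Two successors model a conditional branch; the thread index
        # stands in for the branch condition (an assumption of this sketch).
        block = succs[thread_idx % 2] if len(succs) == 2 else succs[0]
        path.append(block)
    return path


paths = [trace(t) for t in range(4)]
# Blocks that every traced thread passes through in this particular run.
shared = set(paths[0]).intersection(*paths[1:])

print(paths[0])        # ['entry', 'then', 'merge']
print(paths[1])        # ['entry', 'else', 'merge']
print(sorted(shared))  # ['entry', 'merge']
```

In this run the threads separate after `entry` and all reach `merge`. Observing that every thread visits a block in one trace does not by itself establish that the block is convergent.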
Certain beneficial optimizations may be applied to convergent basic blocks. In one exemplary optimization, a convergent basic block may have related data allocated to common storage for greater access efficiency. In another exemplary optimization, a convergent basic block may be scheduled to run on a specific thread processor for greater execution efficiency. Each convergent basic block that is identified therefore represents an opportunity to better optimize a thread program. However, as noted above, thread program compilers are conventionally unable to detect all convergent basic blocks in a general thread program and are therefore unable to fully optimize certain thread programs undergoing compilation.
As the foregoing illustrates, what is needed in the art is a technique for identifying convergent basic blocks in a thread program.