1. Field of the Invention
The present invention generally relates to compilers for parallel processing units (PPUs), and, more specifically, to a technique for live analysis-based rematerialization to reduce register pressures and enhance parallelism.
2. Description of the Related Art
Graphics processing units (GPUs) have evolved over time to support a wide range of operations beyond graphics-oriented operations. In fact, a modern GPU may be capable of executing arbitrary program instructions. Such a GPU typically includes a compiler that compiles program instructions for execution on one or more processing cores included within the GPU. Each such core may execute one or more execution threads in parallel with the threads executing on other processing cores.
When a processing core within the GPU executes a set of program instructions, the processing core may store program variables associated with those instructions in register memory. When register memory is entirely consumed by program variables, additional program variables may “spill” into system memory, as is known in the art. One problem with the conventional approach to “spilling” is that system memory has a much higher latency than register memory. Consequently, the speed with which the program instructions execute may decrease dramatically after a “spill” event occurs, because the spilled program variables have to be accessed from system memory instead of register memory. A second problem is that the number of threads that a given processing core is capable of executing simultaneously depends on the amount of register memory available. Thus, filling up register memory with program variables may decrease the number of simultaneously executing threads and, consequently, the overall processing throughput of the GPU.
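By way of illustration only, the two problems described above can be expressed as simple arithmetic. The following minimal sketch assumes a hypothetical processing core with a fixed-size register file shared evenly among its threads; the function names and the numeric values are illustrative assumptions and are not taken from any particular GPU.

```python
def spilled_variables(live_variables: int, registers_per_thread: int) -> int:
    """Number of variables that do not fit in a thread's register budget.

    Assumes (hypothetically) that each live variable occupies one register.
    """
    return max(0, live_variables - registers_per_thread)


def max_resident_threads(register_file_size: int, registers_per_thread: int) -> int:
    """Upper bound on simultaneously executing threads for one core.

    Assumes the register file is the only limiting resource and that
    every thread is allocated the same number of registers.
    """
    if registers_per_thread <= 0:
        raise ValueError("registers_per_thread must be positive")
    return register_file_size // registers_per_thread


# Hypothetical core with a 65,536-entry register file.
REGISTER_FILE_SIZE = 65536

# Doubling the per-thread register allocation halves the thread bound.
threads_at_32 = max_resident_threads(REGISTER_FILE_SIZE, 32)
threads_at_64 = max_resident_threads(REGISTER_FILE_SIZE, 64)

# A thread with 40 live variables and a 32-register budget spills 8 of them.
spills = spilled_variables(40, 32)
```

Under these assumptions, a larger per-thread register allocation avoids spills but lowers the bound on simultaneously executing threads, whereas a smaller allocation raises that bound but forces more variables into higher-latency memory.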
Accordingly, what is needed in the art is a more effective technique for managing register memory within a GPU.