The disclosure relates generally to methods and apparatus for migrating a software thread from one execution unit (e.g., a single instruction multiple data (SIMD) lane) to another execution unit within, for example, a compute unit (e.g., a SIMD unit). Processing devices, such as Graphics Processing Units (GPUs), include one or more compute units, each of which may be composed of one or more execution units. Traditionally, a software thread executing on a GPU is associated with a particular execution unit and executes on that execution unit for its lifetime (e.g., until the software thread completes executing). A collection of software threads that execute on the same execution unit in lockstep is known as a wavefront, and one or more wavefronts may be associated with an execution unit. However, various issues may arise when wavefronts remain on the same execution unit for the lives of the corresponding software threads.
For example, branch divergence occurs when software threads associated with the same wavefront must execute different paths of a software branch (e.g., a conditional branch instruction or an if/else condition). Because the software threads associated with the same wavefront execute in lockstep on the same execution unit, the execution unit must execute both paths of the software branch, as required by the various software threads associated with the wavefront. As such, when the execution unit executes one path, the software threads requiring the alternate path are held idle (e.g., they must wait until the first path has been executed). Likewise, when the execution unit executes the alternate path, the software threads that required the first path are held idle. Only after both paths have been processed does the execution unit resume executing all of the software threads associated with the wavefront together. In such a situation, the overall processing power of the processing device is not fully utilized, as software threads sit idle rather than being executed.
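The cost of branch divergence described above can be illustrated with a minimal sketch. It assumes a simplified cost model that is not part of the disclosure: each path of the branch takes a fixed number of cycles, and the wavefront spends the sum of the cycle counts of every path that at least one thread requires. The function name `lockstep_utilization` and its parameters are hypothetical.

```python
from typing import List


def lockstep_utilization(takes_if_path: List[bool],
                         if_cycles: int,
                         else_cycles: int) -> float:
    """Return the fraction of thread-cycles spent on useful work when a
    wavefront serializes both paths of a branch.

    Assumption (illustrative only): each path has a fixed cycle cost and
    idle threads simply wait while the other path executes.
    """
    n = len(takes_if_path)
    if n == 0:
        raise ValueError("a wavefront needs at least one thread")

    n_if = sum(takes_if_path)   # threads that need the 'if' path
    n_else = n - n_if           # threads that need the 'else' path

    # A path is executed only if at least one thread requires it.
    total_cycles = (if_cycles if n_if else 0) + (else_cycles if n_else else 0)
    if total_cycles == 0:
        return 1.0  # nothing to execute, so nothing is wasted

    # Each thread is busy only while its own path is executing.
    useful = n_if * if_cycles + n_else * else_cycles
    return useful / (n * total_cycles)


# Half of the threads take each path: half of all thread-cycles are idle.
print(lockstep_utilization([True, True, False, False], 10, 10))
# No divergence: every thread is busy for the whole branch.
print(lockstep_utilization([True, True, True, True], 10, 10))
```

Under this model, an evenly split wavefront with equally long paths does useful work in only half of its thread-cycles, while a wavefront whose threads all take the same path loses nothing.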
A similar issue arises with memory latency divergence. Memory latency divergence occurs when software threads associated with the same wavefront execute memory operations that take different amounts of time to complete. For example, one or more software threads associated with a wavefront may execute memory operations that take less time (e.g., reads or writes to cache memory), while one or more other software threads associated with the same wavefront may execute memory operations that take more time (e.g., reads or writes to main memory). In this situation, because the software threads associated with the same wavefront execute in lockstep, the software threads whose memory operations take less time must wait until the software threads whose memory operations take more time have completed before all of the software threads can resume executing together in lockstep. As with branch divergence, memory latency divergence causes inefficiencies in the execution of software threads, thus reducing the overall processing power of a processing device.
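The waiting caused by memory latency divergence can be sketched in the same spirit. The example below assumes, for illustration only, that every thread in a wavefront issues one memory operation with a known latency and that the wavefront resumes once the slowest operation completes. The function name and the latency figures are hypothetical.

```python
from typing import List


def idle_cycles_per_thread(latencies: List[int]) -> List[int]:
    """Return how many cycles each thread waits for the slowest memory
    operation in its wavefront.

    Assumption (illustrative only): one memory operation per thread, and
    lockstep execution resumes when the slowest operation finishes.
    """
    if not latencies:
        raise ValueError("a wavefront needs at least one thread")
    slowest = max(latencies)
    return [slowest - latency for latency in latencies]


# Three threads hit in cache (4 cycles); one goes to main memory (200 cycles).
waits = idle_cycles_per_thread([4, 4, 200, 4])
print(waits)       # per-thread idle cycles
print(sum(waits))  # total idle thread-cycles in the wavefront
```

In this sketch a single slow access stalls every other thread in the wavefront for nearly the full main-memory latency.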
To address the aforementioned problems caused by branch divergence and memory latency divergence (as well as other issues recognized by persons skilled in the art), current proposals provide for dynamic wavefront creation, which allows a software thread to migrate from one execution unit to another. In this fashion, software threads that follow similar software paths, or that access similar memory such that their memory access times are similar, are grouped together to create a new wavefront that will execute on a particular execution unit. In so doing, the register context information (e.g., register data residing in a register file) associated with a migrating software thread must be migrated along with the software thread to maintain the integrity of the software thread's associated register data. However, these current solutions are limited in their ability to solve the aforementioned problems. For example, with lane-aware dynamic wavefront creation, a software thread may migrate only to a SIMD lane that accesses the same register column in the register file. This restriction limits the efficiency of dynamic wavefront creation in mitigating branch divergence. Other solutions change the register file structure when migrating a software thread from one execution unit to another. However, because they require changing the register file structure, these solutions are less suitable for larger register files. Therefore, a need exists for improved dynamic wavefront creation solutions that allow for the migration of software threads.
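The effect of the lane restriction on regrouping can be illustrated with a rough sketch. It is a simplified model and not the method of any particular proposal: each thread is described by a hypothetical tuple of thread identifier, home SIMD lane, and the branch path it needs, and the sketch only counts how many new wavefronts each path requires. With the lane-aware restriction, a new wavefront is assumed to hold at most one thread per lane, because a thread may move only to a lane that accesses the same register column.

```python
from collections import Counter, defaultdict
from math import ceil
from typing import Dict, Hashable, List, Tuple

# Hypothetical thread description: (thread_id, home_lane, path).
Thread = Tuple[int, int, Hashable]


def wavefronts_needed(threads: List[Thread],
                      width: int,
                      lane_aware: bool) -> Dict[Hashable, int]:
    """Count the new wavefronts needed per branch path after regrouping.

    Assumptions (illustrative only): a wavefront has `width` lanes, and
    under the lane-aware restriction a thread must stay in its home lane,
    so two threads with the same home lane cannot share a wavefront.
    """
    lanes_by_path: Dict[Hashable, List[int]] = defaultdict(list)
    for _thread_id, lane, path in threads:
        lanes_by_path[path].append(lane)

    needed: Dict[Hashable, int] = {}
    for path, lanes in lanes_by_path.items():
        if lane_aware:
            # The most contended lane dictates the number of wavefronts.
            needed[path] = max(Counter(lanes).values())
        else:
            # Any thread may occupy any lane, so wavefronts pack fully.
            needed[path] = ceil(len(lanes) / width)
    return needed


# Two original 4-lane wavefronts; lanes 0-1 take path "A", lanes 2-3 take "B".
threads = [
    (0, 0, "A"), (1, 1, "A"), (2, 2, "B"), (3, 3, "B"),
    (4, 0, "A"), (5, 1, "A"), (6, 2, "B"), (7, 3, "B"),
]
print(wavefronts_needed(threads, width=4, lane_aware=True))
print(wavefronts_needed(threads, width=4, lane_aware=False))
```

In this example the threads that need path "A" all occupy lanes 0 and 1, so the lane-aware model still needs two half-empty wavefronts per path, whereas unrestricted migration would pack each path into a single full wavefront. Unrestricted migration, however, requires a way to move or remap each thread's register context, which is the difficulty the paragraph above describes.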