Technical Field
This disclosure relates to computing systems, and more particularly, to efficiently processing instructions in hardware parallel execution lanes within a processor.
Background
The parallelization of tasks is used to increase the throughput of computer systems. To this end, compilers may extract parallelized tasks from program code to execute in parallel on the system hardware. To increase parallel execution on the hardware, a processor may include multiple parallel execution lanes, such as in a single instruction multiple data (SIMD) micro-architecture. This type of micro-architecture may provide higher instruction throughput for particular software applications than a single-lane micro-architecture or a general-purpose micro-architecture. Some examples of tasks that benefit from a SIMD micro-architecture include video graphics rendering, cryptography, and garbage collection.
In many cases, particular software applications have data parallelism in which the execution of each work item, or parallel function call, is data dependent within itself. For example, a first work item may be data independent from a second work item, and the first and the second work items are simultaneously scheduled on separate parallel execution lanes within a SIMD micro-architecture. However, the number of instructions executed within each of the first and the second work items may be data dependent. A conditional test implemented as a branch instruction may pass for the first work item but fail for the second work item, depending on the data for each work item.
The efficiency of parallel execution may be reduced when the second work item halts execution and waits while the first work item continues with its ongoing execution. The inefficiency grows when only a few work items continue execution due to passed tests while most of the work items are idle due to failed tests.
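The loss of efficiency described above can be illustrated with a minimal sketch in Python. It is a simplified model offered only as an example, not a description of any particular hardware: the function name lane_utilization, the lock-step cycle model, and the sample trip counts are all assumptions made for illustration. Each list entry stands for one work item on its own lane, and its value is the data-dependent number of loop iterations that work item needs before its conditional test fails.

```python
from typing import Sequence


def lane_utilization(trip_counts: Sequence[int]) -> float:
    """Fraction of lane-cycles doing useful work in a lock-step model.

    Hypothetical model: all lanes advance together, one iteration per
    cycle, and a lane whose conditional test has failed sits idle until
    the slowest lane finishes.
    """
    remaining = list(trip_counts)
    busy_slots = 0  # lane-cycles in which a lane executed an iteration
    cycles = 0      # lock-step cycles until every lane is finished

    while any(r > 0 for r in remaining):
        # Per-lane conditional test: True means the lane keeps executing.
        active = [r > 0 for r in remaining]
        busy_slots += sum(active)
        remaining = [r - 1 if a else r for r, a in zip(remaining, active)]
        cycles += 1

    total_slots = cycles * len(remaining)
    # With no work at all, nothing is wasted, so report full utilization.
    return busy_slots / total_slots if total_slots else 1.0


# Uniform work: every lane passes and fails the test at the same time.
uniform = lane_utilization([4, 4, 4, 4])

# Divergent work: one lane keeps passing while three lanes wait idle.
divergent = lane_utilization([8, 1, 1, 1])
```

Under this model, four lanes with equal trip counts keep every lane busy on every cycle, whereas one work item needing eight iterations alongside three needing one leaves 21 of the 32 lane-cycles idle.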