This invention relates to efficiently executing instructions at a processor.
With some types of data, such as graphics data, large blocks of data often need to undergo the same processing operations. One example is changing the brightness of an image. Processing such blocks of data in parallel can reduce the processing time compared with serial processing. Parallel processing can be carried out on a single instruction multiple thread (SIMT) or single instruction multiple data (SIMD) processor. Such a processor is a microprocessor with execution units, caches and memories as with any other processor, but it additionally incorporates the concept of parallel execution of multiple threads or data streams. Each thread executes the same set of instructions but on different data. Instead of each thread individually fetching its data from memory, the data can be provided to the threads by a single fetch operation that fetches a block of data for each of the threads. SIMT and SIMD processing can provide improved processing efficiency as compared with traditional single instruction single data (SISD) processing.
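By way of illustration only, the brightness example can be sketched in Python as follows. The function names, the lane count of four and the clamping of pixel values to the range 0 to 255 are assumptions made for this sketch and do not describe any particular processor. The Python code itself runs serially; it merely models one block fetch followed by the same operation being applied to every lane.

```python
def brighten_block(block, delta):
    # The same operation is applied to every lane of the block.
    # Clamping to 0..255 is an assumption (8-bit pixel values).
    return [min(255, max(0, pixel + delta)) for pixel in block]


def brighten_image(pixels, delta, lanes=4):
    # `lanes` is a hypothetical number of parallel threads or data streams.
    out = []
    for start in range(0, len(pixels), lanes):
        # One fetch supplies the data for all lanes at once, rather than
        # each lane fetching its own pixel individually.
        block = pixels[start:start + lanes]
        out.extend(brighten_block(block, delta))
    return out
```

With four lanes, an image of five pixels would be handled in two block steps rather than five individual steps, the second block being only partly filled.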
SIMT and SIMD processors comprise a plurality of processing elements that can concurrently execute the same instructions. Each processing element supports its own thread and each thread runs the same program code, but with different data. One problem with SIMT and SIMD processing is the high cost of a branch operation, as might be caused by a conditional statement in the program code. A branch operation results in some data in a block being operated on by one branch of instructions and the remaining data by another branch of instructions, and which data follows which branch is not known until the condition has been evaluated. Such an operation can cause idling and underutilisation of processing elements as well as an increase in the processing time for the program. There is therefore a need for more efficient parallel processing of programs that have branching operations.
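The cost of a branch operation can be illustrated with a simplified Python model, given here as a sketch only. It assumes that a group of processing elements executes in lockstep and handles a branch by running each branch path in turn with the non-participating elements idle. The function name, the per-lane mask and the utilisation measure are hypothetical and chosen purely for illustration.

```python
def run_divergent(data, cond, then_ops, else_ops):
    # Evaluate the condition per lane; the result decides each lane's path.
    mask = [cond(x) for x in data]
    out = list(data)
    steps = 0  # instructions issued to the whole group
    busy = 0   # lane-instructions that did useful work
    paths = ((mask, then_ops), ([not m for m in mask], else_ops))
    for path_mask, ops in paths:
        if not any(path_mask):
            continue  # no lane takes this path, so it is skipped
        for op in ops:
            steps += 1
            for lane, active in enumerate(path_mask):
                if active:  # inactive lanes idle during this step
                    out[lane] = op(out[lane])
                    busy += 1
    utilisation = busy / (steps * len(data)) if steps else 1.0
    return out, steps, utilisation


then_ops = [lambda x: x * 2, lambda x: x + 1]  # path of two instructions
else_ops = [lambda x: x - 100]                 # path of one instruction

mixed = run_divergent([10, 200, 30, 220], lambda x: x < 128, then_ops, else_ops)
uniform = run_divergent([10, 20, 30, 40], lambda x: x < 128, then_ops, else_ops)
```

In this model the mixed block takes three steps with half of the lanes idle on each step, whereas the uniform block takes two steps with every lane busy.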