This relates generally to processors and, particularly, to single instruction multiple data processors.
A single instruction multiple data (SIMD) processor is a processor in which each instruction can operate on multiple data elements in parallel. Some single instruction multiple data processors can operate at different widths, such as SIMD8, SIMD16, or SIMD32.
A physical SIMD register has a large number of bits which may be used to store multiple smaller data elements. The mode of operation may be loosely described as SIMDm×n, where “m” is a numerical term describing the size of the vector and “n” is the number of concurrent program flows executed in SIMD. SIMD8, short for SIMD1×8, stands for SIMD operation based on a structure of arrays layout, in which one register contains the same data element of each of eight vectors. Effectively, there are eight concurrent program flows. SIMD16 is short for SIMD1×16, where each SIMD instruction operates on a pair of registers that together contain the same data element of each of 16 vectors. SIMD16 has 16 concurrent program flows.
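By way of illustration only, the structure of arrays arrangement described above may be modeled in software as follows. This is a minimal sketch, not a description of any particular hardware: Python lists stand in for registers, and the names SIMD_WIDTH, reg_x, reg_y, reg_z, and simd_add are hypothetical.

```python
# Hypothetical software model of SIMD8 (SIMD1x8) over a structure of arrays.
# Each Python list stands in for one register holding eight lanes.
SIMD_WIDTH = 8

# Eight 3-component vectors (x, y, z), one per concurrent program flow.
vectors = [(float(i), float(i) + 0.5, float(i) * 2.0) for i in range(SIMD_WIDTH)]

# Structure of arrays: one "register" per component, so that a register
# contains the same data element of each of the eight vectors.
reg_x = [v[0] for v in vectors]
reg_y = [v[1] for v in vectors]
reg_z = [v[2] for v in vectors]


def simd_add(a, b):
    """Model one SIMD instruction: the same add applied to every lane."""
    return [ai + bi for ai, bi in zip(a, b)]


# Lane i of the result holds x_i + y_i for program flow i.
reg_sum = simd_add(reg_x, reg_y)
```

In this model, a SIMD16 operation would correspond to applying the same instruction to a pair of such eight-lane registers.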
A write mask may be used to allow part of a register to be computed through one control flow branch and another part of the register to be computed through another control flow branch. Execution errors may occur when a register, written with one mask in one control flow branch, is written with a different mask in a parallel control flow branch.
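The role of a write mask may be sketched in software as shown next. This is an illustrative model under stated assumptions: one mask bit per lane, eight lanes, and two branches executed one after the other. The helper masked_write and the mask values are hypothetical and do not correspond to any particular instruction set.

```python
def masked_write(dest, src, mask):
    """Return a copy of dest in which only lanes with a set mask bit take src."""
    return [s if m else d for d, s, m in zip(dest, src, mask)]


# Per-lane branch condition for eight concurrent program flows (assumed values).
cond = [1, 0, 1, 1, 0, 0, 1, 0]
then_mask = cond                    # lanes that take the first branch
else_mask = [1 - c for c in cond]   # lanes that take the other branch

reg = [0] * 8
reg = masked_write(reg, [10] * 8, then_mask)  # first branch writes its lanes
reg = masked_write(reg, [20] * 8, else_mask)  # other branch writes the rest
```

Because the two masks are complementary, each lane of the register is computed by exactly one branch and neither branch disturbs the other's result.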
For example, a simple execution error arises when an instruction in the “parallel” branch uses a “no mask” modifier to block load constant data into a register that was also written in the “then” branch. An instruction with the no mask modifier writes every element of its destination and may therefore overwrite data that was written in the “then” branch. If the no mask modifier were omitted or if a different register were used as the destination for the block load, no meaningful data would be overwritten. In this case, however, the no mask modifier is required and most register allocation algorithms will allow the same destination to be used for each instruction.
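The overwrite described above may be reproduced in a small software model. This is a sketch only: the function write, the register names r0 and r1, and the data values are hypothetical, and passing mask=None is merely this model's stand-in for a no mask modifier.

```python
def write(dest, src, mask=None):
    """Model a register write; mask=None models a "no mask" write to all lanes."""
    if mask is None:
        return list(src)
    return [s if m else d for d, s, m in zip(dest, src, mask)]


then_mask = [1, 1, 0, 0, 1, 0, 1, 0]   # lanes active in the "then" branch
constants = [100 + i for i in range(8)]  # data block loaded by the other branch

# The "then" branch computes its lanes of r0 under its write mask.
r0 = write([0] * 8, [7] * 8, then_mask)
after_then = list(r0)

# Erroneous case: the block load with no mask targets the same register,
# so the lanes written in the "then" branch are lost.
r0_clobbered = write(r0, constants)

# Safe case: the block load targets a different register and r0 is preserved.
r1 = write([0] * 8, constants)
```

In the erroneous case every lane of r0 holds constant data, while in the safe case r0 still holds the values written in the “then” branch.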
Typically, this means that mask modifiers or write masks may not be used unless all “parallel” branches use the same type of mask, e.g., the same width of mask. This inability to combine different types of write masks or mask modifiers may result in reduced efficiency because a smaller number of concurrent program flows is used.