Modern processors often include instructions to provide operations that are computationally intensive, but that offer a high level of data parallelism that can be exploited through an efficient implementation using various data storage devices, such as, for example, single instruction multiple data (SIMD) vector registers.
Some vector processors in the past have used two types of special control registers: a mask register to selectively disable or mask operations for particular vector elements in the vector registers, and a vector length register to indicate the number of vector elements that are stored in a vector register. Instructions have also been provided to set the mask register from the results of a vector comparison. Since these limited methods of masking operations typically employed the execution of a pipelined vector comparison, potential performance advantages of using the mask register may not have been fully realized. In addition, some implementations utilized a pipelined testing of the mask register to selectively disable or mask the operations for particular vector elements, and in some implementations only the writing of the results of the masked operations was disabled while the masked operations were still performed in the pipeline, thereby negating potential performance advantages of using the mask register. Such implementations may limit the performance advantages otherwise expected, for example, from a wide, or large width, vector architecture.
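By way of illustration only, the roles of the mask register and the vector length register described above may be sketched in scalar C as follows. The sketch emulates the behavior in software and does not represent any particular processor or instruction set; the names VLEN, vec_cmp_lt and vec_add_masked, the eight-element register width, and the one-bit-per-element mask layout are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLEN 8 /* hypothetical number of elements per vector register */

/* Emulates setting a mask register from a vector comparison:
 * bit i of the result is set when a[i] < b[i], for the first vl elements. */
static uint8_t vec_cmp_lt(const int32_t a[VLEN], const int32_t b[VLEN], size_t vl)
{
    assert(vl <= VLEN);
    uint8_t mask = 0;
    for (size_t i = 0; i < vl; i++) {
        if (a[i] < b[i]) {
            mask |= (uint8_t)(1u << i);
        }
    }
    return mask;
}

/* Emulates a masked vector add: only elements below the vector length vl
 * whose mask bit is set are computed and written; all others keep their
 * previous value in dst. */
static void vec_add_masked(int32_t dst[VLEN], const int32_t a[VLEN],
                           const int32_t b[VLEN], uint8_t mask, size_t vl)
{
    assert(vl <= VLEN);
    for (size_t i = 0; i < vl; i++) {
        if ((mask >> i) & 1u) {
            dst[i] = a[i] + b[i];
        }
    }
}
```

In this sketch the masked elements are skipped entirely. An implementation of the kind criticized above would instead compute every sum and suppress only the write, so that the masked elements would still consume execution resources.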
To date, potential solutions to such performance limiting issues and bottlenecks have not been adequately explored.