1. Field of the Invention
The present invention relates generally to vector processing, and in particular to compressing selected elements of vectors using single instruction multiple data (SIMD) operations.
2. Description of the Related Art
A commonly performed operation in computing is selecting a subset of elements from a vector based on one or more criteria and then storing the selected subset in an output vector. An example of this type of operation is illustrated in code 100 of FIG. 1A. Code 100 is an example of scalar code that may be executed to perform such a compression of selected vector elements. Scalar code (i.e., serial code) may be defined as code that operates on one element of a vector at a time, whereas vector code may be defined as code that operates on all elements of a vector during each instruction.
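As an illustration of the kind of scalar loop described above, the following minimal sketch assumes a "greater than a threshold" criterion (as in the example of FIG. 1B) and fixed-length input and output vectors. It is written in Python for readability and is not a reproduction of code 100; the function name `compress_scalar` is hypothetical.

```python
def compress_scalar(input_vector, threshold):
    """Hypothetical scalar compress: visit one element at a time and write
    each element greater than `threshold` into the leftmost free slot of
    the output vector. Returns (output_vector, number_selected)."""
    output_vector = [0] * len(input_vector)  # unused slots are left as 0
    count = 0
    for element in input_vector:             # scalar code: one element per step
        if element > threshold:              # data-dependent branch
            output_vector[count] = element   # one scalar store per selected element
            count += 1
    return output_vector, count


out, n = compress_scalar([0x05, 0x80, 0x61, 0x60, 0x10, 0xA0], 0x60)
print(n, [hex(x) for x in out[:n]])
```

Each selected element costs one iteration with a branch and one individual store, which is the per-element cost the following paragraphs describe.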
The comparison of an input vector with a criteria vector can be performed as a vector operation. Typically, the vector comparison is then followed by a number of stores of individual elements. Because the number of selected elements varies from one input to another, these stores are normally performed using scalar code. Executing scalar code within a SIMD loop, however, negates much of the benefit gained from using SIMD operations.
Example vectors that may be used in conjunction with code 100 are shown in FIG. 1B. Input vector 110 contains eight elements (labeled 0-7). In other embodiments, input vector 110 may contain other numbers of elements (e.g., 4, 16, or 32). In the example shown in FIG. 1B, an element in input vector 110 meets the criterion if the element is greater than 0x60. Therefore, each element in input vector 110 that is greater than 0x60 is selected and compressed into the leftmost locations of output vector 120. The three elements of input vector 110 that are greater than 0x60 are the elements labeled 2, 4, and 7. These elements are stored in the first three elements (0-2) of output vector 120. A limitation of the prior art code of FIG. 1A, as applied to the example vectors of FIG. 1B, is that storing the three selected elements in output vector 120 requires three separate scalar operations.
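The prior art pattern described above (a vector comparison followed by a data-dependent number of scalar stores) can be sketched as follows. This sketch assumes an eight-lane vector and emulates the SIMD compare in plain Python; the element values are hypothetical, chosen only so that lanes 2, 4, and 7 exceed 0x60 as in FIG. 1B, and the helper names are likewise hypothetical.

```python
LANES = 8


def compare_gt_mask(input_vector, threshold):
    """Emulates one vector compare: bit i of the result is set when lane i
    exceeds the threshold. On SIMD hardware this would be one instruction."""
    mask = 0
    for lane, element in enumerate(input_vector):
        mask |= int(element > threshold) << lane
    return mask


def store_selected(input_vector, mask):
    """The number of set bits in `mask` depends on the data, so each selected
    element is written with its own scalar store to the leftmost free slot."""
    output_vector = [0] * LANES
    stores = 0
    for lane in range(LANES):
        if mask & (1 << lane):
            output_vector[stores] = input_vector[lane]
            stores += 1
    return output_vector, stores


# Hypothetical contents of input vector 110: only lanes 2, 4 and 7 exceed 0x60.
input_vector = [0x12, 0x5A, 0x7C, 0x60, 0x91, 0x33, 0x08, 0x6E]
mask = compare_gt_mask(input_vector, 0x60)
output_vector, stores = store_selected(input_vector, mask)
print(hex(mask), stores, [hex(x) for x in output_vector[:stores]])
# prints: 0x94 3 ['0x7c', '0x91', '0x6e']
```

The comparison itself maps naturally onto a single vector operation, but the three selected elements still require three separate scalar stores, and the loop in `store_selected` branches on each mask bit.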
Also, prior art loops for compressing selected elements typically contain unpredictable branches in the middle of the loop, since the number of selected elements is not known in advance. Unpredictable branches make it difficult for the processor to run at full speed, since the processor cannot know in advance whether a particular branch will be taken. As a result, the processor may be delayed in fetching instructions, leading to bubbles in program execution. Unpredictable branches may also make it more complex for a compiler to generate efficient code for the loop.
Therefore, there is a need in the art for compress-select operations that can be executed with SIMD instructions. In view of the above, improved methods and mechanisms for compressing selected vector elements are desired.