1. Technical Field
The present application relates generally to an improved data processing apparatus and method. More specifically, the present application is directed to an apparatus and method for optimizing scalar code executed on a single instruction multiple data (SIMD) engine by aligning SIMD slots of SIMD registers.
2. Description of Related Art
On an autonomous single instruction multiple data (SIMD) engine with no scalar instructions, all scalar code must be executed in SIMD registers. Since scalar data may be placed in storage on different alignment boundaries, in general the operands for a scalar operation may not be placed in congruent slots by the load instructions of the SIMD processor. That is, one scalar operand may be in slot 3 while another scalar operand may be in slot 2 of the SIMD registers. This causes a problem with scalar operations being performed in SIMD registers because congruent slots in SIMD registers are combined to perform an operation. If the operands are not in congruent slots, the operands of the scalar operation will not be properly combined.
To illustrate this problem, examples of SIMD registers are shown in FIGS. 1A-1C. As shown in FIG. 1A, a first SIMD register 110 stores values x0-x3 in slots 0-3, respectively. A second SIMD register 120 stores values y0-y3 in slots 0-3, respectively. The values in SIMD registers 110 and 120 are combined and the results are stored in SIMD register 130. In the depicted example, the y0-y3 values are subtracted from the x0-x3 values to generate the resultant values (x0-y0) to (x3-y3) in SIMD register 130.
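The slot-wise combination of FIG. 1A can be sketched in plain C. This is only a software model under stated assumptions: the type simd_reg, the function simd_sub, the four-slot width, and the 32-bit slot size are hypothetical names and choices made for illustration, not part of any particular SIMD instruction set.

```c
#include <stdint.h>

#define SIMD_SLOTS 4  /* assumption: four slots per register, as in FIG. 1A */

/* Hypothetical software model of a SIMD register: one 32-bit value per slot. */
typedef struct { int32_t slot[SIMD_SLOTS]; } simd_reg;

/* Slot-wise subtraction: result slot i holds x slot i minus y slot i. */
simd_reg simd_sub(simd_reg x, simd_reg y) {
    simd_reg r;
    for (int i = 0; i < SIMD_SLOTS; i++)
        r.slot[i] = x.slot[i] - y.slot[i];
    return r;
}
```

Each result slot depends only on the two operand slots at the same index, which is why the position of an operand within its register matters.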
As can be seen from FIG. 1A, with SIMD registers, operand values in congruent slots in the SIMD registers are combined to generate results that are stored in the congruent slot of a resultant SIMD register. Because a scalar operand consists of a single value, and not multiple values as with a vector operand, when a scalar operand is loaded into a SIMD register, it is loaded along with other values that are not used in the scalar operation. Due to alignment boundary differences or other factors, the scalar operand may be present in any one of the multiple slots of the SIMD register. Thus, if two scalar values are to be subtracted, a first scalar value may be stored in slot 1 of a first SIMD register 110 and a second scalar value may be stored in slot 2 of a second SIMD register 120. Such a situation is illustrated in FIG. 1B.
As shown in FIG. 1B, a scalar operation that is to be performed, in this simple example, is the subtraction of the scalar operand value “1” from the scalar operand value “7.” However, because the scalar operands are misaligned, i.e. scalar operand value “7” is in slot 1 of SIMD register 110 and scalar operand value “1” is in slot 2 of SIMD register 120, this scalar operation cannot be performed with the current alignment of values in the SIMD registers 110 and 120.
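The misalignment of FIG. 1B can be reproduced with a small C model. The names simd_reg and simd_sub are hypothetical, and the zero values placed in the unused slots are assumptions made for illustration, since the surrounding text does not specify them.

```c
#include <stdint.h>

#define SIMD_SLOTS 4  /* assumption: four slots per register */

/* Hypothetical software model of a SIMD register. */
typedef struct { int32_t slot[SIMD_SLOTS]; } simd_reg;

/* Slot-wise subtraction: result slot i holds x slot i minus y slot i. */
simd_reg simd_sub(simd_reg x, simd_reg y) {
    simd_reg r;
    for (int i = 0; i < SIMD_SLOTS; i++)
        r.slot[i] = x.slot[i] - y.slot[i];
    return r;
}

/* Returns 1 if any slot of r holds the given value, 0 otherwise. */
int simd_contains(simd_reg r, int32_t value) {
    for (int i = 0; i < SIMD_SLOTS; i++)
        if (r.slot[i] == value)
            return 1;
    return 0;
}
```

With the value 7 in slot 1 of one register and the value 1 in slot 2 of the other, the subtraction pairs each scalar with an unrelated filler value, so no slot of the result holds the intended difference of 6.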
The simple solution to this problem is to always shift scalars to a preferred slot before execution of a computational operation on them and, if required, to shift the result back to the appropriate slot for storage. Shifting of the slots may be achieved by use of a rotation, which shifts the desired slot into the appropriate position, but preserves other data in the register. Shifting may also be achieved by a shuffle operation, which can put the single slot in all positions, essentially a multiple shift, but there may be a small amount of additional overhead in this case.
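The difference between the two shifting mechanisms can be sketched as follows. The functions simd_rotate_left and simd_splat are hypothetical helpers on an assumed four-slot register model. They do not correspond to the instructions of any specific SIMD engine, and the splat is only one possible use of a shuffle operation.

```c
#include <stdint.h>

#define SIMD_SLOTS 4  /* assumption: four slots per register */

/* Hypothetical software model of a SIMD register. */
typedef struct { int32_t slot[SIMD_SLOTS]; } simd_reg;

/* Rotate left by n slots (0 <= n < SIMD_SLOTS): the value in slot n moves
   to slot 0, and every other value is kept, only in a different slot. */
simd_reg simd_rotate_left(simd_reg x, int n) {
    simd_reg r;
    for (int i = 0; i < SIMD_SLOTS; i++)
        r.slot[i] = x.slot[(i + n) % SIMD_SLOTS];
    return r;
}

/* Shuffle used as a splat: the value in slot src is copied into all slots,
   so the other values of the register are overwritten. */
simd_reg simd_splat(simd_reg x, int src) {
    simd_reg r;
    for (int i = 0; i < SIMD_SLOTS; i++)
        r.slot[i] = x.slot[src];
    return r;
}
```

In this model the rotation keeps all four values available, whereas the splat makes the selected value usable from any slot at the cost of discarding the rest.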
A rotation solution is shown in FIG. 1C. In the depicted example, the preferred slot is slot 0 and all scalar operand values are shifted (or rotated, as in this example) to slot 0 prior to performing a computational operation on them. For example, the scalar operand value “7” is shifted from slot 1 to slot 0 of SIMD register 110 and the scalar operand value “1” is shifted from slot 2 to slot 0 in SIMD register 120. As a result, when the computational operation, e.g., subtraction, is performed on the slots of the SIMD registers 110 and 120, the proper result “6” is generated and stored in resultant SIMD register 130. An additional shift operation may be performed within SIMD register 130 to move the result to a different slot within SIMD register 130 if required.
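The rotation approach of FIG. 1C can be sketched in C as follows. This is a minimal model, assuming four slots and slot 0 as the preferred slot. The function names, including scalar_sub_preferred, are hypothetical, and the caller is assumed to know which slot each scalar operand occupies.

```c
#include <stdint.h>

#define SIMD_SLOTS 4  /* assumption: four slots per register */

/* Hypothetical software model of a SIMD register. */
typedef struct { int32_t slot[SIMD_SLOTS]; } simd_reg;

/* Slot-wise subtraction: result slot i holds x slot i minus y slot i. */
simd_reg simd_sub(simd_reg x, simd_reg y) {
    simd_reg r;
    for (int i = 0; i < SIMD_SLOTS; i++)
        r.slot[i] = x.slot[i] - y.slot[i];
    return r;
}

/* Rotate left by n slots (0 <= n < SIMD_SLOTS): slot n moves to slot 0. */
simd_reg simd_rotate_left(simd_reg x, int n) {
    simd_reg r;
    for (int i = 0; i < SIMD_SLOTS; i++)
        r.slot[i] = x.slot[(i + n) % SIMD_SLOTS];
    return r;
}

/* Always rotate both scalar operands to preferred slot 0, then subtract.
   The scalar result is found in slot 0 of the returned register. */
simd_reg scalar_sub_preferred(simd_reg a, int a_slot, simd_reg b, int b_slot) {
    simd_reg a0 = simd_rotate_left(a, a_slot);
    simd_reg b0 = simd_rotate_left(b, b_slot);
    return simd_sub(a0, b0);
}
```

If the result must be stored from a slot other than slot 0, one further rotation of the result register would be needed in this model, for example a left rotation by (SIMD_SLOTS - dst) % SIMD_SLOTS to move slot 0 to slot dst.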
While this solution ensures that scalar operations are properly performed in SIMD registers, it requires extra processing cycles to perform the shift operations. Such shift operations are unnecessary when the scalar operands are already aligned with one another but are not in the preferred slot. For example, if both operands are in slot 2 of the SIMD registers 110 and 120, the known solution still requires that they be shifted to slot 0 before the computational operation is performed. The known solution provides no way to determine whether scalar operands are already aligned in the SIMD registers before shifting them to the preferred slot; all scalar operands must be shifted to the preferred slot.
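The cost described above can be made concrete with a small C model that counts rotations. The function known_solution_sub and the counting convention are assumptions made for illustration: an operand already in slot 0 is treated here as needing no rotation, and the filler values in the unused slots are arbitrary.

```c
#include <stdint.h>

#define SIMD_SLOTS 4      /* assumption: four slots per register */
#define PREFERRED_SLOT 0  /* assumption: slot 0 is the preferred slot */

/* Hypothetical software model of a SIMD register. */
typedef struct { int32_t slot[SIMD_SLOTS]; } simd_reg;

/* Slot-wise subtraction: result slot i holds x slot i minus y slot i. */
simd_reg simd_sub(simd_reg x, simd_reg y) {
    simd_reg r;
    for (int i = 0; i < SIMD_SLOTS; i++)
        r.slot[i] = x.slot[i] - y.slot[i];
    return r;
}

/* Rotate left by n slots (0 <= n < SIMD_SLOTS): slot n moves to slot 0. */
simd_reg simd_rotate_left(simd_reg x, int n) {
    simd_reg r;
    for (int i = 0; i < SIMD_SLOTS; i++)
        r.slot[i] = x.slot[(i + n) % SIMD_SLOTS];
    return r;
}

/* Known solution: move every operand to the preferred slot before the
   subtraction. Writes the result to *out and returns the number of
   rotations that were performed. */
int known_solution_sub(simd_reg a, int a_slot, simd_reg b, int b_slot,
                       simd_reg *out) {
    int rotations = 0;
    if (a_slot != PREFERRED_SLOT) { a = simd_rotate_left(a, a_slot); rotations++; }
    if (b_slot != PREFERRED_SLOT) { b = simd_rotate_left(b, b_slot); rotations++; }
    *out = simd_sub(a, b);
    return rotations;
}
```

When both operands sit in slot 2, a direct slot-wise subtraction already yields the correct difference in slot 2, yet this model of the known solution still performs two rotations before subtracting.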