Modern computer processor architectures typically rely on multiple functional units to execute instructions from a computer program. An instruction or issue unit typically retrieves instructions and dispatches, or issues, the instructions to one or more execution units to handle the instructions. A typical computer processor may include, for example, a load/store unit that handles retrieval and storage of data from and to a memory, and a fixed point execution unit, or arithmetic logic unit (ALU), to handle logical and arithmetic operations.
Conventional computer processor architectures typically included scalar execution units that execute scalar instructions on scalar values stored in a register file accessible by the execution unit. Due to hardware advances that now permit multiple and more complex execution units to reside in the same processor, however, computer processor architectures increasingly rely on vector execution units to perform some operations. Vector execution units, which are also sometimes referred to as single instruction multiple data (SIMD) execution units, operate on vector values comprising multiple scalar values, using multiple processing lanes to effectively perform parallel operations on the multiple scalar values. In addition, some vector execution units are optimized to handle floating point operations, thereby enabling multiple floating point operations to be performed in parallel. The vector data is typically stored in a vector register file accessible by the vector execution unit. Since vector execution units typically perform more operations per clock cycle, executing vector instructions is preferable to executing scalar instructions for certain types of operations. In many graphics processing applications, for example, numerous mathematical operations need to be performed on coordinates in a three-dimensional space, and by storing x, y and z coordinates in a vector, such mathematical operations can be performed on all of the coordinate values in parallel using a vector execution unit, rather than requiring three separate scalar operations for the three coordinate values.
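The contrast between three scalar operations and one vector operation on x, y and z coordinates may be illustrated with a minimal sketch. The sketch is a toy software model, not a description of any particular processor: the function names `scalar_add` and `vector_add`, the sample coordinate values, and the instruction counts are all assumptions introduced for illustration, and a Python list stands in for the processing lanes of a vector execution unit.

```python
# Toy model only: real SIMD hardware applies the operation to all lanes in one
# instruction; here a list comprehension stands in for the parallel lanes.


def scalar_add(a, b):
    """Model one scalar instruction: a single add on a single pair of values."""
    return a + b


def vector_add(va, vb):
    """Model one vector instruction: the same add applied across every lane."""
    if len(va) != len(vb):
        raise ValueError("vector operands must have the same number of lanes")
    return [a + b for a, b in zip(va, vb)]


# Hypothetical point and translation offset, each stored as (x, y, z).
point = [1.0, 2.0, 3.0]
offset = [10.0, 20.0, 30.0]

# Scalar path: three separate instructions, one per coordinate.
scalar_result = [scalar_add(point[i], offset[i]) for i in range(3)]
scalar_instructions = 3

# Vector path: one instruction covers all three coordinates.
vector_result = vector_add(point, offset)
vector_instructions = 1

print(scalar_result, "using", scalar_instructions, "scalar instructions")
print(vector_result, "using", vector_instructions, "vector instruction")
```

Both paths produce the same translated coordinates; the difference in this model is only the number of instructions issued to obtain them.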
However, legacy software written for many earlier iterations of computer hardware architecture may not utilize vector execution units and vector instructions, leaving many processes to execute scalar instructions in what is generally a less efficient manner. One solution to this issue involves rewriting and recompiling software written for computer hardware architectures that do not utilize vector instructions, in order to generate software that includes vector instructions. However, rewriting and recompiling such software can be troublesome and expensive. A second solution to the issue executes scalar instructions using only one processing lane of a vector execution unit, thereby effectively utilizing the vector execution unit as a scalar execution unit. However, this approach generally forfeits the efficiency gains associated with using a vector execution unit, since the other processing lanes are essentially dormant during these operations.
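The inefficiency of the second solution may be illustrated with a minimal sketch of lane utilization. The four-lane width, the names `LANES`, `lane_utilization` and `run_scalar_on_vector_unit`, and the use of `None` to mark a dormant lane are all assumptions chosen for illustration; actual vector execution units may differ in lane count and in how unused lanes are handled.

```python
LANES = 4  # assumed lane count, chosen only for illustration


def lane_utilization(active_lanes, total_lanes=LANES):
    """Return the fraction of processing lanes doing useful work."""
    if not 0 <= active_lanes <= total_lanes:
        raise ValueError("active_lanes must be between 0 and total_lanes")
    return active_lanes / total_lanes


def run_scalar_on_vector_unit(a, b, total_lanes=LANES):
    """Model a scalar add issued to a vector unit: only lane 0 holds data."""
    # None marks a dormant lane that carries no operand.
    lanes_a = [a] + [None] * (total_lanes - 1)
    lanes_b = [b] + [None] * (total_lanes - 1)
    result = [
        x + y if x is not None and y is not None else None
        for x, y in zip(lanes_a, lanes_b)
    ]
    # Count the lanes that actually produced a value.
    active = sum(1 for r in result if r is not None)
    return result[0], active


value, active = run_scalar_on_vector_unit(2, 5)
utilization = lane_utilization(active)
print(f"result={value}, active lanes={active}/{LANES}, utilization={utilization:.0%}")
```

Under these assumptions, a single scalar add occupies one of four lanes, so three quarters of the modeled vector execution unit sits idle for that instruction.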
Therefore, a continuing need exists in the art for a manner of increasing the efficiency of instruction execution in computing systems including vector execution units.