1. Field of the Invention
The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for providing optimized scalar promotion with load and splat single instruction multiple data (SIMD) instructions.
2. Background of the Invention
Processor architectures initially were based on scalar operations in which a processor operates on a single value in a scalar register per processor cycle. Such scalar processors represent the simplest class of computer processors. In an effort to increase the speed of computations, vector processors were developed in which a single instruction operates simultaneously on multiple data items. Vector processors are also referred to as single instruction multiple data (SIMD) processors. SIMD exploits data level parallelism by allowing a single instruction to apply the same operation to multiple data elements in parallel. SIMD units employ vector registers which store multiple data elements.
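By way of illustration only, the difference between scalar and SIMD operation may be sketched in portable C. The sketch is a software model and not actual vector hardware. The names vec4f, vec_add_model, add_scalar, and add_simd_model, and the lane count of four, are assumptions made for this example and do not correspond to any particular instruction set.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* assumed vector width; real SIMD units differ */

/* Hypothetical model of a vector register holding LANES elements. */
typedef struct { float lane[LANES]; } vec4f;

/* Models one SIMD add: the same operation applied to every lane. */
vec4f vec_add_model(vec4f a, vec4f b) {
    vec4f r;
    for (int i = 0; i < LANES; i++) {
        r.lane[i] = a.lane[i] + b.lane[i];
    }
    return r;
}

/* Scalar form: one add per data element, n adds in total. */
void add_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Vector form: one modeled vector add per group of LANES elements.
 * Returns the number of modeled vector adds that were issued. */
size_t add_simd_model(const float *a, const float *b, float *out, size_t n) {
    assert(n % LANES == 0); /* the sketch does not handle a remainder */
    size_t vector_adds = 0;
    for (size_t i = 0; i < n; i += LANES) {
        vec4f va, vb;
        for (int k = 0; k < LANES; k++) { /* models two vector loads */
            va.lane[k] = a[i + k];
            vb.lane[k] = b[i + k];
        }
        vec4f vr = vec_add_model(va, vb);
        vector_adds++;
        for (int k = 0; k < LANES; k++) { /* models one vector store */
            out[i + k] = vr.lane[k];
        }
    }
    return vector_adds;
}
```

In this model, adding two arrays of eight elements takes eight scalar adds in add_scalar but only two modeled vector adds in add_simd_model, which is the data level parallelism referred to above.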
The first era of SIMD machines was characterized by supercomputers like the Cray X-MP. These machines operated on long vectors, for example adding two vectors of 100 numbers each. Supercomputing moved away from the SIMD approach when multiple instruction multiple data (MIMD) approaches became more powerful, and interest in SIMD waned. Later, personal computers became common, and became powerful enough to support real-time gaming. This created a mass demand for a particular type of computing power, and microprocessor vendors turned to SIMD to meet the demand. The first widely deployed SIMD for gaming was Intel's MMX extensions to the x86 architecture. IBM and Motorola then added AltiVec to the POWER architecture, and there have been several extensions to the SIMD instruction sets for both architectures. These developments have been driven by support for real-time graphics, and are therefore oriented toward vectors of two, three, or four dimensions.
While vector or SIMD processing has become prevalent in modern computing devices, programmers still find it easier to use traditional scalar programming techniques when writing computer program source code. With traditional programming, the programmer writes scalar code and the compiler performs auto-vectorization, transforming that code for execution on vector processors using SIMD engines. Alternatively, programmers may write code natively for vector execution on SIMD engines. However, in such cases, scalar operations tend to remain in the vectorized or SIMDized code, with additional instructions inserted to handle the transition from scalar operation to vector or SIMD operation. These additional instructions are a significant source of overhead in terms of the processor cycles required to execute the vectorized code.
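The transition from scalar operation to vector operation may be sketched as follows, for illustration only. The example scales an array by a scalar held in memory: before a vector multiply can use the scalar, the scalar must be loaded and then replicated ("splatted") into every lane of a vector. The sketch is a portable C model and not the mechanism of any particular processor. The names vec4f, splat_model, vec_mul_model, and scale_simd_model, and the lane count of four, are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* assumed vector width; real SIMD units differ */

/* Hypothetical model of a vector register holding LANES elements. */
typedef struct { float lane[LANES]; } vec4f;

/* Models a splat: replicate one scalar value into every lane. */
vec4f splat_model(float s) {
    vec4f v;
    for (int i = 0; i < LANES; i++) {
        v.lane[i] = s;
    }
    return v;
}

/* Models one element-wise vector multiply. */
vec4f vec_mul_model(vec4f a, vec4f b) {
    vec4f r;
    for (int i = 0; i < LANES; i++) {
        r.lane[i] = a.lane[i] * b.lane[i];
    }
    return r;
}

/* Computes y[i] = (*scale) * x[i] with modeled vector operations. */
void scale_simd_model(const float *scale, const float *x, float *y, size_t n) {
    assert(n % LANES == 0); /* the sketch does not handle a remainder */

    /* Transition steps: these exist only to promote the scalar. */
    float s = *scale;          /* step 1: scalar load */
    vec4f vs = splat_model(s); /* step 2: replicate across all lanes */

    for (size_t i = 0; i < n; i += LANES) {
        vec4f vx;
        for (int k = 0; k < LANES; k++) { /* models one vector load */
            vx.lane[k] = x[i + k];
        }
        vec4f vr = vec_mul_model(vs, vx);
        for (int k = 0; k < LANES; k++) { /* models one vector store */
            y[i + k] = vr.lane[k];
        }
    }
}
```

In this model the scalar load and the splat are separate steps that perform no useful arithmetic. They stand in for the additional transition instructions described above. How many such instructions a given processor actually requires depends on its instruction set and is not represented here.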