Many microprocessors implement vector architectures and include a Vector Processing Unit (VPU). Vector architectures enable many data items to be processed in parallel: a single instruction may operate on multiple data elements, which is referred to as Single Instruction Multiple Data (SIMD) parallel processing.
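As an illustration of the SIMD concept only, the following minimal sketch models one vector add in which a single operation updates every lane. The function name `simd_add`, the four-lane width, and the 32-bit lane size are assumptions chosen for the example and do not describe any particular VPU.

```python
LANE_MASK = 0xFFFFFFFF  # assumed 32-bit lanes


def simd_add(a, b):
    """Model one SIMD add: a single call produces the sum for every lane."""
    if len(a) != len(b):
        raise ValueError("vector operands must have the same number of lanes")
    # Each lane is independent; fixed-width lanes wrap around on overflow.
    return [(x + y) & LANE_MASK for x, y in zip(a, b)]


va = [1, 2, 3, 4]
vb = [10, 20, 30, 0xFFFFFFFF]
vc = simd_add(va, vb)  # the last lane wraps around instead of carrying into a neighbor
```

A scalar unit would need one add instruction per element to produce the same result; the model only conveys that the lanes are computed by one operation and do not interact.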
Many VPU implementations use dedicated register files that are disjoint from the General Purpose Register (GPR) file. There is accordingly a need to transfer data from a GPR to a Vector Register (VR).
Prior art solutions for transferring data from the GPR to the VR may be classified into three main approaches. The first approach stores data from a GPR to memory and then loads the data from the memory into a VR. An example of this approach is embodied in AltiVec. AltiVec (trademark of Motorola, Inc.) is a high-bandwidth, parallel-operation vector execution unit developed as a SIMD extension to the PowerPC instruction set architecture (ISA). AltiVec is a vector architecture that can process multiple data streams or blocks in a single cycle. However, transferring data indirectly through memory has disadvantages: it is time-consuming, because every transfer requires both a store and a dependent load, and it can cause pipeline stalls.
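The indirect path of the first approach can be sketched as a small behavioral model. Everything named here is hypothetical: the register-file sizes, the 16-byte scratch slot, the helpers `store_word` and `load_vector`, and the big-endian byte order are assumptions made for illustration and are not AltiVec instructions.

```python
import struct

memory = bytearray(16)                    # hypothetical scratch slot in memory
gpr = [0] * 8                             # hypothetical scalar register file
vr = [[0, 0, 0, 0] for _ in range(4)]     # hypothetical VRs, four 32-bit lanes each


def store_word(addr, reg):
    """Step 1: the scalar unit stores a GPR to memory."""
    struct.pack_into(">I", memory, addr, gpr[reg] & 0xFFFFFFFF)


def load_vector(addr, vreg):
    """Step 2: the vector unit loads 16 bytes from memory into a VR."""
    vr[vreg] = list(struct.unpack_from(">4I", memory, addr))


gpr[3] = 0xDEADBEEF
store_word(0, 3)    # GPR -> memory
load_vector(0, 1)   # memory -> VR; this load cannot begin until the store has completed
```

The model makes the dependency visible: the value reaches the VR only after two memory operations issued in order, which is the source of the latency and stall risk described above.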
A second approach provides explicit instructions to transfer data between the register files. Intel's MMX/SSE/SSE2/SSE3 technologies employ this solution. However, this approach has the disadvantage of adding instructions to the architecture. While the additional instructions may be acceptable for a CISC (Complex Instruction Set Computer), they are undesirably limiting for a RISC (Reduced Instruction Set Computer).
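The second approach can be modeled, under stated assumptions, as a pair of dedicated move operations. The names `move_gpr_to_vr` and `move_vr_to_gpr` are hypothetical; their behavior (copy into lane 0 and zero the remaining lanes) is loosely patterned on the x86 MOVD instruction and is an assumption of this sketch rather than a definition of any ISA.

```python
gpr = [0] * 8                           # hypothetical scalar register file
vr = [[0, 0, 0, 0] for _ in range(4)]   # hypothetical VRs, four 32-bit lanes each


def move_gpr_to_vr(vreg, reg):
    """Hypothetical dedicated opcode: GPR -> lane 0 of a VR, other lanes zeroed."""
    vr[vreg] = [gpr[reg] & 0xFFFFFFFF, 0, 0, 0]


def move_vr_to_gpr(reg, vreg):
    """Hypothetical dedicated opcode for the reverse direction: lane 0 -> GPR."""
    gpr[reg] = vr[vreg][0]


gpr[2] = 42
move_gpr_to_vr(0, 2)   # a single register-to-register move; memory is not touched
move_vr_to_gpr(5, 0)
```

Each direction of transfer needs its own opcode in this model, which illustrates the cost noted above: the instruction set grows with every transfer variant that must be supported.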
A third approach has the vector and scalar registers share the same register file. In this manner the vector and scalar instructions access the same physical registers, eliminating the need to transfer data between them. This was the original implementation of Intel's MMX technology, in which the MMX registers were aliased onto the x87 floating-point registers. However, it has the disadvantage of reducing the number of registers available to the processor, because a register holding vector data cannot simultaneously hold scalar data.
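The shared-file approach can be illustrated with a toy model in which one set of physical registers is read through either a scalar view or a vector view. The 64-bit register width, the two 32-bit lanes, and all helper names are assumptions made for this sketch only.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

regs = [0] * 8   # one hypothetical physical file shared by scalar and vector code


def write_scalar(r, value):
    regs[r] = value & MASK64


def read_scalar(r):
    return regs[r]


def write_lanes(r, lanes):
    """Vector view: pack two 32-bit lanes into the same physical register."""
    lo, hi = lanes
    regs[r] = (lo & MASK32) | ((hi & MASK32) << 32)


def read_lanes(r):
    """Vector view: lane 0 is the low half, lane 1 the high half."""
    return [regs[r] & MASK32, regs[r] >> 32]


write_scalar(2, 0x0000000200000001)
lanes_seen = read_lanes(2)     # no transfer needed: same physical register
write_lanes(2, [7, 9])         # a vector write overwrites the scalar value
scalar_seen = read_scalar(2)
```

No move or memory round trip is required, but register 2 can hold only one kind of value at a time, so scalar and vector code compete for the same eight entries.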