1. Field of the Invention
The present application relates generally to source programs and, more specifically, to the compilation of source programs to a machine language representation. More particularly, the present application relates to compiling programs for a SIMD RISC processor.
2. Description of the Related Art
Contemporary high-performance processor designs provide data-parallel execution engines that increase the performance available to application programs by exploiting single-instruction multiple-data (SIMD) parallelism. SIMD instructions are provided by a variety of instruction set extensions, such as the IBM Power Architecture™ Vector Media extensions (VMX). FIG. 1 depicts an exemplary operation of a SIMD instruction on a 4-element vector.
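To make the operation of FIG. 1 concrete, the following portable C sketch models the semantics of a SIMD add on a 4-element single-precision vector. The type `vec4f` and the function `vec4f_add` are hypothetical names introduced only for illustration. The loop expresses the element-wise semantics; on a processor with SIMD extensions, all four additions would be carried out by a single instruction (for example, the VMX `vaddfp` instruction) rather than by four separate scalar additions.

```c
/* Hypothetical 4-element vector type, modeling one 128-bit SIMD register
 * that holds four single-precision elements. */
typedef struct {
    float e[4];
} vec4f;

/* Models a SIMD add: the same operation is applied to every element
 * position (lane) of the two source vectors. The loop only states the
 * semantics in portable C; SIMD hardware computes all four lanes in one
 * instruction. */
static vec4f vec4f_add(vec4f a, vec4f b)
{
    vec4f r;
    for (int i = 0; i < 4; i++) {
        r.e[i] = a.e[i] + b.e[i];
    }
    return r;
}
```

For example, adding {1, 2, 3, 4} and {10, 20, 30, 40} yields {11, 22, 33, 44}: the same result as four independent scalar additions, but issued as one instruction.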
While SIMD extensions for conventional microprocessors have exploited the significant data parallelism found in many programs, they have done so at the cost of increased design complexity. Referring now to FIG. 2, a state-of-the-art, industry-standard microprocessor implementing the Power Architecture™ is depicted. It consists of a number of execution units, such as two load/store units, two fixed-point units, one condition execution unit, one branch execution unit, one vector permute unit, one vector simple fixed-point unit, one vector complex fixed-point unit, and one vector single-precision floating-point unit. The design also contains a fixed-point register file, a floating-point register file, a condition register file, a branch execution (Link/Count) register file, and a vector register file.
While an architecture such as the one depicted in FIG. 2 can achieve high performance, it requires duplicated resources, such as separate vector and scalar execution units and register files. Thus, while the architectures available today can deliver high performance, their resource requirements are excessive, resulting in increased chip area, cost, and power dissipation, as well as increased design complexity and verification effort. A further drawback of the depicted architecture is that sharing operands between the vector and scalar computation units is difficult, because each shared operand must be moved from one register file to another, which incurs significant overhead.
In the prior art, the Intel Streaming SIMD Extensions (SSE) architecture can share the execution of scalar and data-parallel computations using the SSE and SSE2 instruction set extensions. However, this prior art requires special hardware support, such as special scalar compute and data access operations, to provide both scalar and data-parallel execution. These scalar operations are specified to perform partial writes into registers. Disadvantageously, the architectural specification and its implementations are directed at sharing a single (scalar) execution unit for both scalar and data-parallel computation. Finally, as exemplified by the partial-write specification of the scalar operations, the architecture makes an efficient implementation using data-parallel paths unnecessarily complex and expensive, because each partial write must merge the scalar result with the preserved elements of the destination register.
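The partial-write semantics of such scalar operations can be illustrated with a small C model. The sketch below assumes the behavior of a legacy SSE scalar add such as `ADDSS`, which computes only element 0 and leaves elements 1 through 3 of the destination register unchanged. The `vec4f` type and both function names are hypothetical. For contrast, `scalar_add_full_width` models a hypothetical alternative in which the scalar operation is executed as an ordinary full-width vector operation and only element 0 is regarded as the scalar result.

```c
/* Hypothetical 4-element vector type modeling a 128-bit register. */
typedef struct {
    float e[4];
} vec4f;

/* Partial write (modeled on legacy SSE ADDSS): only element 0 receives the
 * sum, and elements 1..3 keep their old destination values. The result
 * therefore depends on the entire old destination register, so hardware
 * must merge the new element with the preserved ones instead of simply
 * writing the whole register. */
static vec4f scalar_add_partial_write(vec4f dst, vec4f src)
{
    vec4f r = dst;                 /* preserve elements 1..3 of dst */
    r.e[0] = dst.e[0] + src.e[0];  /* scalar result goes to element 0 */
    return r;
}

/* Hypothetical full-width alternative: the scalar operation reuses the
 * ordinary data-parallel path and writes every element. The scalar result
 * is read from element 0; the other elements are computed alongside it,
 * and no merge with the old destination contents is required. */
static vec4f scalar_add_full_width(vec4f a, vec4f b)
{
    vec4f r;
    for (int i = 0; i < 4; i++) {
        r.e[i] = a.e[i] + b.e[i];
    }
    return r;
}
```

With a destination of {1, 2, 3, 4} and a source of {10, 20, 30, 40}, the partial-write version produces {11, 2, 3, 4}, while the full-width version produces {11, 22, 33, 44}. Both agree in element 0, which holds the scalar result; only the partial-write form constrains the remaining elements.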