1. Field of the Invention
The present invention relates generally to processor architecture, and more particularly to techniques for selective bypassing of a multi-port register file.
2. Background of the Invention
To improve performance, contemporary processors often employ pipelining techniques to execute instructions at very high speeds. On such processors, instruction processing is divided into a sequence of operations, and each operation is performed in a corresponding pipeline stage. Independent operations from several instructions may be processed simultaneously by different pipeline stages, increasing the instruction throughput of the processor. A typical instruction pipeline in a microprocessor includes the following pipeline stages: Instruction Fetch (IF), Decode (Dec), Data Read (RD), Execute (EX), and Write (WR).
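By way of illustration only, the overlap of instructions across the pipeline stages named above may be sketched as follows. The sketch assumes an idealized pipeline in which one instruction enters per clock cycle and no stalls occur; the function name `pipeline_schedule` is hypothetical and is not part of any described embodiment.

```python
# Stage names taken from the text above, in pipeline order.
STAGES = ["IF", "Dec", "RD", "EX", "WR"]


def pipeline_schedule(num_instructions):
    """Return a list with one entry per clock cycle.

    Each entry maps an instruction index to the stage it occupies in that
    cycle. Assumption: ideal pipeline, one instruction issued per cycle,
    no stalls or hazards.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        active = {}
        for instr in range(num_instructions):
            stage_index = cycle - instr  # instruction i enters IF at cycle i
            if 0 <= stage_index < len(STAGES):
                active[instr] = STAGES[stage_index]
        schedule.append(active)
    return schedule


if __name__ == "__main__":
    for cycle, active in enumerate(pipeline_schedule(3)):
        print(cycle, active)
```

Under these assumptions, three instructions complete in seven cycles rather than the fifteen that would be required if each instruction passed through all five stages before the next one began.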
Referring to FIG. 1, a multi-operation microprocessor organization is illustrated. As depicted, the multi-operation microprocessor has a two-functional-unit (FU) architecture with a register file (150) having four read ports (R1–R4) and two write ports (W1, W2). It should be appreciated that while a two-functional-unit architecture is shown, the actual number of FUs for this type of processor could be greater than the number shown. Furthermore, it is to be understood that although there are many designs for multi-operation microprocessors, the ones presented here allow the data elements to be independently accessed from a register file, that is, using independent indexes into the register file.
When the operations performed in the EX stage are specified independently, that is, by separate instructions, the microprocessor organization is known as superscalar. In contrast, when the operations are specified by a single instruction that operates on multiple data elements, the microprocessor organization is known as single-instruction multiple-data (SIMD).
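The distinction between the two organizations may be sketched, purely as an illustration, as follows. The operation table `OPS` and the function names `execute_superscalar` and `execute_simd` are hypothetical and are introduced only for this example.

```python
import operator

# Hypothetical operation table; the opcode names are assumptions.
OPS = {"add": operator.add, "sub": operator.sub, "and": operator.and_}


def execute_superscalar(instructions):
    """Superscalar: each functional unit receives its own instruction.

    `instructions` is a list of (opcode, a, b) tuples, one per functional
    unit, so the operations may differ from unit to unit.
    """
    return [OPS[opcode](a, b) for opcode, a, b in instructions]


def execute_simd(opcode, vector_a, vector_b):
    """SIMD: a single opcode is applied element-wise to two data vectors."""
    op = OPS[opcode]
    return [op(a, b) for a, b in zip(vector_a, vector_b)]


if __name__ == "__main__":
    # Two independent instructions, one per functional unit.
    print(execute_superscalar([("add", 1, 2), ("sub", 7, 4)]))
    # One instruction operating on two two-element vectors.
    print(execute_simd("add", [1, 7], [2, 4]))
```

In both cases two operations execute in parallel in the EX stage; the sketch differs only in whether one opcode or two are supplied.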
During the RD stage (110), four data elements are read simultaneously from the multi-port register file (150) and grouped into two separate sets, with two elements each. Herein, these sets of elements are known as vectors.
During the EX stage (120), two parallel functional units (140, 142) perform an arithmetic or logic operation on the two data vectors. At the WR stage (130), the results generated in the functional units (140, 142) are grouped into a result vector and written back to the register file (150).
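The RD, EX, and WR stages described above may be sketched, as a non-limiting example, as follows. The class `RegisterFile` and the function `run_operation` are hypothetical names. The sketch further assumes that read ports R1 and R2 supply the first operand vector and R3 and R4 the second, and that both functional units apply the same operation; the description above does not fix either choice.

```python
import operator


class RegisterFile:
    """Register file with four read ports and two write ports (a sketch)."""

    def __init__(self, num_registers=16):
        self.regs = [0] * num_registers

    def read(self, r1, r2, r3, r4):
        # Each read port uses its own independent register index.
        return tuple(self.regs[i] for i in (r1, r2, r3, r4))

    def write(self, w1, value1, w2, value2):
        # Assumption: if w1 == w2, the value on the second port wins.
        self.regs[w1] = value1
        self.regs[w2] = value2


def run_operation(rf, op, sources, dests):
    """Carry one operation through the RD, EX, and WR stages."""
    # RD: read four elements and group them into two two-element vectors.
    a0, a1, b0, b1 = rf.read(*sources)
    vector_a, vector_b = (a0, a1), (b0, b1)

    # EX: two functional units operate in parallel, one per element pair.
    result_vector = (op(vector_a[0], vector_b[0]),
                     op(vector_a[1], vector_b[1]))

    # WR: write the result vector back through the two write ports.
    rf.write(dests[0], result_vector[0], dests[1], result_vector[1])
    return result_vector


if __name__ == "__main__":
    rf = RegisterFile(8)
    rf.regs[0:4] = [1, 2, 10, 20]
    print(run_operation(rf, operator.add, (0, 1, 2, 3), (4, 5)))
```

In this sketch every operand passes through the register file; no forwarding or bypass path is modeled.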