1. Field of the Invention
The invention relates generally to the field of computer architecture. More specifically, the invention relates to bussing in pipelined computer architectures.
2. Description of Related Art
In pipelined processors, the execution of an instruction is broken up into two or more execution stages, called pipestages. Once an instruction is decoded, the instruction passes through a register file, which holds the architectural state in a series of registers. The register file is a block of registers which contain the valid values of instructions and data for processor execution. In pipelined processors, data can be "retired," or loaded into the register file, once the execution stage generating the data has successfully completed. Data is typically retired when it has undergone "de-speculation," where the data is validated as accurate by the pipeline control logic. In a pipelined processor, each execution pipestage is implemented in an execution unit, with the first of such execution units receiving instructions and data from the register file. To return data results after execution in a traditional pipelined processor, each execution unit has a separate return bus coupled to the register file or, in some cases, execution units can share an arbitrated return bus. In the traditional pipelined processor, de-speculation is accomplished by each execution unit so that data can be directly retired to the register file.
In a "speculative" pipelined processor, a single de-speculation point is used at the end of all execution stages with a deterministic number of execution cycles (e.g., the ALU), so that only one return bus (or, where there are a few parallel pipelines, one for each) is required to retire data to architectural state for a given execution pipeline. (Note that non-deterministic execution pipelines, such as the load/store-bus control, must have a separate return path.) In both traditional pipelined processors and speculative pipelined processors, "bypassing," or result forwarding, can be employed to avoid pipeline stalls: the result of an instruction A that is required by a following instruction B is forwarded directly to the execution unit handling B before the result of A is retired to the register file. In this manner, the register file is bypassed and the result forwarded along the pipe. Bypassing has traditionally been implemented using a "bypass bus" from each execution unit (also referred to as a "bypass source") back to the execution pipestages. Thus, for a speculative pipeline of N pipestages, the cost savings of having only a single return bus (as opposed to N return buses) is offset by having N bypass buses.
FIG. 1 shows such a prior art speculative pipelined processor with one return bus 105 (which is shared) and multiple bypass buses 115, 125 and 135. A register file 100 is coupled to several execution units, EX1 110, EX2 120 and EX3 130. A single return bus 105 couples EX3 130 and register file 100 such that de-speculated results from EX3 130 can be retired to architectural state in register file 100. EX3 130 acts as a single de-speculation point where result data is authenticated (e.g., all events, faults, exceptions, etc., have been handled/resolved). To implement bypassing, each of the execution units is connected to its own bypass bus. Thus, EX1 110 has a bypass bus 115, EX2 120 has a bypass bus 125 and EX3 130 has a bypass bus 135. If the processor is 64-bit, then each bypass bus would be at least 64 bits wide (plus 6 bits for the register address), for a total extra connection width of at least 210 bits across the three buses. A data cache unit (DCU) 150 is also shown that shares return bus 105 with EX3 130.
Further, to implement bypassing, each pipestage must have an address comparator to compare the destination address of each bypass source with the input operand address to the pipestage. If the addresses match, then a multiplexer selects that bypassed result as the input to the pipestage. FIG. 1 shows a comparator 117 for bypass bus 115, a comparator 127 for bypass bus 125 and a comparator 137 for bypass bus 135. When an instruction and its operand are decoded and pass initially through the register file, the pipelining process begins. As the instruction is executed down the pipe via the various execution units (pipestages), intermediate results (which have not been de-speculated and therefore are not yet retired to architectural state) are sent back over the corresponding bypass bus. The destination address of each result is compared with the input operand address required for the next instruction. If there is a match between the address for the input operand of a subsequent instruction and the bypass bus result address, a multiplexer passes the bypassed result on to EX1 110 for pipelining.
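The comparator-and-multiplexer selection described above can be illustrated with a minimal behavioral sketch. It is not taken from the disclosure: the names `BypassBus` and `select_operand` are hypothetical, and the rule that the first matching bus in the list wins is an assumption, since the text does not state how simultaneous matches are prioritized.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BypassBus:
    """One prior-art bypass bus: a destination register address plus a result value."""
    dest_addr: Optional[int] = None  # None models a bus with nothing driven this cycle
    value: int = 0


def select_operand(operand_addr: int,
                   register_file: Dict[int, int],
                   bypass_buses: List[BypassBus]) -> int:
    """Model one operand input: N comparators feeding an (N+1)-input multiplexer."""
    for bus in bypass_buses:
        # Comparator: does this bypass source target the register we need?
        if bus.dest_addr is not None and bus.dest_addr == operand_addr:
            # Assumption: the first matching bus in the list takes priority.
            return bus.value
    # No comparator matched, so the multiplexer selects the register file input.
    return register_file[operand_addr]


register_file = {1: 10, 2: 20, 3: 30}
buses = [BypassBus(dest_addr=2, value=99), BypassBus(), BypassBus(dest_addr=3, value=7)]
forwarded = select_operand(2, register_file, buses)   # taken from the first bypass bus
from_rf = select_operand(1, register_file, buses)     # falls back to the register file
```

In this sketch, register 2 is read from the bypass bus rather than the register file because its producing instruction has not yet retired, while register 1 has no in-flight producer and is read from architectural state.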
If there are Y total input operands to a particular execution unit and N bypass sources, then a set of Y*N address comparators (one comparator for each bypass bus at each input operand) as well as Y multiplexers with N+1 inputs (N bypass sources plus one from the register file itself) are needed. Thus, at each pipestage, Y multiplexers with N+1 inputs, Y*N comparators and associated wiring are needed to implement bypassing. When a computer system is composed of M such pipelines in parallel, the number of bypass buses, multiplexers and comparators increases by a factor of at least M. In a system with a 32-bit data bus and a 6-bit address bus, the total number of wires required to implement bypassing would be 38*N*M. The cycle-based performance advantage of bypassing the register file comes at a heavy cost in terms of extra hardware (and associated drawbacks such as area, power consumption, etc.) which should be eliminated.
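The hardware counts above can be tallied with a short calculation. This is only a sketch under stated assumptions: the function name `bypass_cost` and its parameters are hypothetical, the counts are those for one pipestage, and scaling by M treats the "at least M" growth as a lower bound.

```python
def bypass_cost(y_operands: int, n_sources: int, m_pipelines: int = 1,
                data_bits: int = 32, addr_bits: int = 6) -> dict:
    """Lower-bound hardware tally for prior-art bypassing (per-pipestage counts times M)."""
    return {
        # One comparator per bypass bus at each input operand: Y*N, scaled by M.
        "comparators": y_operands * n_sources * m_pipelines,
        # One multiplexer per input operand: Y, scaled by M.
        "multiplexers": y_operands * m_pipelines,
        # Each multiplexer sees N bypass sources plus the register file.
        "mux_inputs": n_sources + 1,
        # Each bypass bus carries a data field and an address field.
        "wires": (data_bits + addr_bits) * n_sources * m_pipelines,
    }


fig1 = bypass_cost(y_operands=2, n_sources=3, m_pipelines=1, data_bits=64)
dual = bypass_cost(y_operands=2, n_sources=3, m_pipelines=2)
```

With three bypass sources and a 64-bit data path, `fig1["wires"]` evaluates to (64 + 6) * 3 = 210, matching the figure given for FIG. 1. The 32-bit case with two parallel pipelines gives 38 * 3 * 2 = 228 wires. The value of two input operands per unit is chosen only for illustration.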
Thus, there is a need to decrease the complexity and cost of speculative pipelining with bypassing by providing a single bypass bus that will handle an arbitrary number of bypass sources. Further, there is a need to distribute the comparators and multiplexers within each execution unit to reduce the delay associated with multiplexing outside the execution unit and to reduce the wiring required outside the execution unit. The savings achieved are even greater in a speculative pipeline with many parallel pipelines linked together.
What is disclosed is a method for bypassing result data from bypass sources in a pipelined processor. First, when an instruction is decoded, a bypass address is broadcast on a bypass bus to all bypass sources. Each bypass source compares the broadcast bypass address with a destination address of result data to be generated. If the destination address and the bypass address match, the result data is driven onto the bypass bus.
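The three steps of the method (broadcast, local compare, conditional drive) can be sketched behaviorally as follows. This is an illustrative model rather than the disclosed implementation: `BypassSource` and `broadcast_bypass` are hypothetical names, and letting the first matching source in the list drive the bus is an assumption, since arbitration between multiple matching sources is not described in this summary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BypassSource:
    """An execution unit holding a result that has not yet been retired."""
    name: str
    dest_addr: Optional[int]  # destination register of the in-flight result, if any
    result: int = 0

    def respond(self, bypass_addr: int) -> Optional[int]:
        """Local comparator inside the unit: drive the result only on an address match."""
        if self.dest_addr is not None and self.dest_addr == bypass_addr:
            return self.result
        return None


def broadcast_bypass(bypass_addr: int, sources: List[BypassSource]) -> Optional[int]:
    """Broadcast one address to every source; return the value driven onto the single bus."""
    for source in sources:
        driven = source.respond(bypass_addr)
        if driven is not None:
            # Assumption: the first matching source in the list drives the bus.
            return driven
    # No source matched, so nothing is driven and the register file value is used.
    return None


sources = [BypassSource("EX1", 5, 111), BypassSource("EX2", 7, 222), BypassSource("EX3", None)]
hit = broadcast_bypass(7, sources)    # EX2 holds the result destined for register 7
miss = broadcast_bypass(9, sources)   # no in-flight result targets register 9
```

Because each source performs its own comparison, the sketch needs only one shared bus regardless of how many sources are in the list, which is the property the method is meant to provide.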