Recently developed methods for speeding up information processing apparatuses include the so-called superscalar method, the VLIW (very-long-instruction-word) method and the super-pipeline method. These methods are capable of executing a plurality of instructions in a single cycle. What follows is a description of the bypass function (using an arithmetic circuit output bypass, or BPS) characteristic of typical prior art pipeline processing. For the description, reference is made to FIGS. 2 and 3.
As shown in FIGS. 2 and 3, the stages of pipeline processing comprise an instruction fetch stage (Inst Fetch) for fetching instruction cache data (ICACHE) in accordance with a program counter (PC), a decode stage (Decode) for decoding an instruction, an execute stage for moving the contents of a register file (REG) into operand registers (OP) for execution by an arithmetic and logic unit (ALU), and a write stage for writing the result of the execution to the register file (REG). The speed of the processing is enhanced by having the handling of each instruction split into a plurality of stages even as a plurality of instructions are being processed in parallel. As depicted in FIG. 2, the result from the arithmetic and logic unit (ALU) is transferred through an arithmetic circuit output bypass (BPS) to the operand registers for the next instruction. That is, the execution result from the preceding instruction is used unmodified by the current instruction, whereby the processing speed is increased.
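The effect of the arithmetic circuit output bypass (BPS) described above may be illustrated by a minimal software sketch. The sketch is a simplified model and not the circuit of FIG. 2: the names `Instr`, `OPS` and `run` are hypothetical, and it is assumed that the result of an instruction reaches the register file (REG) only after the next instruction has already read its operands. In the sketch, turning the bypass off lets the next instruction read a stale register value; the text above does not say how hardware lacking a bypass would handle this case.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str    # operation name, a key of OPS
    dst: int   # destination register number
    src1: int  # first source register number
    src2: int  # second source register number


# Hypothetical two-operation ALU used only for this illustration.
OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b}


def run(program, regs, use_bypass=True):
    """Execute instructions one per cycle; return (registers, bypass hits)."""
    regs = list(regs)
    pending = None  # (dst, value) produced by the ALU, not yet written to REG
    bypass_hits = 0
    for ins in program:
        # Execute stage: move register file contents into the operand registers.
        a = regs[ins.src1]
        b = regs[ins.src2]
        # Bypass: take the preceding ALU result directly when it is a source.
        if use_bypass and pending is not None:
            dst, value = pending
            if ins.src1 == dst:
                a = value
                bypass_hits += 1
            if ins.src2 == dst:
                b = value
                bypass_hits += 1
        result = OPS[ins.op](a, b)
        # Write stage of the preceding instruction completes in this cycle.
        if pending is not None:
            regs[pending[0]] = pending[1]
        pending = (ins.dst, result)
    if pending is not None:
        regs[pending[0]] = pending[1]
    return regs, bypass_hits


program = [Instr("add", 2, 0, 1), Instr("add", 3, 2, 2)]
with_bypass, hits = run(program, [1, 2, 0, 0], use_bypass=True)
without_bypass, _ = run(program, [1, 2, 0, 0], use_bypass=False)
```

In this example the second instruction uses register 2 for both of its operands, so both operand registers are fed from the bypass when it is enabled.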
In the pipeline processing of FIG. 2, raising the number of instructions executed in parallel requires installing more read ports (PT) from the register file (REG), as shown in FIG. 3. (FIG. 3 shows a four-port setup.) The resulting arrangement increases the amount of hardware and prolongs the time it takes a selector (SEL) to perform its processing. Consequently, the cost of the processing apparatus tends to be high due to its enlarged hardware, while the efficiency and speed of the processing are liable to worsen.
Superscalar Microprocessor Design (Mike Johnson, Prentice Hall, 1991) describes how to reduce the number of register file read ports, relying on the fact that not all instructions need to read their operand data from a register (for example, an instruction with an immediate operand).
A register designator arbitration means is provided. A selector receives the register designators of all instructions and selects only those register designators which indicate an operand that must be read from the register file. The number of implemented register ports is thereby reduced.
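The selection performed by such a register designator arbitration means may be sketched as follows. The sketch is illustrative only and rests on assumptions not found in the text above: the names `Operand`, `arbitrate` and `num_ports` are hypothetical, an immediate operand is represented by a register designator of `None`, and any read request beyond the available ports is simply reported as deferred, since the text does not say how such requests are handled.

```python
from typing import List, NamedTuple, Optional, Tuple


class Operand(NamedTuple):
    reg: Optional[int]  # register designator, or None for an immediate operand
    imm: int = 0        # immediate value, used only when reg is None


def arbitrate(operands: List[Operand],
              num_ports: int) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Select only the designators that need a register file read.

    Returns (granted, deferred). granted lists (operand index, register
    designator) pairs, one per read port in use; deferred lists the indices
    of operands that need a read but found no free port (assumed behavior).
    """
    granted: List[Tuple[int, int]] = []
    deferred: List[int] = []
    for index, operand in enumerate(operands):
        if operand.reg is None:
            continue  # immediate operand: no register read port is needed
        if len(granted) < num_ports:
            granted.append((index, operand.reg))
        else:
            deferred.append(index)
    return granted, deferred


# Two instructions, each with one register operand and one immediate operand.
operands = [Operand(1), Operand(None, 5), Operand(2), Operand(None, 7)]
granted, deferred = arbitrate(operands, num_ports=2)
```

Here four operands are presented but only two need a register read, so two read ports suffice where four would otherwise have been provided.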