Many microprocessors are relatively simple in-order machines. In an in-order processor, instructions are fetched and, if the source operands of an instruction are available in a register file of the processor, the instruction is issued to the appropriate functional unit. Instruction issue typically refers to sending an instruction to a functional unit, for example an execution unit, for processing. In an in-order processor, instructions are issued and executed in program order. In a pipelined in-order processor, the pipeline is stalled until the operands of an instruction are available.
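The stalling behavior described above can be illustrated with a minimal sketch. The function name `in_order_issue_cycles`, the `(dest, sources)` instruction encoding, the one-instruction-per-cycle issue width, and the fixed `latency` are all assumptions made for illustration and do not describe any particular processor.

```python
def in_order_issue_cycles(program, ready_at, latency=1):
    """Return the cycle at which each instruction issues, in program order.

    program  : list of (dest, sources) tuples in program order (hypothetical encoding).
    ready_at : dict mapping a register name to the cycle at which its value
               becomes available; registers not listed are ready at cycle 0.
    latency  : assumed number of cycles between issue and result availability.
    """
    ready_at = dict(ready_at)  # copy so the caller's dict is not modified
    issue_cycles = []
    cycle = 0
    for dest, sources in program:
        # Stall until every source operand of this instruction is available.
        operands_ready = max((ready_at.get(r, 0) for r in sources), default=0)
        cycle = max(cycle, operands_ready)
        issue_cycles.append(cycle)
        ready_at[dest] = cycle + latency
        cycle += 1  # assumed: at most one instruction issues per cycle
    return issue_cycles


program = [
    ("r1", ["r2"]),  # must wait for r2, which arrives at cycle 5
    ("r3", ["r4"]),  # independent, but held behind the stalled instruction
]
print(in_order_issue_cycles(program, {"r2": 5}))  # [5, 6]
```

In this sketch the second instruction has its operand ready at cycle 0, yet it cannot issue until cycle 6 because the older instruction stalls the pipeline.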
In an out-of-order processor, instructions are fetched and dispatched to an instruction dispatch buffer. The instructions wait in the buffer until their operands are ready and may then be issued before earlier (older) instructions, that is, out of program order. The results are then queued in a buffer, for example a completion buffer. The completion buffer keeps track of the program order of instructions, and younger instructions write their results into the register file only after older instructions have written theirs. Thus, in an out-of-order processor, instructions are executed out of program order but their results are written into the register file in program order. Pipelined out-of-order processors allow execution of instructions to be scheduled around hazards that would stall a pipelined in-order processor.
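The following sketch illustrates out-of-order issue combined with in-order write-back. It is a simplified model under stated assumptions: one instruction issues per cycle, each destination register is written by only one instruction (as if registers were renamed), every source register is either ready initially or produced by an older instruction, and a result is visible to consumers as soon as it is produced. The name `out_of_order_schedule` and the instruction encoding are hypothetical.

```python
import math


def out_of_order_schedule(program, initial_ready, latency=1):
    """Return (issue_order, commit_cycles) for a list of (dest, sources) tuples.

    issue_order   : program indices in the order the instructions issue.
    commit_cycles : cycle at which each instruction, in program order,
                    writes its result into the register file.
    """
    ready_at = dict(initial_ready)
    for dest, _ in program:
        ready_at[dest] = math.inf  # unavailable until its producer executes

    waiting = list(range(len(program)))  # dispatch buffer, in program order
    finished_at = {}                     # completion buffer: index -> result cycle
    issue_order, commit_cycles = [], []
    cycle = 0
    while len(commit_cycles) < len(program):
        # Write back in program order: only the oldest uncommitted instruction
        # may commit, and only once its result has been produced.
        while (len(commit_cycles) < len(program)
               and finished_at.get(len(commit_cycles), math.inf) <= cycle):
            commit_cycles.append(cycle)
        # Issue the oldest waiting instruction whose operands are all ready.
        for idx in waiting:
            dest, sources = program[idx]
            if all(ready_at.get(r, 0) <= cycle for r in sources):
                waiting.remove(idx)
                issue_order.append(idx)
                finished_at[idx] = cycle + latency
                ready_at[dest] = cycle + latency
                break
        cycle += 1
    return issue_order, commit_cycles


program = [
    ("r1", ["r9"]),  # r9 is not ready until cycle 3
    ("r2", ["r1"]),  # consumer of the first instruction
    ("r3", ["r8"]),  # independent and ready immediately
]
issue_order, commit_cycles = out_of_order_schedule(program, {"r9": 3, "r8": 0})
print(issue_order)    # [2, 0, 1]
print(commit_cycles)  # [4, 5, 5]
```

The third instruction issues first and finishes at cycle 1, but its result is held in the completion buffer until cycle 5, after both older instructions have written theirs.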
Typically, instructions comprise one or more source operands and a destination operand. The destination operand of an instruction is usually modified based, at least in part, on the source operands. An instruction that modifies a destination operand is typically referred to as a producer with respect to another instruction whose source operand it modifies. The instruction whose source operand is modified by a producer instruction is typically referred to as a consumer. The source operand of the consumer is typically the destination operand of the producer. Producers are processed by an execution unit of a processor before their corresponding consumers are processed. Producer instructions may be consumers of other producers, and consumers may be producers of other consumer instructions. A consumer may depend upon more than one producer for its source operands. The source operands of a consumer instruction may be bypassed from a producer instruction.
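The producer and consumer relationships described above can be derived from a list of instructions as in the following sketch. The helper name `find_producers` and the `(dest, sources)` encoding are hypothetical and used only for illustration.

```python
def find_producers(program):
    """For each instruction, map every source register to the index of its
    producer, i.e. the most recent older instruction that writes that
    register, or None if no older instruction in the list writes it.

    program : list of (dest, sources) tuples in program order.
    """
    last_writer = {}  # register -> index of the latest instruction writing it
    dependencies = []
    for idx, (dest, sources) in enumerate(program):
        dependencies.append({r: last_writer.get(r) for r in sources})
        last_writer[dest] = idx
    return dependencies


program = [
    ("r1", ["r2", "r3"]),  # 0: producer of r1
    ("r4", ["r1"]),        # 1: consumer of 0, and producer of r4
    ("r5", ["r1", "r4"]),  # 2: consumer of both 0 and 1
]
for idx, deps in enumerate(find_producers(program)):
    print(idx, deps)
```

In this example, instruction 1 is both a consumer (of instruction 0) and a producer (for instruction 2), and instruction 2 depends on two producers.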
Bypassing refers to the transfer of an operand value modified by a producer instruction to a consumer instruction before the producer instruction writes its results into a register file (i.e., before the producer updates the architectural state). A bypass policy of a processor determines when and from where one or more operand values modified by a producer instruction can be sent to a consumer instruction. An instruction can be issued to an execution unit of a processor only when all of its source operand values are available (e.g., in a register file or via bypass from a producer instruction). As a result, the bypass policy can determine the earliest time at which an instruction can be issued.
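The effect of a bypass policy on the earliest issue time can be sketched as follows. The timing model is an assumption made for illustration: each producer is described by the cycle at which its result is computed and the cycle at which that result is written into the register file, and the function name `earliest_issue` is hypothetical.

```python
def earliest_issue(producers, allow_bypass):
    """Return the earliest cycle at which a consumer can issue.

    producers    : list of (result_cycle, writeback_cycle) pairs, one per
                   source operand that is supplied by a producer instruction.
    allow_bypass : if True, an operand can be taken directly from the producer
                   at result_cycle; otherwise it is read from the register
                   file at writeback_cycle.
    """
    return max(
        (result if allow_bypass else writeback for result, writeback in producers),
        default=0,  # no pending producers: operands are already available
    )


producers = [(3, 6), (4, 5)]
print(earliest_issue(producers, allow_bypass=True))   # 4
print(earliest_issue(producers, allow_bypass=False))  # 6
```

Under these assumed timings, bypassing lets the consumer issue two cycles earlier than waiting for both producers to update the register file.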
Some out-of-order processors use a technique known as scoreboarding to allow instructions to execute out of order when sufficient computing resources are available and there are no data dependencies for the source operands. A centralized scoreboard is used to check the operand availability of an instruction. The centralized scoreboard stores the status of each register in the processor, and every instruction looks up the centralized scoreboard to determine whether its operands are available. In an out-of-order processor that uses scoreboarding, every instruction goes through the centralized scoreboard, where a record of the data dependencies of the source operands of the instruction is created. The centralized scoreboard determines when the instruction can read its operands and begin execution. If the centralized scoreboard decides that an instruction cannot execute immediately because its source operands are unavailable, it monitors changes in the system state and decides when the operands are ready. Once the source operand values are ready to be read, the centralized scoreboard determines when the instruction can be issued. Thus, all hazard detection and resolution is centralized in the scoreboard. The centralized scoreboard has to communicate with all functional units of the processor, which represents a structural hazard since only a limited number of buses are available for this communication.
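A centralized scoreboard of the kind described above can be sketched as a table with one status entry per register. The class name `CentralScoreboard`, its method names, and the use of integer register numbers are assumptions made for illustration; an actual scoreboard typically also tracks functional-unit status, which this sketch omits.

```python
class CentralScoreboard:
    """One pending-write flag per register, consulted by every instruction."""

    def __init__(self, num_registers):
        # True while an issued instruction has not yet written the register.
        self.pending = [False] * num_registers

    def can_issue(self, sources):
        """An instruction may issue only if none of its sources is pending."""
        return not any(self.pending[r] for r in sources)

    def issue(self, dest):
        """Record that the destination register now has an outstanding write."""
        self.pending[dest] = True

    def writeback(self, dest):
        """Clear the flag once the producer has written its result."""
        self.pending[dest] = False


scoreboard = CentralScoreboard(num_registers=8)
scoreboard.issue(dest=1)             # a producer of register 1 is in flight
print(scoreboard.can_issue([1, 2]))  # False: register 1 is still pending
print(scoreboard.can_issue([2, 3]))  # True: no dependency on register 1
scoreboard.writeback(dest=1)
print(scoreboard.can_issue([1, 2]))  # True: the operand is now available
```

Every `can_issue` call indexes the single shared table, which is the lookup that all instructions must perform in a centralized design.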
A centralized scoreboard implementation requires a large area on the chip. Furthermore, looking up a centralized scoreboard can be time-consuming. A centralized scoreboard stores the status of each register. An instruction typically needs to access values for one or two operands and therefore looks up the status of one or two registers. When a centralized scoreboard is accessed to determine the availability of operands, one or two entries in the scoreboard are selected out of the entries for all the registers in the processor. This is equivalent to a time-consuming lookup of a register file. Also, complicated routing is required if multiple instructions attempt to look up the scoreboard at the same time. The size of the scoreboard and the number of buses to the scoreboard can be increased, but doing so consumes valuable chip real estate and also has timing implications. The complexity of looking up a centralized scoreboard also delays instruction issue.
What is needed is a new technique for reducing the complexity of a centralized scoreboard in an out-of-order microprocessor, which overcomes the deficiencies noted above.