I. Field of the Disclosure
The technology of the disclosure relates generally to out-of-order processors (OoPs), and more particularly to instruction processing systems in OoPs for processing and pipelining instructions.
II. Background
Many modern processors are out-of-order processors (OoPs). OoPs are processors that are capable of dataflow execution of program instructions (referred to as “instructions”). Using a dataflow execution approach, the execution order of instructions in an OoP may be determined by the availability of input data to be consumed by the instructions (“dataflow order”) rather than the program order of the instructions. Thus, the OoP may execute an instruction as soon as all input data to be consumed by the instruction has been produced. While dataflow order processing of instructions may cause the specific order in which instructions are executed to be unpredictable, dataflow order execution in an OoP may realize performance gains. For example, instead of having to “stall” (i.e., intentionally introduce a processing delay) while input data to be consumed is retrieved for an older instruction, the OoP may proceed with executing a more recently fetched instruction that is able to execute immediately. In this manner, processor clock cycles that would otherwise go unused may be productively utilized by the OoP for instruction processing and execution.
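As a non-limiting illustration of dataflow order, the following minimal Python sketch starts each instruction in the first cycle in which all of its input data is available. The `Instr` record, the `dataflow_schedule` function, the register names, and the latencies are hypothetical and chosen only for illustration; the sketch further assumes unlimited execution units and that each register is written by only one instruction.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str      # register written by the instruction
    srcs: tuple    # registers read by the instruction
    latency: int   # assumed cycles until the result is available (>= 1)


def dataflow_schedule(program, initially_ready):
    """Return {instruction name: cycle in which it starts executing}."""
    # Register -> cycle at which its value becomes available.
    ready_at = {reg: 0 for reg in initially_ready}
    start = {}
    pending = list(program)  # kept in program (fetch) order
    limit = sum(i.latency for i in program) + len(program)
    cycle = 0
    while pending:
        if cycle > limit:
            raise RuntimeError("a source register is never produced")
        still_waiting = []
        for ins in pending:
            # An instruction may start once every input has been produced.
            if all(ready_at.get(s, float("inf")) <= cycle for s in ins.srcs):
                start[ins.name] = cycle
                ready_at[ins.dest] = cycle + ins.latency
            else:
                still_waiting.append(ins)
        pending = still_waiting
        cycle += 1
    return start


program = [
    Instr("load", dest="r1", srcs=("r0",), latency=5),     # slow memory read
    Instr("add", dest="r2", srcs=("r1", "r1"), latency=1),  # needs the load
    Instr("mul", dest="r4", srcs=("r3", "r3"), latency=1),  # independent
]
start = dataflow_schedule(program, initially_ready={"r0", "r3"})
print(start)
```

In this sketch, the younger `mul` instruction starts in the same cycle as the `load`, while the older `add` instruction waits until the loaded value is available, rather than the whole sequence stalling behind the load.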
An OoP may include both in-order and out-of-order pipeline stages. In-order stages in an OoP conventionally include instruction fetching from an instruction cache or memory into one or more instruction pipelines for speculative prediction (e.g., branch prediction), decoding, and obtaining data for source register operands in instructions. Out-of-order pipeline stages in an OoP conventionally include instruction execution and write back of produced data from executed instructions to be consumed by other pipeline instructions. An OoP also includes a register map table (RMT) and physical register file (PRF) structures. When sourcing data for source register operands of instructions, an instruction processing system may access an RMT to identify the physical register corresponding to the logical register of the source register operand. The RMT is provided to map logical registers to physical registers in a PRF, because there are conventionally more physical registers provided in the PRF than logical registers made available to the instructions according to the architecture of the OoP. Providing a PRF allows the OoP to process instructions out-of-order past slower-executing instructions that are delayed, such as instructions waiting for data to be read in from system memory. In this regard, later fetched, but earlier executed instructions having the same register operands as earlier fetched, but later executed instructions, can be assigned a unique physical register in the PRF so as to not overwrite the physical register of the earlier fetched instruction.
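As a non-limiting illustration of the mapping described above, the following minimal Python sketch models an RMT that maps logical registers to physical registers drawn from a free list. The class name `RegisterMapTable`, the method `rename`, the free-list policy, and the register counts are hypothetical and chosen only for illustration; returning physical registers to the free list when instructions retire is omitted for brevity.

```python
class RegisterMapTable:
    """Toy RMT: maps each logical register to a physical register."""

    def __init__(self, num_logical, num_physical):
        # More physical than logical registers, as in a conventional PRF.
        assert num_physical > num_logical
        # Initially, logical register i maps to physical register i.
        self.rmt = list(range(num_logical))
        self.free_list = list(range(num_logical, num_physical))

    def rename(self, dest, srcs):
        """Return (physical destination, physical sources) for one instruction."""
        # Source operands are looked up before the destination is remapped.
        phys_srcs = [self.rmt[s] for s in srcs]
        if not self.free_list:
            raise RuntimeError("no free physical register (an OoP would stall)")
        # The destination is assigned a unique, currently unused register.
        phys_dest = self.free_list.pop(0)
        self.rmt[dest] = phys_dest
        return phys_dest, phys_srcs


rmt = RegisterMapTable(num_logical=4, num_physical=8)
i1 = rmt.rename(dest=1, srcs=[2])  # older, slow instruction writing r1
i2 = rmt.rename(dest=3, srcs=[1])  # consumer of the older r1 value
i3 = rmt.rename(dest=1, srcs=[2])  # younger instruction also writing r1
print(i1, i2, i3)
```

In this sketch, the younger write to logical register r1 is assigned physical register 6, whereas the older write was assigned physical register 4. The younger instruction can therefore execute first without overwriting the value that the consumer instruction reads from physical register 4.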
Thus, an important design choice in OoPs is the size of the PRF. If it is desired for the OoP to have visibility to a large number of future instructions (i.e., an instruction window) in order to extract a larger number of instructions that can be executed independently and out-of-order for increased performance, the PRF should be designed to be larger to accommodate assignment of unique physical registers for source operands. However, a larger PRF size increases PRF access time and thus cycle time, which decreases performance. A larger PRF size also adds area and associated cost, and increases power consumption. Also, the wider the instruction stages in the instruction processing systems provided to read source data for instructions from physical registers in the PRF in the same processor clock cycle for increased performance, the greater the number of read ports needed in the PRF. A larger window size without sufficient pipeline width may reduce the possible increase in performance in an OoP. Also, the wider the writeback pipeline stage for increased performance, the more write ports are needed into the PRF to be able to write back the produced values from executed instructions to the physical registers in the PRF. Larger PRFs may also be required to hold the architectural and speculative register states for supporting multi-threading, which further exacerbates the issues with providing a larger PRF.