1. Field of the Invention
The present invention relates to the field of computer operation, and more specifically to register files in a computer.
2. Description of the Related Art
The original 8086 used eight general purpose registers in a sequential processor. Each instruction was received and was processed one at a time. However, as processors have become pipelined to achieve higher clock rates, dealing with instructions has become more problematic. Pipelined instructions have added complexity in dealing with bypassed operands, and have made handling partial registers more difficult. Pipelined processors have been able to process instructions out of order, "shelving" instructions until all operands needed for the operation are valid.
When an instruction that requires operands is received, the pipelined processor has typically responded by examining the register file to determine whether all of the operands are available. If some operands are not available, the instruction is "shelved" until the operands become available. An instruction may remain shelved for several clock cycles. For example, this could occur when operands missing from an instruction must be read from a memory having a high latency. While an instruction is shelved, however, later instructions received by the processor may attempt to update the values of the operands that were available in the shelved instruction. Consequently, performing operations out of order can be a complicated task, as updates to operand values require complicated register file management.
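By way of illustration only, the shelving behavior described above can be sketched as a small cycle-by-cycle model. The names `Instruction`, `run`, and `arrivals` are hypothetical and introduced solely for this sketch. The model deliberately ignores the hazard noted above, in which a later instruction overwrites an operand of a shelved instruction.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    name: str
    sources: tuple  # names of the registers the instruction reads
    dest: str       # name of the register the instruction writes


def run(program, ready, arrivals, max_cycles=100):
    """Issue each instruction in the first cycle in which all of its
    source operands are valid; otherwise keep it shelved.

    ready    -- set of register names valid at cycle 0 (assumption)
    arrivals -- dict mapping a cycle number to the registers that become
                valid in that cycle, e.g. a slow memory load (assumption)
    Returns a list of (cycle, instruction name) in issue order.
    """
    ready = set(ready)
    shelf = list(program)
    order = []
    cycle = 0
    while shelf and cycle < max_cycles:
        ready |= set(arrivals.get(cycle, ()))
        still_shelved = []
        for ins in shelf:
            if all(src in ready for src in ins.sources):
                order.append((cycle, ins.name))
            else:
                still_shelved.append(ins)  # wait for missing operands
        shelf = still_shelved
        cycle += 1
    return order


program = [
    Instruction("add", ("AX", "BX"), "AX"),  # AX arrives late from memory
    Instruction("mov", ("CX",), "DX"),       # all operands already valid
]
print(run(program, ready={"BX", "CX"}, arrivals={3: ["AX"]}))
```

In this sketch the younger `mov` issues at cycle 0 while the older `add` remains shelved until AX becomes valid at cycle 3, which is the out-of-order behavior the paragraph describes.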
Similarly, instructions have been allowed to produce intermediate values, which have not been ready to be written to the register file. "Intermediate" values refer to those values generated by an execution unit such as an arithmetic logic unit (ALU), but which may not be complete. When an instruction that has generated a result is not yet known to be on the execution path, the instruction is not yet known to be complete. For example, consider an instruction to add the values of AX and BX and place the result in AX. The value of BX may be available, but the original value of AX may not be available. In such a case, the instruction is not complete; an operand is still necessary to complete the operation. The result, therefore, is "intermediate" in that it depends on a yet-to-be-determined value. Such intermediate values may not be written to the register file until the instruction is known to be complete.
Instructions are not known to be complete until all older instructions in the instruction path have completed and have not encountered any exceptional conditions. However, pipelined processors have generally been permitted to execute instructions on intermediate data, and then later to determine that the intermediate data is valid and may be written to the register file. Pipelined processors have therefore achieved much of their speed performance by allowing results to be used before those results are committed to the register file.
When a processor has received updates for some, but not all, of the operands needed for a particular operation, the pipelined processor has required additional logic to prevent use of "old" values for the missing operands. Handling instructions out of order can greatly complicate this task. Moreover, when partial register handling has been supported, updates to a portion of a register have complicated handling the register, where portions of the register have been valid and portions of the register have been invalid.
Intermediate results have therefore been held in a pending state until written to the register file. The pending state has been implemented as an additional "pending" file somewhat structurally similar to, and in some implementations larger than, the register file. Bypassing, or reading intermediate values from the pending state, has allowed pipelined processors to execute instructions before determining that the instructions that had generated those intermediate values are complete.
Typically, the pending state has been implemented as a number of stages along a pipeline. Data has been copied from one stage to the next until, at the final stage, that data has been written to the register file. At each point along the pipeline, data has been available to "younger" instructions, with more recent updates to the register value being written to the first stage of the pipeline. Instructions have been able to select values from the various stages of the pipeline, and from the register file.
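By way of illustration only, a staged pending state of this kind can be sketched as follows for whole-register results. The function names `read_operand` and `advance`, the three-stage depth, and the tuple layout are assumptions made for the sketch and do not describe any particular processor.

```python
def read_operand(reg, stages, register_file):
    """Return the most recent value of `reg`.

    stages -- list of pending results, youngest first; each entry is
              None or a (register name, value) tuple (assumed layout)
    Falls back to the register file when no stage holds the register.
    """
    for stage in stages:
        if stage is not None and stage[0] == reg:
            return stage[1]
    return register_file[reg]


def advance(stages, register_file, new_result=None):
    """Move every pending result one stage down the pipeline.

    The oldest result is written (committed) to the register file and
    the new result, if any, enters the first stage.
    """
    oldest = stages[-1]
    if oldest is not None:
        register_file[oldest[0]] = oldest[1]
    return [new_result] + stages[:-1]


register_file = {"AX": 1, "BX": 2}
stages = [None, None, None]
stages = advance(stages, register_file, ("AX", 10))
stages = advance(stages, register_file, ("AX", 20))
# A younger instruction sees the newest pending value, not the register file.
print(read_operand("AX", stages, register_file), register_file["AX"])
```

Here the read returns 20 from the first stage while the register file still holds 1, since neither pending result has reached the final stage.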
To handle the relatively large number of sources (the various stages of the pipeline and the register file) from which an instruction can read values of an operand, a multiplexer ("bypass multiplexer") has typically been included. The implementation of the bypass multiplexer, however, differs between many CISC architectures and RISC architectures, because the former support partial register write operations, while the latter do not.
The x86 instruction set originally supported eight 16-bit general purpose registers, four of which could be divided into two 8-bit general purpose registers. Division of registers AX, BX, CX, and DX allowed byte-access (8-bit access) to the upper and lower bytes of these registers. As a result, not only have registers AX, BX, CX, and DX been supported, but also registers AH, AL, BH, BL, CH, CL, DH, and DL, referring to the high-order byte and low-order byte within each 16-bit register. The register count effectively increased to 16 registers: registers AX, BX, CX, DX, AH, AL, BH, BL, CH, CL, DH, DL, SP, BP, SI, and DI. Registers SP, BP, SI, and DI have not been divided.
Moreover, with the introduction of the 386, the x86 architecture grew to support 32-bit registers. However, the prevalence of code using the original 16-bit instruction set necessitated support of both 16-bit and 32-bit register sizes. The 386 instruction set allowed access to all the aforementioned registers, as well as allowing access to "extended" (32-bit) registers. The 16-bit registers were considered partial registers of the new 32-bit registers. Each instruction was provided with four partial register options: the instruction could select the full "extended" register, or the "lower" 16 bits of the extended register, or the high-order byte or the low-order byte of the lower 16-bit register.
For example, in the 386 instruction set, an instruction was permitted to access an extended register such as EAX. Such an access would access bits [31:0] of a 32-bit register. Another instruction could access a 16-bit register, for example AX. Such an access would access bits [15:0] of the same 32-bit register. Another instruction could access an 8-bit register, for example AL. Such an access would access bits [7:0] of the same 32-bit register. Another instruction could access another 8-bit register, for example AH. Such an access would access bits [15:8] of the same 32-bit register.
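By way of illustration only, the four views of a single 32-bit register can be modeled with bit masks. The table `PARTS` and the helpers `mask_for`, `read_part`, and `write_part` are hypothetical names introduced for this sketch; the bit ranges are those given above.

```python
# Bit ranges (high bit, low bit) of each view of the same 32-bit register.
PARTS = {"EAX": (31, 0), "AX": (15, 0), "AL": (7, 0), "AH": (15, 8)}


def mask_for(hi, lo):
    """Mask with ones in bit positions lo..hi inclusive."""
    return ((1 << (hi - lo + 1)) - 1) << lo


def read_part(full, name):
    """Extract the named partial register from the 32-bit value."""
    hi, lo = PARTS[name]
    return (full & mask_for(hi, lo)) >> lo


def write_part(full, name, value):
    """Replace only the named portion, leaving the other bits unchanged."""
    hi, lo = PARTS[name]
    mask = mask_for(hi, lo)
    return (full & ~mask & 0xFFFFFFFF) | ((value << lo) & mask)


eax = 0x12345678
print(hex(read_part(eax, "AX")))          # 0x5678
print(hex(read_part(eax, "AH")))          # 0x56
print(hex(write_part(eax, "AH", 0xAB)))   # 0x1234ab78
```

A write to AH changes only bits [15:8], so the remaining bytes of the register keep whatever value an earlier instruction produced.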
In part due to the problem of handling partial registers in a pipelined processor, and in part due to the low cost of registers, RISC microprocessors have not supported partial register write operations. Adding registers has become far less expensive than dividing existing registers, particularly in light of the added complexity. Therefore, the RISC processor historically has operated on registers in their entirety. Consequently, the bypass multiplexer in a pipelined RISC processor configuration has been required only to select one of the sources (the various stages of the pipeline or the register file) from which an instruction can read values of an operand.
On the other hand, CISC architectures allow a variety of portions of the registers to be altered. A pipelined CISC processor configuration that allows portions of registers to be operated upon has typically required an additional field to indicate which portion of the register has been updated between one stage and the next. The same result stage registers have been used for the result, and the additional field has been used to indicate how to write (commit) the result to the register file at the end of the pipeline. Because portions of an operand may be generated by different instructions, the bypass multiplexer must be able to select from more than one place in which that operand might be found.
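By way of illustration only, a bypass selection that honors such a per-portion field can be sketched by treating a 32-bit register as four byte lanes. The function name `bypass_read`, the lane numbering, and the tuple layout of each stage are assumptions made for this sketch rather than a description of any actual multiplexer design.

```python
BYTE_LANES = 4  # a 32-bit register viewed as four byte lanes


def bypass_read(reg, stages, register_file):
    """Assemble a 32-bit operand one byte lane at a time.

    stages -- pending results, youngest first; each entry is None or a
              (register name, 32-bit value, lanes) tuple, where `lanes`
              is the set of byte lanes the result wrote (lane 0 is
              bits [7:0]); this layout is an assumption of the sketch
    Returns the assembled value and the source chosen for each lane.
    """
    result = 0
    sources = []
    for lane in range(BYTE_LANES):
        lane_mask = 0xFF << (8 * lane)
        src_value, src_name = register_file[reg], "regfile"
        for idx, stage in enumerate(stages):
            if stage is not None and stage[0] == reg and lane in stage[2]:
                src_value, src_name = stage[1], f"stage{idx}"
                break  # the youngest writer of this lane wins
        result |= src_value & lane_mask
        sources.append(src_name)
    return result, sources


register_file = {"EAX": 0x11223344}
stages = [
    ("EAX", 0x000000AA, {0}),  # youngest: a write to AL
    ("EAX", 0x0000BB00, {1}),  # older: a write to AH
    None,
]
value, sources = bypass_read("EAX", stages, register_file)
print(hex(value), sources)
```

In this example a read of EAX draws lane 0 from one stage, lane 1 from another, and the upper two lanes from the register file, so a single operand requires three separate sources.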
These approaches have proven unsatisfactory. The implementation of the bypass multiplexer has proven extremely complicated in CISC architectures. A particularly problematic example is the situation in which multiple instructions have written to different portions of a register, and then an instruction requiring the value of the register is encountered. In such a case, various stages of the pipeline contain different portions of the register value to be used. According to one approach, the instruction requiring the value of the register has caused the bypass multiplexer to select from the relatively large number of sources of values. According to another approach, the instruction requiring the value of the register has been stalled until enough of the results have been written that the operand comes from a single result register or from the register file itself. According to still another approach, a combination of complex multiplexing and stalling has been used.
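By way of illustration only, the stalling approach can be sketched as a check on whether every needed portion of the operand comes from a single source. The names `lane_source` and `must_stall`, the byte-lane numbering (lane 0 being bits [7:0]), and the stage layout are hypothetical and introduced solely for this sketch.

```python
def lane_source(reg, lane, stages):
    """Return the index of the youngest stage that wrote this byte lane
    of `reg`, or "regfile" if no pending result wrote it.

    stages -- pending results, youngest first; each entry is None or a
              (register name, set of byte lanes written) tuple (assumed)
    """
    for idx, stage in enumerate(stages):
        if stage is not None and stage[0] == reg and lane in stage[1]:
            return idx
    return "regfile"


def must_stall(reg, lanes_needed, stages):
    """Stall when the needed lanes would come from more than one source."""
    sources = {lane_source(reg, lane, stages) for lane in lanes_needed}
    return len(sources) > 1


stages = [("EAX", {0}), ("EAX", {1}), None]  # pending writes to AL and AH
print(must_stall("EAX", {0, 1, 2, 3}, stages))  # True: three sources
print(must_stall("EAX", {0}, stages))           # False: AL from one stage
```

Under this sketch a read of the full register stalls until the pending partial writes have drained to the register file, whereas a read of AL alone can proceed immediately.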