1. Field of the Invention
The present invention relates generally to register files on microprocessors, and more particularly to working register files on microprocessors.
2. Description of Related Art
Early computer processors (also called microprocessors) included a single central processing unit (CPU) that executed only one instruction at a time. As is well known, a CPU executes a program, having instructions stored in memory, by fetching instructions of the program, decoding the instructions and executing the instructions one after the other. In response to the need for improved performance, several techniques, e.g., pipelining, superpipelining, superscaling, speculative instruction execution and out-of-order instruction execution, have been implemented to extend the capabilities of early processors.
Pipelined architectures break the execution of instructions into a number of stages, where each stage corresponds to one step in the execution of the instruction. Pipelined designs increase the rate at which instructions can be executed by allowing a new instruction to begin execution before a previous instruction is finished executing. Pipelined architectures have been extended to superpipelined or extended pipeline architectures, where each execution pipeline is broken down into even smaller stages. In general, superpipelining increases the number of instructions that can be executed in a pipeline at any given time.
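The throughput benefit described above can be illustrated with a minimal sketch. It assumes an idealized pipeline in which every stage takes one cycle and no stalls or hazards occur; the function names are hypothetical and are not taken from any particular design.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """A new instruction enters the pipeline each cycle (idealized, no stalls).

    The first instruction needs num_stages cycles to finish; every later
    instruction finishes one cycle after its predecessor.
    """
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


# Example: 100 instructions on a 5-stage pipeline.
sequential = unpipelined_cycles(100, 5)   # 500 cycles
overlapped = pipelined_cycles(100, 5)     # 104 cycles
```

Under these assumptions, the latency of a single instruction is unchanged; only the rate at which instructions complete improves.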
Superscalar processors generally refer to a class of microprocessor architectures that include two or more instruction execution pipelines that process instructions in parallel. Because of this parallel execution, superscalar processors typically execute more than one instruction per clock cycle, on average. Each of the two or more execution pipelines may have a different number of stages. Some of the pipelines may be optimized for specialized functions, such as integer operations or floating-point operations, and in some cases execution pipelines are optimized for processing graphics, multimedia, or complex math instructions.
Typically, pipelined processors need to provide access to the registers needed for execution at multiple points in the pipeline. This can be done through separate register files, through a content addressable memory (CAM) based register file coupled with a random access memory (RAM) based register file, or through a combination of these approaches and direct connections between pipeline stages.
In at least one architecture, the register file has included a working register file (WRF) and an architectural register file (ARF). In this design, the working register file included working registers of the execution unit, while the architectural register file included architectural registers of the execution unit. Typically, each of the working registers corresponds to one of the architectural registers. The working register file stored operands generated for an associated pipeline, prior to validation of executed instructions.
Various designs have made available operands stored within the working register file for use in executing other instructions in an associated pipeline. The architectural register file has been utilized, in conjunction with an associated working register file, to store generated operands of valid executed instructions. The architectural register file has also provided valid operands for transfer to appropriate registers of an associated working register file, in the event that one or more executed instructions are later determined to be invalid.
In a typical execution unit, each instruction has been pre-decoded to include pre-decode bits, at least some of which have been used to resolve operand dependencies with other instructions in a pipeline. The pre-decode bits provided a basis for the generation of control signals that were used to control the operation of the working register file, the architectural register file and their associated pipeline.
A typical pipeline has a number of successive stages, e.g., an operand selection stage, an operand processing (i.e., execution) stage, a working register file operand write stage, an instruction validity determination stage and an architectural register file operand write stage, among other pipeline stages. In the usual case, each of the pipeline stages occurs in one machine cycle, and the lifetime of an entry in the working register file has been cycle-based. Furthermore, the working register file has traditionally been read during the operand processing or execution stage. The operand processing stage has included registers, which have latched one or more selected source operands. In a typical case, a destination operand for each instruction in the pipeline is generated by arithmetic logic in the operand processing stage for the instruction. This has been accomplished by processing one or more selected source operands in response to control signals generated by control logic of the pipeline.
The control logic has decoded each instruction in the pipeline to generate control signals for controlling the arithmetic logic. The destination operand for each instruction in the pipeline has then been written to the working register file during the working register file write stage for the instruction. In doing so, the destination operand is stored in the working register of the working register file that corresponds to the architectural register specified by the instruction as the destination.
As a result, the destination operands have been available directly from the working register file, which selectively provides source operands from selected working registers in the working register file to the pipeline during an operand selection stage for each instruction in the pipeline. This occurs if it is determined, during the operand selection stage, that the instruction specifies an architectural register in the architectural register file for which the source operand is available in the corresponding working register of the working register file.
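The write and operand selection behavior described above can be sketched as a toy model. The class name, method names, and register numbers are illustrative assumptions; the sketch simply assumes one working register per architectural register number and ignores timing.

```python
from typing import Dict, Optional


class WorkingRegisterFile:
    """Toy model: one working register per architectural register number."""

    def __init__(self) -> None:
        # Maps architectural register number -> not-yet-validated operand.
        self._entries: Dict[int, int] = {}

    def write(self, arch_reg: int, value: int) -> None:
        """Working register file write stage: store a destination operand."""
        self._entries[arch_reg] = value

    def read(self, arch_reg: int) -> Optional[int]:
        """Operand selection stage: return the operand if present, else None."""
        return self._entries.get(arch_reg)


wrf = WorkingRegisterFile()
wrf.write(3, 42)        # an earlier instruction names r3 as its destination
source = wrf.read(3)    # a later instruction names r3 as a source
missing = wrf.read(7)   # no operand for r7 is held in the working register file
```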
For each instruction in a pipeline, it may be determined that the instruction requires an immediate source operand from the control logic, instead of a source operand from the working register file. In this case, a multiplexer selects the immediate source operand. It may also be determined, for each instruction in the pipeline, that the source operand is not yet available in a working register of the working register file, but is in flight and available elsewhere (or is not readily available, which may cause a stall). In this case, the source operand may be available as a destination operand from a previous instruction. In general, the number of operand bypasses required by a pipeline is drastically reduced when a working register file is implemented in conjunction with an execution unit.
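The source operand choices described above can be summarized in a short sketch. The function name, the use of None to mean "not available", and the priority order (immediate, then working register file, then bypass) are assumptions made for illustration; an actual multiplexer is driven by control signals rather than by inspecting values.

```python
from typing import Optional, Tuple


def select_source_operand(
    immediate: Optional[int],
    wrf_value: Optional[int],
    bypass_value: Optional[int],
) -> Tuple[str, Optional[int]]:
    """Return (source name, operand); ("stall", None) if nothing is available.

    Assumed priority, for illustration only:
    immediate operand, then working register file, then in-flight bypass.
    """
    if immediate is not None:
        return ("immediate", immediate)
    if wrf_value is not None:
        return ("wrf", wrf_value)
    if bypass_value is not None:
        # Operand is still in flight as a previous instruction's destination.
        return ("bypass", bypass_value)
    return ("stall", None)
```

For example, `select_source_operand(None, None, 12)` selects the bypassed value, while `select_source_operand(None, None, None)` reports a stall.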
Generally, the validity determination stage for each instruction in the pipeline determined whether the instruction was valid or invalid, as indicated by various status signals. In the architectural register file operand write stage, for each instruction in the pipeline that was determined to be valid, the architectural register in the architectural register file that was specified by the instruction as the destination stored the destination operand provided by the register.
In this way, the architectural register file has been used to store only the destination operands of instructions in the pipeline that are valid. When the validity determination stage determined that an instruction in a pipeline was invalid, the valid operands stored by the architectural registers of the architectural register file (that correspond to the working registers of the working register file) were transferred to the working register file.
The working registers of the working register file then stored the transferred operands, replacing the operands previously stored therein. This operation has placed the working register file in the same state that it was in just before the invalid instruction began executing. As a result, the transferred operands may be subsequently selected as the source operands in the pipeline.
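The commit and restore behavior described above can be sketched as follows. This is a simplified model under stated assumptions: instructions reach the validity determination stage one at a time in program order, the whole architectural register file is copied back on an invalid instruction, and the class and method names are hypothetical.

```python
from typing import List


class RegisterFiles:
    """Toy model of a working register file (WRF) paired with an
    architectural register file (ARF) of the same size."""

    def __init__(self, num_regs: int) -> None:
        self.wrf: List[int] = [0] * num_regs  # operands not yet validated
        self.arf: List[int] = [0] * num_regs  # operands of valid instructions

    def write_working(self, dest: int, value: int) -> None:
        """WRF write stage: store a destination operand before validation."""
        self.wrf[dest] = value

    def resolve(self, dest: int, value: int, valid: bool) -> None:
        """Validity determination followed by the ARF write stage."""
        if valid:
            # Valid instruction: its destination operand reaches the ARF.
            self.arf[dest] = value
        else:
            # Invalid instruction: transfer the valid operands back to the WRF.
            self.wrf = list(self.arf)


regs = RegisterFiles(4)
regs.write_working(1, 10)
regs.resolve(1, 10, valid=True)    # r1 = 10 becomes architectural state
regs.write_working(1, 99)          # later result, not yet validated
regs.write_working(2, 7)           # younger result, also not yet validated
regs.resolve(1, 99, valid=False)   # invalid: WRF is restored from the ARF
```

After the final call, both files hold the last validated state, so the unvalidated values 99 and 7 are no longer visible to later operand selection.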
In general, execution units that use working register files and architectural register files provide a reduced number of operand bypasses. Unfortunately, as pipelines have become increasingly complex, it has become increasingly difficult to read the architectural register file in one clock cycle. Multi-issue pipelines exacerbate this problem by requiring larger and slower multi-ported register files.