Instruction pipelining generally involves splitting a data processor into a series of stages called a pipeline. Typically, the pipeline stages process different portions of a stream of instructions concurrently. For example, a fetch stage may fetch instructions from main memory while an execution stage executes one or more previously fetched instructions.
In general, pipelined processors are susceptible to delays caused by instruction dependencies within the instruction stream. For example, consider the following instruction stream having instructions (1), (2) and (3), where (OP1), (OP2) and (OP3) are operations (e.g., add, shift, logical OR) that require various amounts of time (processor cycles) to complete.
(1) R2=R1 (OP1) R5
(2) R1=R3 (OP2) R8
(3) R7=R4 (OP3) R6
An instruction dependency exists between instructions (1) and (2) because instruction (1) reads data from register R1, and instruction (2) subsequently writes new data to register R1. In order for instruction (1) to provide a correct result, instruction (2) must write the new data to register R1 after instruction (1) reads the original data from register R1. If instruction (2) writes to register R1 before instruction (1) reads from register R1, instruction (1) will read the new data written by instruction (2) rather than the original data, and thus may provide an incorrect result. Accordingly, a write-after-read (WAR) dependency (or data hazard) exists between instructions (1) and (2).
Instruction (3) does not access any registers that are accessed by instructions (1) or (2). Accordingly, no instruction dependency exists between instruction (3) and instructions (1) and (2).
In addition to WAR dependencies, there are other types of instruction dependencies that can occur within an instruction stream. In particular, write-after-write (WAW) dependencies involve two instructions that write to the same register in an instruction stream. The two instructions must write to the register in proper order. Otherwise, the wrong data will be left in that register after the two instructions complete. If the wrong data is left in that register, another instruction that reads from that register may provide an incorrect result.
Another type of dependency is a read-after-write (RAW) dependency, which involves a first instruction that writes to a register and a subsequent instruction that reads from the same register. The first instruction must write to the register before the subsequent instruction reads from that register. Otherwise, the subsequent instruction will not read the result of the first instruction, and will instead read old data.
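The three dependency types described above can be illustrated with a minimal sketch. The `Instr` tuple and the `hazards` function are hypothetical names introduced only for this illustration; each instruction is modeled simply as one destination register and a set of source registers.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    dest: str          # register written by the instruction
    srcs: frozenset    # registers read by the instruction

def hazards(first: Instr, second: Instr) -> set:
    """Return the dependency types that a later instruction `second`
    has on an earlier instruction `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")   # second reads what first writes
    if second.dest in first.srcs:
        found.add("WAR")   # second writes what first reads
    if first.dest == second.dest:
        found.add("WAW")   # both write the same register
    return found

# Instructions (1), (2) and (3) from the example stream
i1 = Instr("R2", frozenset({"R1", "R5"}))
i2 = Instr("R1", frozenset({"R3", "R8"}))
i3 = Instr("R7", frozenset({"R4", "R6"}))

print(hazards(i1, i2))  # {'WAR'}
print(hazards(i1, i3))  # set()
```

For the example stream, the sketch reports only the WAR dependency between instructions (1) and (2), and no dependency involving instruction (3).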
Some pipelined processors resolve instruction dependencies by delaying instructions in the pipeline. For the above example, such a processor may issue instruction (1), and delay issuing instruction (2) until instruction (1) reads from register R1. The delay prevents instruction (2) from inadvertently overwriting the contents of register R1 before instruction (1) reads from register R1. Accordingly, the data hazard between instructions (1) and (2) is resolved.
Some processors which delay instructions to resolve instruction dependencies have the ability to issue instructions out-of-order. Such out-of-order processors may issue other instructions in place of the delayed instructions so that the processor remains busy. For the above example, an out-of-order processor may delay issuance of instruction (2) while instruction (1) executes. Furthermore, the processor may issue instruction (3) in place of instruction (2) such that stages of the processor do not become idle. Since no dependency exists for instruction (3), it does not matter when instruction (3) executes relative to instructions (1) and (2). Once instruction (1) has read from register R1, the processor may issue instruction (2) even though instruction (3) has already issued.
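The out-of-order behavior described above can be sketched with a toy issue loop. This is only an illustration under stated assumptions: one instruction issues per cycle, and a dependent instruction may issue once the earlier instruction it conflicts with has been issued for at least `wait_cycles` cycles. The names `conflicts`, `issue_order` and `wait_cycles`, and the two-cycle default, are hypothetical and are not taken from any particular processor.

```python
def conflicts(a, b):
    """True if later instruction b has a RAW, WAR or WAW dependency on
    earlier instruction a. Each instruction is (dest, src1, src2)."""
    a_dest, a_srcs = a[0], set(a[1:])
    b_dest, b_srcs = b[0], set(b[1:])
    return a_dest in b_srcs or b_dest in a_srcs or a_dest == b_dest

def issue_order(program, wait_cycles=2):
    """Issue at most one instruction per cycle, oldest ready instruction first.
    Returns the 1-based instruction numbers in the order they issue."""
    issued_at = {}   # instruction index -> cycle in which it issued
    order = []
    cycle = 0
    while len(order) < len(program):
        for idx, instr in enumerate(program):
            if idx in issued_at:
                continue
            # Ready when every earlier conflicting instruction issued
            # at least `wait_cycles` cycles ago (assumed timing model).
            ready = all(
                j in issued_at and cycle - issued_at[j] >= wait_cycles
                for j in range(idx)
                if conflicts(program[j], instr)
            )
            if ready:
                issued_at[idx] = cycle
                order.append(idx + 1)
                break
        cycle += 1   # if nothing was ready, this cycle is a pipeline bubble
    return order

program = [("R2", "R1", "R5"), ("R1", "R3", "R8"), ("R7", "R4", "R6")]
print(issue_order(program))  # [1, 3, 2]
```

In this sketch instruction (3) fills the slot in which instruction (2) is still waiting on instruction (1), which matches the issue order described above.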
The conventional approach of resolving instruction dependencies by delaying particular instructions and issuing other instructions in their place is not very effective in certain situations. For example, when the instruction stream has many instruction dependencies and few instructions without dependencies, many instructions must be delayed, and few instructions can be issued in place of the delayed instructions. For such an instruction stream (or portions thereof), the conventional approach may not be able to keep the pipelined processor busy.
The present invention is a technique for mapping instructions to resolve certain types of instruction dependencies such as write-after-read (WAR) dependencies and write-after-write (WAW) dependencies. In some situations, the instructions, once mapped, no longer access the same registers. Accordingly, the particular dependencies are resolved without delaying instructions.
One embodiment of the technique involves obtaining an instruction having at least one logical operand that identifies a logical register. The technique further involves renaming the logical operand with a physical operand that identifies a physical register according to a set of assignments that assign logical registers to physical registers. The instruction is mapped when each logical operand has been renamed. Accordingly, there is no need to delay instructions, and pipeline throughput can be maintained.
Mapped instructions may include logical source and destination operands that identify particular logical registers. Renaming a logical source operand preferably involves finding, in the set of assignments, an existing assignment according to the logical source operand. The found existing assignment may assign the particular logical register to a particular physical register. Renaming may further involve replacing, in the obtained instruction, the logical source operand with a physical source operand that identifies the particular physical register according to the found existing assignment.
The set of assignments may include valid assignments and invalid assignments. Furthermore, finding the existing assignment may involve locating, in the set of assignments, a valid assignment and at least one invalid assignment according to the logical source operand. Finding may further involve selecting, as the existing assignment, the located valid assignment from the located valid and invalid assignments.
Renaming the logical destination operand may involve generating a new assignment according to the set of assignments. The generated new assignment may assign the particular logical register to a particular physical register. Renaming may further involve replacing the logical destination operand with a physical destination operand that identifies the particular physical register according to the generated new assignment.
A previously generated assignment may assign the particular logical register to a physical register that is different than the particular physical register. In this situation, generating the new assignment may involve invalidating the previously generated assignment. Generating may further involve creating and validating the generated new assignment that assigns the particular logical register to the particular physical register.
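The renaming of source and destination operands described above can be sketched as follows. This is a simplified model, not the circuit of any embodiment: the class name `RenameTable`, its method names, the pool of twelve physical registers, and the policy of taking the first never-used entry for a new assignment are all assumptions made for illustration.

```python
class RenameTable:
    """Toy set of assignments: entry i belongs to physical register i."""

    def __init__(self, num_physical, logical_regs):
        self.logical = [None] * num_physical   # logical register held by each entry
        self.valid = [False] * num_physical    # whether each assignment is valid
        for phys, log in enumerate(logical_regs):
            self.logical[phys] = log
            self.valid[phys] = True

    def rename_source(self, log):
        # Locate all assignments for the logical register, valid or not,
        # then select the valid one.
        matches = [p for p, held in enumerate(self.logical) if held == log]
        for phys in matches:
            if self.valid[phys]:
                return phys
        raise LookupError(f"no valid assignment for {log}")

    def rename_dest(self, log):
        # Assumption: the first entry that was never used is taken as free.
        new_phys = self.logical.index(None)
        # Invalidate the previously generated assignment(s) for this register.
        for phys, held in enumerate(self.logical):
            if held == log:
                self.valid[phys] = False
        # Create and validate the new assignment.
        self.logical[new_phys] = log
        self.valid[new_phys] = True
        return new_phys

    def map_instruction(self, dest, src_a, src_b):
        # Sources are renamed first so they see the existing assignments.
        pa = self.rename_source(src_a)
        pb = self.rename_source(src_b)
        pd = self.rename_dest(dest)
        return pd, pa, pb

table = RenameTable(12, ["R1", "R2", "R3", "R4", "R5", "R6", "R7", "R8"])
m1 = table.map_instruction("R2", "R1", "R5")   # instruction (1)
m2 = table.map_instruction("R1", "R3", "R8")   # instruction (2)
print(m1, m2)  # (8, 0, 4) (9, 2, 7)
```

After mapping, instruction (1) reads physical register 0 while instruction (2) writes physical register 9, so in this sketch the two instructions no longer access the same register and the WAR dependency on R1 disappears.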
Another embodiment of the invention is directed to a technique for managing register assignments. The technique involves maintaining, in a register list memory circuit having entries that respectively correspond to physical registers, a list of register assignments that assign logical registers to the physical registers. Additionally, the technique involves maintaining, in a vector memory circuit having bits that respectively correspond to the physical registers, a valid vector that forms, in combination with the list of register assignments, a list of valid register assignments. Furthermore, the technique involves storing, for an instruction that is mapped by the data processor, a copy of the valid vector from the vector memory circuit to a silo memory circuit. Preferably, the processor using the technique has the ability to execute branches of instructions speculatively, and to recover if it is determined that the processor executed down an incorrect instruction branch.
As will now be explained, storing the valid vector in memory enables the state of the processor to be recovered easily and quickly. The technique preferably involves transferring the stored copy of the valid vector from the silo memory circuit to the vector memory circuit, in response to a signal indicating that an incorrect instruction branch has executed, to restore the list of valid register assignments to the data processor. In this situation, the previous register assignments are restored when the valid vector is retrieved from memory and transferred back to the vector memory circuit.
The technique may further involve canceling the copy of the valid vector stored in the silo memory circuit in response to a signal indicating that the instruction is retired.
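The storing, restoring and canceling of valid vector copies can be sketched as below. The valid vector is modeled as an integer bitmask and the silo as a dictionary keyed by an instruction tag. The class name `ValidVectorSilo`, the integer tags, the choice to take the copy before the instruction changes the vector, and the discarding of younger copies on a restore are assumptions of this sketch rather than details stated above.

```python
class ValidVectorSilo:
    """Toy model: `valid_vector` stands in for the vector memory circuit,
    `silo` for the silo memory circuit."""

    def __init__(self, valid_vector):
        self.valid_vector = valid_vector   # bit i corresponds to physical register i
        self.silo = {}                     # instruction tag -> saved copy

    def checkpoint(self, tag):
        # Store a copy of the current valid vector for a mapped instruction.
        self.silo[tag] = self.valid_vector

    def restore(self, tag):
        # Incorrect branch: transfer the stored copy back to the vector memory.
        self.valid_vector = self.silo[tag]
        # Assumption: this copy and all younger copies are then discarded.
        for t in [t for t in self.silo if t >= tag]:
            del self.silo[t]

    def retire(self, tag):
        # Instruction retired: its stored copy is no longer needed.
        self.silo.pop(tag, None)

s = ValidVectorSilo(0b00111)
s.checkpoint(10)
s.valid_vector = 0b01101      # mapping instruction 10 changed the assignments
s.checkpoint(11)
s.valid_vector = 0b11001      # mapping instruction 11 changed them again
s.restore(10)                 # instruction 10 was on an incorrect branch
print(bin(s.valid_vector))    # 0b111
```

Because only the vector is copied, the restore in this sketch is a single assignment regardless of how many register assignments changed after the checkpoint.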
When the instruction includes a logical source operand that identifies a particular logical register, the technique may involve finding, in the register list memory circuit, a first entry that assigns the particular logical register to a first physical register, and a second entry that assigns the particular logical register to a second physical register that is different than the first physical register. The technique may further involve selecting one of the first and second entries as a valid entry according to the valid vector maintained in the vector memory circuit, the selected valid entry being used by the data processor to map the instruction.
Finding the first and second entries may involve comparing contents of each of the entries in the register list memory circuit with a signal that identifies the particular logical register to find the first and second entries.
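A minimal sketch of this compare-and-select step follows. Every entry of the register list is compared with the logical register number, which yields a match vector, and the valid vector then picks out the one valid entry. The function names `match_vector` and `select_valid`, and the use of small integers as logical register numbers, are illustrative assumptions.

```python
def match_vector(entries, logical):
    """Compare every entry with `logical`; bit i is set when entry i matches."""
    mask = 0
    for phys, held in enumerate(entries):
        if held == logical:
            mask |= 1 << phys
    return mask

def select_valid(entries, valid_vector, logical):
    """Return the physical register whose matching entry is marked valid."""
    hits = match_vector(entries, logical) & valid_vector
    if hits == 0 or hits & (hits - 1):
        # Either no valid entry or more than one: not expected in this model.
        raise LookupError("expected exactly one valid matching entry")
    return hits.bit_length() - 1

# Entry i belongs to physical register i and holds a logical register number.
entries = [1, 2, 3, 2]      # logical register 2 appears in entries 1 and 3
valid_vector = 0b1101       # entry 1 is invalid, entry 3 is valid

print(bin(match_vector(entries, 2)))           # 0b1010
print(select_valid(entries, valid_vector, 2))  # 3
```

Here the first and second entries are entries 1 and 3, and the valid vector selects entry 3 as the valid entry.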
When the instruction includes a logical destination operand that identifies a particular logical register, the silo memory circuit may store a plurality of valid vectors that correspond to a plurality of previously mapped instructions. In this situation, the technique involves performing a logical OR operation based on the plurality of valid vectors to identify, in the register list memory circuit, an unused entry that corresponds to a particular physical register. The technique further involves setting contents of the unused entry according to the logical destination operand to assign the particular logical register to the particular physical register. The result is that the physical register that is assigned to store the result of the mapped instruction is an unused physical register. Accordingly, WAR and WAW dependencies are resolved.
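The logical OR step can be sketched as follows. A bit that is zero in the OR of the vectors marks an entry that none of those vectors treats as valid. Including the current valid vector in the OR alongside the siloed copies, taking the lowest-numbered zero bit, and the name `find_unused` are assumptions of this sketch.

```python
from functools import reduce
from operator import or_

def find_unused(current, siloed, num_physical):
    """Return the lowest-numbered physical register whose bit is clear in
    the OR of the current and siloed valid vectors, or None if all are used."""
    in_use = reduce(or_, siloed, current)
    for phys in range(num_physical):
        if not (in_use >> phys) & 1:
            return phys
    return None

current = 0b00101101                  # registers 0, 2, 3, 5 valid now
siloed = [0b00001111, 0b00011101]     # vectors saved for earlier instructions

print(find_unused(current, siloed, 8))  # 6
```

Registers 1 and 4 are not valid in the current vector, but each is still valid in a siloed copy, so the sketch skips them and returns register 6.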
Preferably, the technique further involves clearing, in the valid vector stored in the vector memory circuit, a first bit that corresponds to the particular physical register to invalidate a previously valid register assignment. The technique may further involve setting, in the valid vector stored in the vector memory circuit, a second bit that is different than the first bit to form a new valid register assignment. This feature of the invention enables the processor to store past register assignments by maintaining entries in the register list memory circuit and transferring valid vectors from the vector memory circuit to the silo memory circuit. The memory space required to store the valid vectors is small such that register assignments for many processor cycles can be saved.
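The effect of clearing one bit and setting another can be illustrated with a small sketch. It follows one reading of the paragraph above, in which the cleared bit belongs to the physical register of the previously valid assignment and the set bit belongs to the newly assigned physical register. The names `reassign` and `lookup` are hypothetical.

```python
def reassign(valid_vector, old_phys, new_phys):
    """Clear the bit of the old assignment and set the bit of the new one."""
    valid_vector &= ~(1 << old_phys)
    valid_vector |= 1 << new_phys
    return valid_vector

def lookup(entries, valid_vector, logical):
    """Return the physical register validly assigned to `logical`, or None."""
    for phys, held in enumerate(entries):
        if held == logical and (valid_vector >> phys) & 1:
            return phys
    return None

# Register list entries are kept; entry 3 was just written for logical register 2.
entries = [1, 2, 3, 2]
before = 0b0111                                   # vector saved before the change
after = reassign(before, old_phys=1, new_phys=3)  # 0b1101

print(lookup(entries, after, 2))   # 3
print(lookup(entries, before, 2))  # 1
```

The same register list yields either the new or the past assignment depending only on which vector is applied, which is why saving a vector of one bit per physical register is enough in this sketch to keep a past set of assignments.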
The vector memory circuit preferably includes additional valid vectors that correspond to additional instructions that are mapped by the data processor. In this situation, the technique further involves storing, for the additional instructions, copies of the additional valid vectors from the vector memory circuit to the silo memory circuit simultaneously. This feature enables the invention to be used in superscalar machines.