1. Technical Field
The present invention relates in general to instruction target address collisions in a microprocessor and in particular to instruction target address collisions at architected registers in a processor. Still more particularly, the present invention relates to the reduction of logic associated with target address collision detection in a dual issue, dual completion processor.
2. Description of the Related Art
Reduced instruction set computer ("RISC") processors are employed in many data processing systems and are generally characterized by high throughput of instructions. RISC processors usually operate at a high clock frequency and, because of the minimal instruction set, do so very efficiently. In addition to high clock speed, processor efficiency is improved even more by the inclusion of multiple execution units allowing the execution of two, and sometimes more, instructions per clock cycle.
Processors with the ability to execute multiple instructions per clock cycle are described as "superscalar." Superscalar processors, such as the PowerPC™ family of processors available from IBM Corporation of Armonk, N.Y., provide simultaneous dispatch of multiple instructions. Included in the processor are an Instruction Cache ("IC"), a Dispatch Unit ("DU"), an Execution Unit ("EU") and a Completion Unit ("CU"). Generally, a RISC processor is "pipelined," meaning that a second instruction is waiting to enter the execution unit as soon as the previous instruction is finished.
The Dispatch Unit issues the instructions to the Execution Units and the Completion Unit. The Dispatch Unit must determine whether registers (addresses) are available to receive the results of the Execution Units; if none are available, the instructions are not dispatched. To avoid collisions for a given register (address) location, rename registers (temporary buffers) are provided to store, or stage, results prior to transfer to the architected (physical address) register. In the PowerPC 603e™ processor, for example, five rename registers are provided for the General Purpose Registers ("GPR"), four for the Floating Point Registers ("FPR") and one each for the condition register, the link register and the count register.
The dispatch unit dispatches an instruction to one of its Execution Units and allocates a rename register for the results of that instruction. If no rename register is available, no instruction will issue. At the same time, the instruction is dispatched to the completion unit for tracking and completion purposes. Instruction results are then transferred to the architected registers from the rename registers, by the completion unit, when an instruction is retired from the completion queue.
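The dispatch rule just described can be illustrated with a short Python model. This is a minimal sketch rather than the processor's actual logic: the class `RenamePool` and the functions `allocate`, `release` and `dispatch` are hypothetical names introduced here, the architected register numbers are arbitrary, and the pool size of five simply mirrors the GPR rename count cited for the 603e.

```python
class RenamePool:
    """Hypothetical model of a pool of rename registers (temporary buffers)."""

    def __init__(self, size):
        self.free = list(range(size))   # indices of unallocated rename registers
        self.target = {}                # rename index -> architected register number

    def allocate(self, architected_reg):
        """Reserve a rename register for a result, or return None if none is free."""
        if not self.free:
            return None
        rename = self.free.pop(0)
        self.target[rename] = architected_reg
        return rename

    def release(self, rename):
        """Free a rename register once its result has been written back."""
        del self.target[rename]
        self.free.append(rename)


def dispatch(pool, architected_reg):
    """Return the allocated rename index, or None when the instruction is held."""
    return pool.allocate(architected_reg)


pool = RenamePool(size=5)
tags = [dispatch(pool, reg) for reg in (3, 7, 3, 9, 1)]  # five results in flight
stalled = dispatch(pool, 4)     # no rename register left, so nothing issues
pool.release(tags[0])           # completion retires the oldest result
resumed = dispatch(pool, 4)     # a rename register is open again
```

Note that the model records the architected target alongside each rename register; it is this association that the completion logic later has to look up.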
In the prior art, collision checks are done as the instruction completes. If two instructions that are back to back in the CU queue target the same architected register, the more recent instruction in the completion queue is allowed to write the register while the result of the previous instruction is discarded. If the first instruction's result were written to the common target before completion of the second instruction, that data would be overwritten by the more recent instruction's result upon completion. If the instructions complete at the same time and have the same architected register as the target address, the instructions must be compared and the architected registers associated with their rename registers must be compared.
However, completion is usually time critical. The logic required to determine whether the two instructions write the same target, in which case only the latest instruction holds the relevant data, takes considerable time. The logic at the Completion Unit does not know directly to which architected register an instruction is writing. The rename registers are known, because an instruction is not dispatched unless a rename register is open. Consequently, logic must check the rename register of Instruction 1 (rename 1, a temporary location) and the rename register of Instruction 2 (rename 2, a temporary location) to determine the physical registers at which the instructions were targeted. The lookup is complicated and is done in parallel to speed up the process: every comparison is done at the same time, which demands a large amount of processor resources.
RISC processors usually use an elaborate target collision detection scheme, in the completion unit, that is exercised at completion time when results are complete. For example, assume that the processor is a dual issue, dual completion processor with six result rename registers (rename0, rename1, etc.). The target collision detection scheme would compare the target addresses associated with each rename register to the target addresses of every other register in every combination. The target address comparison routine is shown in Table 1 below:
compare target address of:
TABLE 1
______________________________________
rename0 with rename1
rename0 with rename2
rename0 with rename3
rename0 with rename4
rename0 with rename5
rename1 with rename2
rename1 with rename3
rename1 with rename4
rename1 with rename5
rename2 with rename3
rename2 with rename4
rename2 with rename5
rename3 with rename4
rename3 with rename5
rename4 with rename5
______________________________________
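The comparison list in Table 1 is every unordered pair of the six rename registers. A minimal Python sketch, using the hypothetical helper name `rename_pairs`, enumerates those pairs and shows how the count grows with the number of rename registers:

```python
from itertools import combinations


def rename_pairs(num_renames):
    """List every unordered pair of rename registers that must be compared."""
    return [
        (f"rename{a}", f"rename{b}")
        for a, b in combinations(range(num_renames), 2)
    ]


pairs = rename_pairs(6)
for left, right in pairs:
    print(f"{left} with {right}")
print(len(pairs))  # 6 * 5 / 2 = 15 comparisons
```

For n rename registers the number of comparators is n(n-1)/2, so the hardware cost grows quadratically as rename registers are added.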
A set of mutually exclusive, signal result tags ("rtag") (in this case 6 bits in length), associated with each of the two completing instructions indicates the rename register associated with the instruction's resultant data. For example, the two completing instructions I0 and I1, where I1 follows I0 in program order, have the result tag signals I0_rtag(0), I0_rtag(1), I0_rtag(2), etc. associated with I0. I1 has the result tag signals I1_rtag(0), I1_rtag(1), I1_rtag(2), etc. associated with I1.
A target address collision is detected when the asserted tag-bit for I0 indicates a rename having the same register target address as the rename indicated by the asserted tag-bit of I1. The logic associated with this detection would resemble the following table of compares:
TABLE 2
______________________________________
Collision =
______________________________________
(I0_rtag(0) & I1_rtag(1) & target0_equals_target1) |
(I0_rtag(1) & I1_rtag(0) & target0_equals_target1) |
(I0_rtag(0) & I1_rtag(2) & target0_equals_target2) |
(I0_rtag(2) & I1_rtag(0) & target0_equals_target2) |
(I0_rtag(0) & I1_rtag(3) & target0_equals_target3) |
(I0_rtag(3) & I1_rtag(0) & target0_equals_target3) |
(I0_rtag(0) & I1_rtag(4) & target0_equals_target4) |
(I0_rtag(4) & I1_rtag(0) & target0_equals_target4) |
(I0_rtag(0) & I1_rtag(5) & target0_equals_target5) |
(I0_rtag(5) & I1_rtag(0) & target0_equals_target5) |
(I0_rtag(1) & I1_rtag(2) & target1_equals_target2) |
(I0_rtag(2) & I1_rtag(1) & target1_equals_target2) |
(I0_rtag(1) & I1_rtag(3) & target1_equals_target3) |
(I0_rtag(3) & I1_rtag(1) & target1_equals_target3) |
(I0_rtag(1) & I1_rtag(4) & target1_equals_target4) |
(I0_rtag(4) & I1_rtag(1) & target1_equals_target4) |
(I0_rtag(1) & I1_rtag(5) & target1_equals_target5) |
(I0_rtag(5) & I1_rtag(1) & target1_equals_target5) |
(I0_rtag(2) & I1_rtag(3) & target2_equals_target3) |
(I0_rtag(3) & I1_rtag(2) & target2_equals_target3) |
(I0_rtag(2) & I1_rtag(4) & target2_equals_target4) |
(I0_rtag(4) & I1_rtag(2) & target2_equals_target4) |
(I0_rtag(2) & I1_rtag(5) & target2_equals_target5) |
(I0_rtag(5) & I1_rtag(2) & target2_equals_target5) |
(I0_rtag(3) & I1_rtag(4) & target3_equals_target4) |
(I0_rtag(4) & I1_rtag(3) & target3_equals_target4) |
(I0_rtag(3) & I1_rtag(5) & target3_equals_target5) |
(I0_rtag(5) & I1_rtag(3) & target3_equals_target5) |
(I0_rtag(4) & I1_rtag(5) & target4_equals_target5) |
(I0_rtag(5) & I1_rtag(4) & target4_equals_target5)
______________________________________
In the above logic equation, the symbol "&" indicates the logical-AND function, the symbol "|" indicates the logical-OR function, and the symbol "=" indicates a signal assignment. It can be seen that the amount of logic associated with collision detection is large and can require a prohibitive amount of delay depending on which half-cycle the architected register file is written.
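The structure of the collision equation can be modeled in software as an OR over all product terms, two per pair of rename registers. The following Python sketch is illustrative only: the function names `one_hot` and `collision` and the example target values are assumptions made here, and a loop stands in for what the hardware evaluates in parallel.

```python
from itertools import combinations

NUM_RENAMES = 6


def one_hot(index, width=NUM_RENAMES):
    """Build a mutually exclusive result tag with only bit `index` asserted."""
    return [1 if i == index else 0 for i in range(width)]


def collision(i0_rtag, i1_rtag, targets):
    """OR together every (I0 tag & I1 tag & targets-equal) product term."""
    result = 0
    for a, b in combinations(range(len(targets)), 2):
        equal = int(targets[a] == targets[b])       # one comparator per pair
        result |= i0_rtag[a] & i1_rtag[b] & equal   # I0 in rename a, I1 in rename b
        result |= i0_rtag[b] & i1_rtag[a] & equal   # I0 in rename b, I1 in rename a
    return bool(result)


# Hypothetical architected targets held by rename0..rename5.
targets = [3, 7, 3, 9, 1, 12]
hit = collision(one_hot(0), one_hot(2), targets)    # both results target register 3
miss = collision(one_hot(0), one_hot(1), targets)   # register 3 versus register 7
```

With six rename registers the loop visits 15 pairs and forms 30 product terms, which matches the size of the equation and shows why the logic is costly to evaluate in a single cycle.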
In FIG. 3, a simplified block diagram of a superscalar processor is depicted. Instructions are fetched from the instruction cache 302 and placed in the instruction queue of the dispatch unit 304. The instruction queue is a multi-entry queue and instructions enter the queue in the top position. The instructions step down in position as previous instructions are issued. Since the processor is capable of issuing two instructions per cycle, instructions are issued through positions 1 and 0 of the instruction queue 306.
Instruction dispatch is done by the dispatch unit 304 in program order. The instructions are dispatched at the same time to their respective execution units 308 and 310 and to the completion unit 306. As in the dispatch unit 304, the instructions enter the Completion Unit 306 queue in the top position and step down as instructions are issued. The Completion Unit 306 provides a mechanism for tracking instructions from dispatch through execution.
The Execution Units 308 and 310 execute the instructions and send the results to previously assigned rename registers 312 and 314 in preparation for writing the results to the target address. Comparisons of the targeted addresses 316 and 318 that are associated with the rename registers 312 and 314 are made after the Execution Units 308 and 310 transfer the results to the rename registers 312 and 314.
FIG. 4 depicts the prior method of determining target address collisions. Considering FIG. 4 with FIG. 3, the process begins with step 400, which illustrates the Dispatch Unit receiving two instructions from the instruction cache. The process passes to step 402, which depicts the Dispatch Unit 304 determining if there are rename registers available for both instructions. If not, the process then proceeds to step 404 which illustrates the Dispatch Unit 304 holding the instructions until there are available rename registers. If there are rename registers available, the process proceeds instead to step 406 which depicts the Dispatch Unit 304 issuing both instructions to the respective Execution Units 308 and 310, and the Completion Unit 306.
The process then proceeds to step 408, which depicts the instructions being placed into the Completion Unit 306 instruction queue. The instruction queue is a First In First Out buffer and the instructions are always issued from the first and second positions (positions 0 and 1). The process then passes to step 410, which illustrates the Completion Unit checking to see if instruction I0 has completed. If not, the process repeats step 410 and I0 is held in the Completion Unit 306 until I0 is complete. If I0 is complete, the process proceeds instead to step 416, which illustrates the Completion Unit 306 checking to see if the respective Execution Unit 310 has completed I1. If not, the process then passes to step 413 which illustrates the completion unit 306 writing back I0 to the targeted address. The process proceeds to step 414 which depicts the Completion Unit 306 checking to see if I1 has completed. If I1 has completed, the process continues to step 415 which depicts the Completion Unit 306 writing I1 to its target address.
If the Completion Unit 306 determines that the Execution Unit 308 or 310 has completed I1, the process instead proceeds to step 418, which illustrates the Completion Unit 306 comparing the target addresses associated with the rename registers 312 and 314 in which instructions I1 and I0 are located, and determining if the architected registers 316 and 318 are the same. The process then passes to step 420 which depicts the Completion Unit 306 checking to see if I1 and I0 have the same address. If the addresses are not the same, the process then proceeds to step 421 which illustrates the Completion Unit writing I0 and I1 to the instructions' respective addresses. If the addresses are the same, the process proceeds instead to step 422 which depicts the Completion Unit writing back I1 to the target address and discarding I0. Based on the number of architected registers associated with the rename registers 312 and 314, there can be 15 comparisons made, just to determine if the results from I0 and I1 are to be written to the same architected register.
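The completion decisions of FIG. 4 can be summarized in a short Python sketch. It is a simplified model under stated assumptions: the function name `complete_pair` and the dictionary layout are hypothetical, each call represents a single pass over the two oldest queue entries, and the later wait for I1 (steps 414 and 415) is left to a subsequent pass.

```python
def complete_pair(i0, i1, regfile):
    """One pass of the prior-art completion check for the two oldest entries.

    Each instruction is a dict with keys 'done' (bool), 'target' (architected
    register number) and 'result'. Returns the names of the instructions whose
    results were written to the register file on this pass.
    """
    if not i0["done"]:                          # step 410: I0 is held until complete
        return []
    if not i1["done"]:                          # I1 is still executing
        regfile[i0["target"]] = i0["result"]    # step 413: write back I0 alone
        return ["I0"]
    if i0["target"] != i1["target"]:            # steps 418 and 420: compare targets
        regfile[i0["target"]] = i0["result"]    # step 421: write both results
        regfile[i1["target"]] = i1["result"]
        return ["I0", "I1"]
    regfile[i1["target"]] = i1["result"]        # step 422: I1 is written, I0 discarded
    return ["I1"]


regfile = {}
older = {"done": True, "target": 3, "result": 10}
newer = {"done": True, "target": 3, "result": 20}
written = complete_pair(older, newer, regfile)  # same target: only I1 is written
```

The single inequality test in this sketch hides the real cost: in hardware the two targets are not directly available and must be selected from the rename registers, which is where the 15 parallel comparisons arise.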
The logic, as discussed previously, requires the comparison of all the possible target addresses available to the rename registers, in order to avoid any target address collisions. Results and operation of the Completion Unit 306 are held up until the comparisons are made. Only then are the results written to the targeted address. The current process of comparing target architected addresses, at the end of the execution of instructions, consumes a lot of processor resources. In order to reduce the time calculating the comparisons, the logic is handled in parallel, thus increasing the need for physical resources, i.e., comparators, to handle the calculations.
It would be desirable, therefore, to provide a method and apparatus for detecting target address collisions, in a RISC processor, that would reduce the time from issuance to completion and reduce the need for extensive physical resources.