A superscalar implementation of a processor architecture is one in which common instructions, e.g., integer and floating-point arithmetic, loads, stores and conditional branches, can be initiated out of order, and executed independently. However, such implementations raise a number of complex design issues related to instruction processing.
For purposes of clarity, certain terms used herein need to be defined. “Instruction-level parallelism” refers to the degree to which, on average, the instructions of a program can be executed in parallel or out of sequence. “In flight” means that an instruction is in the machine but not fully processed, e.g., the instruction is in a buffer waiting to be issued, or the instruction is being executed. “Dependency” refers to the way in which one instruction is affected by the processing or result of another instruction.
Superscalar processors are also known as Out-of-Order processors. In a more traditional In-Order processor, e.g., a von Neumann type processor, the processor issues instructions in the exact order that would be achieved by sequential execution, and writes the results in the same order. In other words, the processor will fetch an instruction in sequential order, decode it, issue it, execute it and write it back in sequential order. In contrast, Out-of-Order processors use a combination of hardware and software techniques to execute multiple instructions in parallel by taking advantage of the instruction-level parallelism inherent in a program. A typical superscalar processor will fetch a predetermined number of instructions (a bundle) into the machine simultaneously, and may have a large number of instructions “in flight” simultaneously. Instructions may be fetched, executed, made to change registers and written back simultaneously or with little regard for their sequential order to increase processing speed. A constraint on the processor is that the result must be correct, i.e., identical to the result of sequential execution.
However, the degree of instruction-level parallelism in a program is limited in large part by the number and types of data dependencies which may exist between instructions. Two types of data dependencies frequently encountered are known as: 1) a true data dependency or a Read-Write data hazard, and 2) an output dependency or a Write-Read data hazard. In a Read-Write data hazard (true data dependency) the current instruction needs data produced by a previous instruction. By way of example, consider the following sequence of instructions:
        (I1) add r1, r2; load register r1 with the contents of r2 plus the contents of r1.
        (I2) move r3, r1; load register r3 with the contents of r1.
The second instruction (I2) can be fetched and decoded, but cannot execute until the first instruction (I1) executes. The reason is that one of the inputs (source register (r1)) of the second instruction (I2) is also the output (destination register (r1)) of the first instruction (I1). That is, the second instruction (I2) needs the data produced by the first instruction (I1), and must wait for I1 to write the data to r1 before I2 can read the data from r1.
With no such dependency, two instructions can be fetched and executed in parallel. If there is a Read-Write data hazard between the first and second instructions, then the second instruction is delayed as many clock cycles as required to remove the dependency. In general, any current dependent instruction must be delayed until all of its input values (source registers) have been produced as output values (destination registers) by the previous instructions on which it depends. Therefore, superscalar processors must be able to flag these dependencies until they can be removed.
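The flagging of true data dependencies described above can be sketched in a few lines of Python. This is a minimal illustration only, not the mechanism of any particular processor; the `Instr` tuple, the `true_dependencies` helper and the encoding of I1 and I2 are hypothetical names introduced for this example.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    name: str
    dest: str     # destination register
    srcs: tuple   # source registers

def true_dependencies(program):
    """Map each instruction to the earlier instructions whose results it reads."""
    last_writer = {}  # register -> name of the most recent instruction writing it
    deps = {}
    for ins in program:
        # Sources are looked up before the destination is recorded, so an
        # instruction that reads and writes the same register does not
        # depend on itself.
        deps[ins.name] = sorted({last_writer[r] for r in ins.srcs if r in last_writer})
        last_writer[ins.dest] = ins.name
    return deps

# I1: add r1, r2 (r1 = r1 + r2); I2: move r3, r1 (r3 = r1)
program = [
    Instr("I1", "r1", ("r1", "r2")),
    Instr("I2", "r3", ("r1",)),
]
deps = true_dependencies(program)
print(deps)
```

In this sketch I2 is reported as depending on I1, so I2 would have to be held back until I1 has written r1.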
A Write-Read data hazard (output dependency) arises when the same register is used as the destination register in two separate instructions. In that case the second instruction can overwrite the data of the first instruction. This is basically a storage conflict, where the compiler is running out of architectural registers that are available to use. (The term “architectural register” is a software concept referring to a finite set of instruction symbols, e.g., R1 through R10, which represent registers in a program language, such as assembler, that are accessible to a programmer. In other words, they are those software registers within an instruction set that have a direct impact on the logical execution of a program.) Consider the following sequence of instructions:
        (I1) R3=R3−R5
        (I2) R4=R3+1
        (I3) R3=R5+1
        (I4) R7=R3−R4
In this case, I2 is dependent on I1, and I4 is dependent on I3. These are examples of true data dependencies (Read-Write data hazards). However, the relationship between I1 and I3 is different. There is no true data dependency as defined earlier, but if I3 executes to completion prior to I1, then the wrong value of the contents of R3 will be fetched for execution of I4. Consequently, I3 must complete after I1 to produce the correct output values.
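The effect of completing I3 before I1 can be checked with a small simulation. The following Python sketch is illustrative only; the `run` helper and the initial register values (R3=10, R5=2) are arbitrary assumptions chosen for this example.

```python
def run(order, regs):
    """Execute the four instructions in the given order on a copy of regs."""
    r = dict(regs)
    ops = {
        "I1": lambda: r.update(R3=r["R3"] - r["R5"]),
        "I2": lambda: r.update(R4=r["R3"] + 1),
        "I3": lambda: r.update(R3=r["R5"] + 1),
        "I4": lambda: r.update(R7=r["R3"] - r["R4"]),
    }
    for name in order:
        ops[name]()
    return r

initial = {"R3": 10, "R5": 2}                       # arbitrary starting values
in_order = run(["I1", "I2", "I3", "I4"], initial)   # program order
reordered = run(["I3", "I1", "I2", "I4"], initial)  # I3 completes before I1

print(in_order["R7"], reordered["R7"])
```

With these starting values the program-order run leaves R7 = -6, while the reordered run leaves R7 = -1, because the last write to R3 then comes from I1 rather than I3.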
Write-Read data hazards (output dependencies) have grown over the years because many programs for early model processors, e.g., Intel's 286, were required to run on later model processors, e.g., Intel's 586. As the newer model machines became more powerful, the number of physical registers (actual hardware registers) grew substantially, but the number of architectural registers (software registers) did not.
However, because in most processors the number of physical registers far exceeds the number of architectural registers, the Write-Read data hazards can be solved through a process called “register renaming”. In register renaming, physical registers are allocated dynamically by the processor hardware and are associated with the values of the architectural registers needed by instructions at various points in time. Consider the above sequence of instructions after register renaming:
        (I1) R3(b)=R3(a)−R5(a)
        (I2) R4(a)=R3(b)+1
        (I3) R3(c)=R5(a)+1
        (I4) R7(a)=R3(c)−R4(a)
The register references without the letter in parentheses refer to the architectural registers found in the instruction. The register references with the letter in parentheses refer to the physical registers allocated to hold a new value. In this example, the creation of register R3(c) in instruction I3 avoids the Write-Read data hazard on the first instruction I1, because I1 writes R3(b) and I3 writes R3(c), which are two separate physical registers. Register renaming is often accomplished through the use of a functional unit called an Instruction Renaming Unit (IRU).
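The renaming shown above can be reproduced with a short Python sketch. It is a simplified illustration under stated assumptions: the `rename` function is a hypothetical helper, only register operands are modeled (the constant +1 is omitted), and physical registers are labeled with successive letters to mirror the notation of the example rather than being drawn from a hardware free list.

```python
import string

def rename(program):
    """Rename (dest, srcs) pairs so that every write gets a fresh physical register."""
    version = {}  # architectural register -> index of its current letter

    def current(reg):
        # A register that has not been written yet is at version (a).
        # Letters limit this sketch to 26 versions per register.
        return f"{reg}({string.ascii_lowercase[version.setdefault(reg, 0)]})"

    renamed = []
    for dest, srcs in program:
        new_srcs = [current(s) for s in srcs]  # read sources before renaming dest
        if dest in version:
            version[dest] += 1                 # allocate the next physical register
        else:
            version[dest] = 0
        renamed.append((current(dest), new_srcs))
    return renamed

program = [
    ("R3", ["R3", "R5"]),  # I1: R3 = R3 - R5
    ("R4", ["R3"]),        # I2: R4 = R3 + 1
    ("R3", ["R5"]),        # I3: R3 = R5 + 1
    ("R7", ["R3", "R4"]),  # I4: R7 = R3 - R4
]
renamed = rename(program)
for dest, srcs in renamed:
    print(dest, "<-", ", ".join(srcs))
```

The sources of each instruction are translated before its destination is given a new physical register, which is why I1 reads R3(a) but writes R3(b).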
However, the renaming process creates additional problems in a superscalar processor. The superscalar processor not only must be able to flag dependencies between instructions, but must also be able to map all the physical registers to the architectural registers for each dependent instruction fetched. Additionally, the superscalar processor must be able to determine the dependencies between the instructions within a current fetch bundle (intra dependencies), as well as be able to determine the dependencies between the instructions in the fetched bundle and the in-flight instructions (inter dependencies). Moreover, the process of determination must be done quickly, within one cycle.
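The distinction between intra dependencies and inter dependencies can be illustrated as follows. This Python sketch is an assumption-laden illustration: `classify_dependencies` is a hypothetical helper, and F1 is an invented in-flight instruction that writes R5. Hardware would perform these comparisons in parallel within one cycle; the sequential loop here only shows the logic.

```python
def classify_dependencies(bundle, in_flight):
    """For each bundle instruction, list its producers inside the bundle (intra)
    and among the in-flight instructions (inter)."""
    inflight_writer = {}
    for name, dest, _srcs in in_flight:  # oldest first, so the newest writer wins
        inflight_writer[dest] = name

    bundle_writer = {}  # register -> most recent writer earlier in the bundle
    result = {}
    for name, dest, srcs in bundle:
        intra, inter = [], []
        for s in srcs:
            if s in bundle_writer:       # a producer in the bundle takes priority
                intra.append(bundle_writer[s])
            elif s in inflight_writer:
                inter.append(inflight_writer[s])
        result[name] = {"intra": intra, "inter": inter}
        bundle_writer[dest] = name
    return result

in_flight = [("F1", "R5", ("R1",))]  # hypothetical in-flight writer of R5
bundle = [
    ("I1", "R3", ("R3", "R5")),
    ("I2", "R4", ("R3",)),
    ("I3", "R3", ("R5",)),
    ("I4", "R7", ("R3", "R4")),
]
result = classify_dependencies(bundle, in_flight)
for name, d in result.items():
    print(name, d)
```

Here I1 and I3 each have an inter dependency on F1, while I2 and I4 have only intra dependencies on earlier instructions in the same bundle.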
Accordingly, there is a need for an improved superscalar processor capable of quickly determining both the dependencies of an instruction, and the necessary addresses of the physical registers required by the instruction for execution.