The embodiments described below involve the developing and ever-expanding field of computer systems and microprocessors. Microprocessors operating in a purely sequential order are now being surpassed by so-called "superscalar" microprocessors, which can execute more than one instruction at a time. Naturally, the ability to execute more than one instruction at a time can provide substantial increases in processor speed and, therefore, is highly desirable. Typically, however, the superscalar processor must be able to run software written for scalar processors, that is, software which was created with the expectation that each instruction would occur in sequence, rather than anticipating the possibility of parallel operations. As a result, superscalar microprocessor designers are faced with considerable complexities wherever executing two or more successive instructions at once would create some type of conflict.
Certain types of conflicts arising from superscalar design are often referred to in the art as "dependencies". In the prior art, certain dependencies arise when two instructions, if executed simultaneously, would adversely affect one another. Various types of such dependencies exist, such as "true" data dependencies and data anti-dependencies, both of which are described using examples below. The examples below also demonstrate the convention for using pseudo code throughout this document.
A true data dependency occurs between successive instructions when the later-occurring instruction requires as an operand the data resulting from execution of the earlier-occurring instruction. For example, consider the following pseudo code instructions of Table 1:
TABLE 1
______________________________________
Instruction Number   Pseudo Code    Action Taken
______________________________________
(1)                  MOV AX, BX     AX ← BX
(2)                  ADD CX, AX     CX ← CX + AX
______________________________________
where,
"Instruction Number" is the sequence in which the instructions appear in a sequential program;
"Pseudo Code" is the pseudo code applying typical operations to values stored in any one of three registers, denoted AX, BX, or CX; and
"Action Taken" is a representation of the action taken (if any) on the value(s) in the logical register(s) and showing the destination of the result by a left-pointing arrow.
To demonstrate the above convention, when instruction (1) executes, the contents of register BX are stored into register AX. Further, when instruction (2) executes, the contents of register CX are added to the contents of register AX and the result is stored into register CX.
Returning now to the explanation of data dependencies, note that instruction (2) requires AX as one of its operands, but this same operand is the result of executing instruction (1); thus, instruction (2) is said to be data dependent on instruction (1). Given this dependency, and without further action, instruction (2) cannot execute until instruction (1) has executed and stored its result into register AX. Accordingly, without a further technique, instructions (1) and (2) cannot execute in parallel and, therefore, are not amenable to operating in a superscalar sense.
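The true data dependency of Table 1 can be checked mechanically. The following is a minimal sketch in Python, not part of the described embodiments; the `Instr` record and the `has_true_dependency` helper are hypothetical names introduced only for illustration. It assumes each instruction is summarized by the set of registers it reads and the set it writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical summary of one pseudo code instruction."""
    text: str
    writes: frozenset  # registers written by the instruction
    reads: frozenset   # registers read as operands


def has_true_dependency(earlier: Instr, later: Instr) -> bool:
    """True if `later` reads a register that `earlier` writes."""
    return bool(earlier.writes & later.reads)


# Instructions (1) and (2) of Table 1.
i1 = Instr("MOV AX, BX", frozenset({"AX"}), frozenset({"BX"}))
i2 = Instr("ADD CX, AX", frozenset({"CX"}), frozenset({"CX", "AX"}))

if __name__ == "__main__":
    # Instruction (2) reads AX, which instruction (1) writes.
    print(has_true_dependency(i1, i2))
```

Under these assumptions, the check reports a dependency for the pair in Table 1, which is why the two instructions cannot simply be issued together.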
An anti-dependency occurs between successive instructions when the later-occurring instruction, if executed at the same time as the earlier-occurring instruction, would overwrite a logical register that the earlier-occurring instruction reads as an operand. For example, consider the following pseudo code instructions of Table 2:
TABLE 2
______________________________________
Instruction Number   Pseudo Code    Action Taken
______________________________________
(1)                  MOV AX, BX     AX ← BX
(2)                  MOV BX, CX     BX ← CX
______________________________________
In Table 2, note that instruction (2), if executed at the same time as instruction (1), could overwrite the value in register BX before instruction (1) has read it and, therefore, cause an unintended (and likely erroneous) result in the execution of instruction (1). Due to this effect, the relationship between the two instructions is sometimes referred to as a write-after-read dependency (i.e., the second-occurring instruction writes the same register location which is read by the first-occurring instruction). Again, without a further technique, instructions (1) and (2) cannot execute in parallel and, therefore, are not amenable to operating in a superscalar sense.
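The effect of the anti-dependency in Table 2 can be seen by executing the two moves in both orders. The sketch below is illustrative only; the `run` and `has_anti_dependency` helpers and the initial register values are assumptions chosen for the example, and each instruction is modeled as a (destination, source) pair.

```python
def run(program, regs):
    """Execute (dest, src) register moves in order on a copy of `regs`."""
    state = dict(regs)
    for dest, src in program:
        state[dest] = state[src]
    return state


def has_anti_dependency(earlier, later):
    """True if `later` writes the register that `earlier` reads."""
    return later[0] == earlier[1]


instr1 = ("AX", "BX")  # (1) MOV AX, BX
instr2 = ("BX", "CX")  # (2) MOV BX, CX

# Arbitrary starting values, assumed only for this illustration.
initial = {"AX": 0, "BX": 1, "CX": 2}

in_order = run([instr1, instr2], initial)   # intended sequential result
reordered = run([instr2, instr1], initial)  # (2) overwrites BX too early

if __name__ == "__main__":
    print(in_order["AX"], reordered["AX"])
```

With these starting values, sequential execution leaves the original BX value in AX, whereas letting instruction (2) write first leaves the CX value in AX, which is the erroneous result the write-after-read relationship guards against.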
The above examples are two types of register dependencies, but are not intended to be exhaustive. Indeed, one skilled in the art will recognize other types of dependencies which either overlap or are independent of those described above. In any event, one thing each of these register dependencies has in common is that the limitations imposed by the dependency, without further action, prevent concurrent execution of the interdependent instructions. However, during years of research and development, various techniques have evolved to eliminate or reduce the effects of these register dependencies so that parallel operations can take place. Some solutions are implemented in software, but are often criticized as demanding too much of the programmer. Better-regarded solutions are those established in hardware and which, therefore, are transparent to the programmer.
To better understand another factor giving rise to dependencies, consider the popular Intel X86 architecture, which includes eight general purpose architectural registers. As known in the art, all of the processor's register operations must occur using these eight registers. Consequently, relatively few registers are available for many different operations. This number of registers may have been acceptable for sequential operation, but with the advancement of superscalar development based on the X86 instruction set, the contention for use of these registers and, hence, the number of dependencies, is increasing.
One solution to avoid some types of dependencies (e.g., anti-dependencies) is known as register renaming and is described in the literature. For example, register renaming is described by Mike Johnson in the book entitled Superscalar Microprocessor Design, (PTR Prentice Hall, Inc. 1991), which is hereby incorporated herein by reference. Register renaming is achieved by including an independent set of physical registers internal to the processor. These physical registers (i.e., the rename registers) outnumber, and store the data intended for, the logical (or architectural) registers such as the eight X86 registers described above. To further accomplish this process, a table keeps track of various information which ultimately directs the result of the instruction execution into one of the rename registers; in this manner, the architectural register is "renamed" to one of the rename registers. Accordingly, where two instructions in a scalar processor might impose a dependency on the same logical register, the operand and the result are now stored in two independent rename registers. Consequently, the dependency is removed and those two instructions can execute concurrently, rather than sequentially. It also should be noted that register renaming by itself will not eliminate a true data dependency. However, the technique may be combined with other techniques (e.g., so-called data forwarding) to improve performance even given the true data dependency. Thus, the register renaming function is often applied to true data dependencies as well.
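The renaming process described above can be sketched in a few lines. This is a simplified illustration, not the mechanism of any particular processor or of the cited reference: the `rename` function, the `P0`, `P1`, ... physical register names, and the free-list policy are assumptions, and the sketch never returns a rename register to the free list as a real design would.

```python
def rename(program, logical_regs, num_physical):
    """Rename (dest, sources) instructions onto physical registers.

    Each logical register starts mapped to its own physical register;
    every write is directed to a fresh rename register from a free list.
    """
    mapping = {r: f"P{i}" for i, r in enumerate(logical_regs)}
    free = [f"P{i}" for i in range(len(logical_regs), num_physical)]
    renamed = []
    for dest, sources in program:
        # Sources are looked up BEFORE the destination is remapped.
        srcs = tuple(mapping[s] for s in sources)
        new_dest = free.pop(0)
        mapping[dest] = new_dest
        renamed.append((new_dest, srcs))
    return renamed


REGS = ["AX", "BX", "CX"]

# Table 2: (1) MOV AX, BX   (2) MOV BX, CX   (anti-dependency on BX)
table2 = rename([("AX", ("BX",)), ("BX", ("CX",))], REGS, 6)

# Table 1: (1) MOV AX, BX   (2) ADD CX, AX   (true dependency on AX)
table1 = rename([("AX", ("BX",)), ("CX", ("CX", "AX"))], REGS, 6)

if __name__ == "__main__":
    print(table2)
    print(table1)
```

In this sketch, the renamed form of Table 2 has instruction (2) writing a register that instruction (1) never reads, so the anti-dependency is gone. The renamed form of Table 1 still has instruction (2) reading the register that instruction (1) writes, which is consistent with the observation that renaming alone does not eliminate a true data dependency.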
Although the above addresses limitations created by superscalar operations where few logical registers are available, the inventor of the present embodiments has recognized that dependencies on memory locations, as opposed to logical registers, are an increasing problem. The inventor further forecasts that the problem will continue to grow due to many factors, including some yet to arise. For example, current superscalar processors often execute a few instructions at a time. However, the present inventor has recognized that future superscalar processors will execute many more such concurrent instructions. As a result, more resources could be concurrently accessed, and this could include the same memory location as opposed to the same logical register. As another example, the present inventor has recognized that many programs, both in the past and present, are written to access the same general area within memory. This practice, when combined with concurrent operation execution, increases the possibility that two or more instructions will create a dependency based on the same memory location(s). As yet another example, many computer programs such as X86-based programs tend to access so-called memory stacks, which by definition occupy the same locations in a given memory. Thus, the present inventor has recognized that access to these stack locations, when combined with concurrent operation execution, will cause dependencies based on the stack location(s).
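To illustrate how a dependency can arise through a memory location even when no logical register is shared, consider the following minimal sketch. The two instructions, the address 0x1000, and the helper names are hypothetical and chosen only for the example; they are not taken from the embodiments.

```python
def register_dependency(earlier, later):
    """True if the pair conflicts on any register (RAW, WAR, or WAW)."""
    return bool(
        earlier["reg_writes"] & later["reg_reads"]
        or later["reg_writes"] & earlier["reg_reads"]
        or earlier["reg_writes"] & later["reg_writes"]
    )


def memory_dependency(earlier, later):
    """True if the pair conflicts on any memory address."""
    return bool(
        earlier["mem_writes"] & later["mem_reads"]
        or later["mem_writes"] & earlier["mem_reads"]
        or earlier["mem_writes"] & later["mem_writes"]
    )


# Hypothetical pair: (1) stores AX to address 0x1000,
#                    (2) loads BX from address 0x1000.
store = {"reg_reads": {"AX"}, "reg_writes": set(),
         "mem_reads": set(), "mem_writes": {0x1000}}
load = {"reg_reads": set(), "reg_writes": {"BX"},
        "mem_reads": {0x1000}, "mem_writes": set()}

if __name__ == "__main__":
    print(register_dependency(store, load), memory_dependency(store, load))
```

Here a check limited to logical registers finds no conflict between the two instructions, yet the second instruction depends on the first through the shared memory location, which is the kind of dependency discussed above.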
In view of the above, there arises a need to address the drawbacks of current processors, particularly in view of the constant increases in demand for processor efficiency and performance.