1. Field of the Invention
This invention relates to microprocessors, and more specifically, to dependency checking in microprocessors.
2. Description of the Related Art
With recent advances in microprocessors, a number of new architectural features have been implemented for increased performance. One of these features is pipelining, wherein multiple instructions are overlapped in execution. Another is a superscalar structure having multiple functional (or execution) units, which may allow a microprocessor to execute multiple instructions simultaneously. In some cases, these functional units may be designed to execute specific types of instructions. For example, one functional unit may be designed for integer operations, while another may be designed for floating point operations. Many current microprocessors employ both pipelined and superscalar features. A pipelined superscalar microprocessor running at peak efficiency may execute multiple instructions with each successive clock cycle.
Pipelining generally involves five steps: fetching an instruction, decoding the instruction, fetching the specified operands, executing the instruction, and writing back the results (to registers or memory). Each of these steps may require one or more clock cycles according to the design of a given pipeline. Thus, a pipeline may simultaneously write back the results of an executed first instruction, execute a second instruction, fetch operands for a third instruction, and so on, all within a given clock cycle. In this ideal situation, a given execution unit may retire an instruction with each clock cycle. However, there are a number of different situations that may prevent the retiring of an instruction with each new clock cycle. In such situations, a delay of several clock cycles may occur between the execution of instructions. Similarly, superscalar microprocessors may experience situations that delay concurrent execution of multiple instructions.
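The overlap described above can be illustrated with a short Python sketch. It assumes, purely for illustration, that each of the five steps takes exactly one clock cycle and that no stalls occur; the function name and stage labels are hypothetical and are not taken from any particular design.

```python
# Five pipeline steps, in order, as listed in the text.
STAGES = ["fetch", "decode", "operand fetch", "execute", "write back"]

def pipeline_occupancy(num_instructions, num_cycles):
    """Return one dict per clock cycle mapping stage name -> instruction number.

    Assumption: one cycle per stage, no stalls; instruction i (1-based)
    enters the fetch stage at cycle i - 1 (0-based).
    """
    table = []
    for cycle in range(num_cycles):
        row = {}
        for stage_index, stage in enumerate(STAGES):
            instr = cycle - stage_index + 1
            if 1 <= instr <= num_instructions:
                row[stage] = instr
        table.append(row)
    return table

if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_occupancy(5, 9)):
        print(cycle, row)
```

Under these assumptions, the fifth cycle shows instruction 1 being written back while instruction 2 executes and instruction 3 fetches operands, which is the ideal case in which one instruction may retire per cycle.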
One situation that may cause a delay of execution in a pipelined and/or superscalar microprocessor involves dependencies between instructions. For example, if operands for a given instruction depend on the results of a previous instruction, the instruction may not be executed until the previous instruction has completed execution. FIG. 1 shows exemplary instructions which illustrate dependencies that may arise during program execution. Instruction #1, the first issued, instructs the processor to add the contents of register A to register B and store the result in register A. Each subsequent instruction follows this same general format. Since instruction #2 requires the processor to fetch both operands from register A, it may not execute until instruction #1 has executed and written its result back into register A. Similarly, instruction #4 cannot execute until the results of instruction #3 have been written back to register C and the results of instruction #2 have been written back to register A. Satisfying these dependencies may add a delay between the execution of these instructions.
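The dependencies described above can be sketched in Python as follows. FIG. 1 is not reproduced here, so the four instructions below are hypothetical stand-ins chosen only to match the description (in particular, the operands of instruction #4 are an assumption). The sketch tracks only true (read-after-write) dependencies.

```python
from collections import namedtuple

# Each instruction names one destination register and its source registers.
Instr = namedtuple("Instr", ["number", "dest", "sources"])

# Hypothetical stand-ins for the instructions of FIG. 1 (assumed operands).
PROGRAM = [
    Instr(1, "A", ("A", "B")),  # A = A + B
    Instr(2, "A", ("A", "A")),  # both operands read from A
    Instr(3, "C", ("C", "D")),  # C = C + D
    Instr(4, "A", ("A", "C")),  # assumed: reads results of #2 and #3
]

def raw_dependencies(program):
    """Map each instruction number to the earlier instructions whose results it reads."""
    last_writer = {}  # register name -> number of the most recent writer
    deps = {}
    for instr in program:
        deps[instr.number] = {
            last_writer[reg] for reg in instr.sources if reg in last_writer
        }
        last_writer[instr.dest] = instr.number
    return deps

if __name__ == "__main__":
    print(raw_dependencies(PROGRAM))
```

With these assumed operands, instruction #2 depends on #1, instruction #3 depends on nothing, and instruction #4 depends on both #2 and #3.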
A technique for dealing with the problem described above is known as out-of-order execution. When this technique is employed, instructions may be executed in an order different from that in which they were issued, provided that program functionality is maintained. Referring again to FIG. 1, instruction #3 orders the processor to add the contents of register C to register D and store the result in register C. Neither instruction #1 nor instruction #2 fetches operands from, or stores results in, registers C or D. Thus, instruction #3 has no dependencies with respect to instructions #1 and #2. Execution of instruction #3 prior to either instruction #2 or instruction #1 will yield a valid result without compromising program functionality.
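As a minimal sketch of this idea, the Python below groups instructions into successive "waves" in which every instruction's dependencies have already completed. The dependency table is an assumption that mirrors the relationships described for FIG. 1 (#2 depends on #1; #4 depends on #2 and #3), and the function name is hypothetical. Real hardware is also limited by the number of functional units, which this sketch ignores.

```python
def issue_groups(deps):
    """Group instruction numbers into waves that may execute together.

    deps maps an instruction number to the set of instruction numbers
    whose results it needs. An instruction joins a wave once all of its
    dependencies belong to earlier waves.
    """
    remaining = dict(deps)
    done = set()
    groups = []
    while remaining:
        ready = sorted(n for n, needed in remaining.items() if needed <= done)
        if not ready:
            raise ValueError("dependencies contain a cycle")
        groups.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return groups

# Assumed dependency table matching the description of FIG. 1.
DEPS = {1: set(), 2: {1}, 3: set(), 4: {2, 3}}

if __name__ == "__main__":
    print(issue_groups(DEPS))  # [[1, 3], [2], [4]]
```

Instruction #3 lands in the first wave alongside instruction #1, ahead of instruction #2, which is the reordering the paragraph describes.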
A flag is a bit that indicates a condition resulting from the execution of an instruction, and is typically stored in a dedicated register. For example, in an x86 processor, the parity flag (PF) is used to indicate odd or even parity of the result, while the sign flag may be used to indicate the sign of a result of an arithmetic operation. In this architecture, a register referred to as the status register, or EFLAGS register, is used to store flags.
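For illustration only, the following Python sketch computes four x86-style status flags for an 8-bit addition: carry (CF), overflow (OF), sign (SF), and parity (PF). The 8-bit width and the function name are assumptions made to keep the example small. In the x86 architecture, PF is set when the low byte of the result contains an even number of 1 bits.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags) with x86-style flag semantics."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        # Carry out of the most significant bit (unsigned overflow).
        "CF": int(total > 0xFF),
        # Signed overflow: both operands differ in sign from the result.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
        # Sign flag copies the most significant bit of the result.
        "SF": (result >> 7) & 1,
        # Parity flag is set for an even number of 1 bits.
        "PF": int(bin(result).count("1") % 2 == 0),
    }
    return result, flags

if __name__ == "__main__":
    print(add8_flags(0x7F, 0x01))
    print(add8_flags(0xFF, 0x01))
```

Adding 0x7F and 0x01 sets OF but not CF, while adding 0xFF and 0x01 sets CF but not OF, showing that the two flags record different conditions.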
Some microprocessors, in order to perform out-of-order execution, implement a unit known as a reservation station. A typical reservation station may store instructions and their corresponding operands and flags prior to execution. If the operand or flag values are not valid for the corresponding instruction, a tag may be stored instead. The tag is used to identify the instruction(s) upon which the dependency is based. Using comparison logic within the reservation station, result tags of executed instructions can be compared with the stored tag. When a matching tag is detected, the corresponding operand or flag may be captured. Once all operand and flag values have been provided for the corresponding instruction, the instruction may then be forwarded to a functional unit for execution. While storing larger numbers of instructions in a reservation station allows for more out-of-order execution, the amount of circuitry needed increases as well.
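The tag capture behavior described above can be sketched in Python as follows. This is a simplified software model under stated assumptions, not a description of any particular hardware: the class name, method names, and the representation of a slot as a ("value", v) or ("tag", t) pair are all hypothetical.

```python
class ReservationEntry:
    """One reservation station entry holding an instruction and its operand/flag slots."""

    def __init__(self, opcode, slots):
        # Each slot is ("value", v) when valid, or ("tag", t) when the
        # value is still to be produced by the instruction identified by t.
        self.opcode = opcode
        self.slots = [list(slot) for slot in slots]

    def snoop(self, result_tag, result_value):
        """Compare a broadcast result tag with each stored tag; capture on a match."""
        for slot in self.slots:
            if slot[0] == "tag" and slot[1] == result_tag:
                slot[0], slot[1] = "value", result_value

    def ready(self):
        """True once every slot holds a value, so the entry may be forwarded."""
        return all(kind == "value" for kind, _ in self.slots)

if __name__ == "__main__":
    entry = ReservationEntry("add", [("value", 5), ("tag", 7)])
    entry.snoop(3, 99)      # no stored tag matches; nothing is captured
    print(entry.ready())    # False
    entry.snoop(7, 10)      # tag 7 matches; the value 10 is captured
    print(entry.ready())    # True
```

Each slot that can hold a tag needs its own comparison against every broadcast result, which is why the circuitry grows with the number of stored entries and slots.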
The problems outlined above are in large part solved by a system and method for shared dependency checking of status flags. In one embodiment, the instruction set of an x86 microprocessor is such that, with one exception, no instruction reads both the carry flag and the overflow flag as operands. Dependency checking hardware for the carry flag and overflow flag is therefore shared. A 2-to-1 multiplexer is used to select the dependency checking result (either the flag or a tag identifying the instruction which updates the flag) depending on which flag is read by the corresponding instruction. The selected flag, or tag corresponding to the flag, may be stored in a reservation station. Comparison logic is present in the reservation station for each stored tag, and is used to compare the stored tag against the result tags of executed instructions so that the value required for instruction execution may be captured. Since comparison and storage logic for the carry and overflow flags/tags is shared, the amount of circuitry present in the reservation station may be reduced, thereby saving chip area. Although comparison logic for the flags is shared, the flags may still be independently written into the status register.
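A minimal Python sketch of the selection step follows. It models the dependency check for each flag as returning either the current flag value or the tag of a pending writer, and models the 2-to-1 multiplexer as a simple selection between the two results. All function and parameter names are hypothetical, and the dictionaries stand in for hardware state.

```python
def check_flag(flag, status_register, pending_writers):
    """Dependency check for one flag.

    Returns ("tag", t) if an outstanding instruction t will update the
    flag, otherwise ("value", v) with the flag's current value.
    """
    if flag in pending_writers:
        return ("tag", pending_writers[flag])
    return ("value", status_register[flag])

def shared_cf_of_slot(reads_overflow, status_register, pending_writers):
    """Model of the 2-to-1 multiplexer feeding one shared CF/OF slot.

    reads_overflow is the select signal: True if the instruction reads
    OF, False if it reads CF (assumed never both for a single operation).
    """
    cf_result = check_flag("CF", status_register, pending_writers)
    of_result = check_flag("OF", status_register, pending_writers)
    return of_result if reads_overflow else cf_result

if __name__ == "__main__":
    status = {"CF": 1, "OF": 0}
    pending = {"OF": 12}  # instruction with tag 12 has yet to write OF
    print(shared_cf_of_slot(False, status, pending))  # ("value", 1)
    print(shared_cf_of_slot(True, status, pending))   # ("tag", 12)
```

For example, an instruction such as ADC, which reads the carry flag, would drive the select signal one way, while a conditional jump such as JO, which reads the overflow flag, would drive it the other. Either way only one slot, and one tag comparator, is needed in the reservation station entry.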
In general, the above technique may be applied to any microprocessor architecture in which flags are used. When the reading of a flag as an operand by an instruction set is exclusive, or nearly exclusive, with respect to the reading of another flag, circuitry supporting the reading of these flags may be shared. Execution may be altered for the few instructions that may need to read the flags that are otherwise read exclusively of each other. For example, in the x86 embodiment discussed above, the PUSHF (Push Flags) instruction requires independent reading of both the carry and overflow flags, which are otherwise read exclusively with respect to each other. The PUSHF instruction may be broken down into two instructions by firmware (e.g. microcode) within the microprocessor, thereby allowing the reading of both flags. In other embodiments, a decode unit may be configured to break down the instruction into two separate operations, thereby allowing both flags to be read for that instruction.
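The splitting of such an instruction can be sketched in Python as follows. This is only an illustration under stated assumptions: the operation names with ".op1" and ".op2" suffixes are hypothetical, and the choice to leave all flags other than the overflow flag with the first operation is arbitrary.

```python
# The two flags whose dependency checking hardware is shared.
SHARED_PAIR = {"CF", "OF"}

def decode(mnemonic, flags_read):
    """Return a list of (operation name, flags read) pairs.

    An instruction that reads both shared flags is broken into two
    operations so that each one reads at most one of CF and OF.
    """
    flags_read = set(flags_read)
    if SHARED_PAIR <= flags_read:
        first = tuple(sorted(flags_read - {"OF"}))  # keeps CF and any other flags
        second = ("OF",)
        return [(mnemonic + ".op1", first), (mnemonic + ".op2", second)]
    return [(mnemonic, tuple(sorted(flags_read)))]

if __name__ == "__main__":
    print(decode("PUSHF", ["CF", "OF", "ZF"]))
    print(decode("ADC", ["CF"]))
```

After decoding, no single operation reads both the carry flag and the overflow flag, so the shared slot is sufficient for every operation that reaches the reservation station.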
Thus, in various embodiments, the system and method for shared dependency checking of status flags may result in circuit area savings. Since the flags to be shared are not read simultaneously by the same instruction, logic supporting the reading of these flags may be shared for dependency checking. The flags may still be written independently of each other, and thus, the effect on microprocessor operations may be kept to a minimum.